UBC-MDS / data-analysis-review-2021

Submission: GROUP 28: Abalone Age Classifier #13

Open lynnwbl opened 2 years ago

lynnwbl commented 2 years ago

Submitting authors: @kphaterp @lynnwbl @nickmao1994 @veerupandey

Repository: https://github.com/UBC-MDS/abalone_age_classification

Report link: https://ubc-mds.github.io/abalone_age_classification/README.html

Abstract/executive summary: Abalones are endangered marine snails found in cold coastal waters around the world. The price of an abalone is positively associated with its age. However, determining an abalone's age is a very complex process. In this project, we classify abalones as "young" or "old" according to their number of rings, using input features such as the abalone's gender, height with meat in shell, and shell weight, with a logistic regression model.

Editor: @flor14 Reviewer: Bulut_Berkay, ABDILAHI_KHALID, Liow_Mel, Rosenberg_Morgan

khalidcawl commented 2 years ago

Data analysis review checklist

Reviewer: @khalidcawl

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

Great work team! Overall, the project structure, code, and documentation look good. I have some minor suggestions listed below:

The things I liked about your project:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

mel-liow commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

I thought this was a very well organised and written analysis. Specifically, the introduction to the dataset and the research question was clear and provided the reader with good and specific context which helped to understand the report. The code was also easy to read and reproducible. I found having the output logs very useful when running the scripts.

I was able to follow all the instructions given in the Usage section of the README and reproduce the image results. The only issue I had was when I built the Jupyter Book report locally, where I encountered this error:

[Screenshot of the error, 2021-12-01 14:02:17]

which may be because the build couldn't find the image files locally. However, I do appreciate that you also published the Jupyter Book so that I could read the results from there too!

A nitpick, and I'm aware this was probably a holdover from milestone 1: the published proposal could be better formatted and presented more formally.

I also agree with Khalid's point above about making the research more concise; I agree it could do without listing the features.

Overall, great report.

berkaybulut commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 3

Review Comments:

This is a very well-organized repository, and the Dockerization shows second-level thinking.

Nice job everyone! I have provided some comments for your first milestone. Please address these concerns in your third milestone submission.

An additional improvement would be on your proposal. You could explain why you chose the F1 score instead of other scores such as AUC or recall, and how F1 is better suited to this specific problem than those metrics.
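To illustrate the kind of justification meant here, below is a minimal sketch of how recall and F1 can disagree. The confusion-matrix counts are hypothetical and are not taken from the abalone results; the function name is also just for illustration.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: 100 abalones, 20 of which are truly "old" (the positive class).
# Classifier A labels every abalone "old": perfect recall, poor precision.
always_old = precision_recall_f1(tp=20, fp=80, fn=0)

# Classifier B finds 15 of the 20 "old" abalones and raises 5 false alarms.
balanced = precision_recall_f1(tp=15, fp=5, fn=5)

print("always 'old':", always_old)  # recall is 1.0, F1 is about 0.33
print("balanced:    ", balanced)    # precision, recall and F1 are all 0.75
```

With these made-up counts, recall alone would prefer the classifier that labels everything "old", while F1 penalizes its low precision. A short argument along these lines in the proposal would make the metric choice easier to follow.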

I agree with Khalid's point above about making the research more concise; it could do without listing the features.

Overall, I think it's a great repository.

morganrosenberg50 commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

Overall this is super interesting and, I imagine, useful for the niche! I experienced the same quandary as Khalid when running the script and agree some context around this would be helpful. Including some specific feedback below:

katerinkus commented 2 years ago

Removed (wrong group) Link to the correct peer review

nickmao1994 commented 2 years ago

@katerinkus Hey, I saw your review, but you might have submitted it to the wrong place. Our group is working on abalone, but your review is on wine quality.

katerinkus commented 2 years ago

@nickmao1994 Thank you for pointing this out! No idea how this happened.

nickmao1994 commented 2 years ago

Thanks for the constructive feedback! Here are some of our improvements.

Four comments we agree with:

lynnwbl commented 2 years ago

Thank you for the comments! I really appreciate your feedback.

kphaterp commented 2 years ago

Thank you so much for the feedback everyone!

Regarding comment 1) in this review, I have addressed this concern about the age of the data in this commit: UBC-MDS/abalone_age_classification@8fc5463

Regarding comment 5 in this review, I have addressed this concern about the hyperlinks in the contributing file in this commit: UBC-MDS/abalone_age_classification@f0dc10ed

Thank you for pointing these concerns out to us!

veerupandey commented 2 years ago

Thank you very much for your feedback!

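Regarding comment 1) in this review, nohup is a widely used Unix command that keeps a job running after the terminal session ends; it is typically combined with output redirection and `&` to run the job in the background and log its output to a file. We have added a reference to runner.log in the project README file.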
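For readers unfamiliar with the pattern, here is a rough Python sketch of the same idea: start a job detached from the terminal and send its output to a log file. It is an approximation for illustration only, not the project's actual command; the helper name `run_in_background` and the placeholder job are hypothetical, and `start_new_session` only has an effect on POSIX systems.

```python
import subprocess
import sys


def run_in_background(command, log_path):
    """Start `command` in its own session and append its output to `log_path`."""
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            command,
            stdout=log_file,            # standard output goes to the log file
            stderr=subprocess.STDOUT,   # errors go to the same log file
            stdin=subprocess.DEVNULL,   # the job does not read from the terminal
            start_new_session=True,     # POSIX only: detach from the terminal's session
        )
    # The child keeps its own handle on the log file after ours is closed.
    return process


# Placeholder job standing in for the real analysis script (an assumption).
job = run_in_background(
    [sys.executable, "-c", "print('pipeline finished')"],
    "runner.log",
)
```

The call returns immediately while the job keeps running, so progress has to be checked by reading the log file, which is why pointing to runner.log in the README helps.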

Regarding comment 2) in this review, the data folder was added inside src to organize all the scripts dealing with raw data in a single directory. This convention has been followed by many real-world projects. The famous cookiecutter project also creates the data directory inside the src folder for data-related scripts.

Regarding the comment made in this review, the images are generated dynamically when the scripts run and are then added to the project report. We realized that the images were not being generated properly on Windows. Thanks for reporting this error; we have updated the usage instructions for Windows with this commit: UBC-MDS/abalone_age_classification@83b747a.

Thanks again for your constructive feedback!