Submission: GROUP 28: Abalone Age Classifier

lynnwbl commented 2 years ago

Submitting authors: @kphaterp @lynnwbl @nickmao1994 @veerupandey

Repository: https://github.com/UBC-MDS/abalone_age_classification
Report link: https://ubc-mds.github.io/abalone_age_classification/README.html
Abstract/executive summary: Abalones are endangered marine snails found in cold coastal waters around the world. The price of an abalone is positively associated with its age. However, determining an abalone's age is a very complex process. In this project we classify abalone snails as "young" or "old" according to their number of rings, using a Logistic Regression model with input features such as the abalone's gender, height with meat in shell, weight of the shell, etc.

Editor: @flor14 Reviewer: Bulut_Berkay, ABDILAHI_KHALID, Liow_Mel, Rosenberg_Morgan

[x] I agree to abide by MDS's Code of Conduct during the review process and in maintaining my package should it be accepted.

khalidcawl commented 2 years ago

Data analysis review checklist

Reviewer: @khalidcawl

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 2

Review Comments:

Great work team! Overall, the project structure, code, and documentation look good. I have some minor suggestions listed below:

I ran the script and got some number on the console, not sure what it did. Could you perhaps describe what happens when I run the runner script, and where to find the report file as well?
"data" folder inside src - I can see that it contains data related scripts, but it could be renamed
Rephrase the research question. It's a bit verbose and difficult to read. You don't have to list out the features of your model in the research question.

The things I liked about your project:

Well organized project
One script to run everything instead of executing individual scripts
Readable code

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

mel-liow commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

[X] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[X] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[X] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[X] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[X] Installation instructions: Is there a clearly stated list of dependencies?
[X] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[X] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[X] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[X] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[X] Style guidelines: Does the code adhere to well known language style guides?
[X] Modularity: Is the code suitably abstracted into scripts and functions?
[X] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[X] Data: Is the raw data archived somewhere? Is it accessible?
[X] Computational methods: Is all the source code required for the data analysis available?
[X] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[X] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[X] Authors: Does the report include a list of authors with their affiliations?
[X] What is the question: Do the authors clearly state the research question being asked?
[X] Importance: Do the authors clearly state the importance for this research question?
[X] Background: Do the authors provide sufficient background information so that readers can understand the report?
[X] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[X] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[X] Conclusions: Are the conclusions presented by the authors correct?
[X] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[X] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing:

2hrs

Review Comments:

I thought this was a very well organised and written analysis. Specifically, the introduction to the dataset and the research question was clear and provided the reader with good and specific context which helped to understand the report. The code was also easy to read and reproducible. I found having the output logs very useful when running the scripts.

I was able to follow all the instructions given in the Usage section of the README and reproduce the image results. The only issue I had was when I built the Jupyter Book report locally: I encountered an error, which may be because the build couldn't find the image files locally. However, I do appreciate that you also published the Jupyter Book so that I could read the results there too!

Nitpicking, and I'm aware this was probably a hangover from milestone 1, but the published proposal could be better formatted and presented more formally.

I also agree with Khalid's point above about making the research question more concise; it could do without listing the features.

Overall, great report.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

berkaybulut commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing:

3hrs

Review Comments:

This is a very well organized repository, and the Dockerisation shows second-level thinking about reproducibility.

Nice job everyone! I have provided some comments for your first milestone. Please address these concerns in your third milestone submission.

An additional improvement would be to your proposal: you could explain the reason for choosing the F1 score over other scores such as AUC or recall, and how F1 is better suited to this specific problem than other metrics.
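To make the metric question concrete, here is a minimal sketch of how recall and F1 can disagree. The labels are made up for illustration (1 = "old", 0 = "young") and are not taken from the abalone data, and `precision_recall_f1` is a hypothetical helper, not a function from the repository.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 3 "old" and 5 "young" abalones
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# A degenerate classifier that labels every abalone as "old"
always_old = [1] * len(y_true)

precision, recall, f1 = precision_recall_f1(y_true, always_old)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the always-"old" classifier reaches perfect recall while its precision is only 3/8, and F1 is pulled down accordingly. That is one possible argument for F1 over recall alone; whether it applies here depends on the class balance and error costs in the actual project.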

I agree with Khalid's point above about making the research question more concise; it could do without listing the features.

Overall, I think it's a great repository.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

morganrosenberg50 commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5

Review Comments:

Overall this is super interesting and I imagine useful for the niche! I experienced the same quandary as Khalid when running the script and agree some context around this would be helpful. Including some specific feedback below:

Your data is quite old (>25 years). When you introduce the project, it would be valuable to explain why this does not concern you with respect to the relevance of your insights today. I imagine our incremental knowledge of abalone and their evolution as a species isn't significant over this timeframe, but the data age does give pause.
Given this is an area in which most readers may not have much, if any, background knowledge, I think it would be helpful to give context to some of your variables and analyses. For example, it's unclear what the "infant" class is within the sex feature (a sentence I never thought I would write). Given that it exists in the "is old" abalone classification, it makes me wonder if it is not a reference to age. While you address this at the very bottom of the report, it's another thing that may leave lay readers curious (I'm curious!).
Overall the report is easy to read and well formatted. The only improvement I would suggest is the x axis in the last chart (figure 6) is hard to read and cut off.
The logo was a very nice touch! Consider color coding it to match the theme colors in your plots.
In your Contributing, the hyperlinks appear to be broken for me (obviously not a problem since you're explicit about what you're referencing, but worth polishing up).

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

katerinkus commented 2 years ago

Removed (wrong group) Link to the correct peer review

nickmao1994 commented 2 years ago

@katerinkus Hey, I saw your review, but it may have been submitted to the wrong place. Our group is working on abalone but your review is on wine quality.

katerinkus commented 2 years ago

@nickmao1994 Thank you for pointing this out! No idea how this happened.

nickmao1994 commented 2 years ago

Thanks for the constructive feedback! Here are some of our improvements.

Four comments we agree with:

Regarding the cut-off plot raised by the TA (https://github.com/UBC-MDS/abalone_age_classification/issues/81#issue-1073002332) and morganrosenberg50 (https://github.com/UBC-MDS/data-analysis-review-2021/issues/13#issuecomment-986139483), we have fixed the plot by changing the plotting function in this commit: https://github.com/UBC-MDS/abalone_age_classification/commit/e1c3fcb99119532fe57dfe08a5e9d593497b2bc4
Regarding the research question being too long, raised by khalidcawl (https://github.com/UBC-MDS/data-analysis-review-2021/issues/13#issuecomment-984111641), mel-liow (https://github.com/UBC-MDS/data-analysis-review-2021/issues/13#issuecomment-984131478), and berkaybulut (https://github.com/UBC-MDS/data-analysis-review-2021/issues/13#issuecomment-986137568), we have rephrased our research question here, along with other changes in wording: https://github.com/UBC-MDS/abalone_age_classification/commit/17a8fb5e8791db1a94404d8570e49b328fb9b17e
Regarding the explanation of our choice of metric, raised by the TA (https://github.com/UBC-MDS/abalone_age_classification/issues/81#issue-1073002332) and berkaybulut (https://github.com/UBC-MDS/data-analysis-review-2021/issues/13#issuecomment-986137568), we have added a narrative, along with other changes, to our final report: https://github.com/UBC-MDS/abalone_age_classification/commit/072cbb564bb1a00394992ca7bc5f938af11492e7

lynnwbl commented 2 years ago

Thank you for the comments! I really appreciate your feedback.

Regarding comment number 2 in this review (https://github.com/UBC-MDS/data-analysis-review-2021/issues/13#issuecomment-986139483), I have added a description of the Infant class in this commit: https://github.com/UBC-MDS/abalone_age_classification/commit/52672be2cb2d6445458580d2e763996e5a833e58

kphaterp commented 2 years ago

Thank you so much for the feedback everyone!

Regarding comment 1) in this review, I have addressed this concern about the age of the data in this commit: UBC-MDS/abalone_age_classification@8fc5463

Regarding comment 5 in this review, I have addressed this concern about the hyperlinks in the contributing file in this commit: UBC-MDS/abalone_age_classification@f0dc10ed

Thank you for pointing these concerns out to us!

veerupandey commented 2 years ago

Thank you very much for your feedback!

Regarding comment 1) in this review, nohup is a widely used Unix command to run a job in the background and log the output in a file. We have made reference to the runner.log in the project README file.
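For readers less familiar with nohup, the idea can be sketched in Python as follows. This is only a rough analogue under stated assumptions, not the project's actual runner: `run_in_background` is a hypothetical helper, and the command below is a placeholder standing in for the real script.

```python
import os
import subprocess
import sys
import tempfile


def run_in_background(cmd, log_path):
    """Start cmd in its own session and append its stdout/stderr to log_path."""
    with open(log_path, "ab") as log_file:
        return subprocess.Popen(
            cmd,
            stdout=log_file,            # console output goes to the log file
            stderr=subprocess.STDOUT,   # errors go to the same file
            stdin=subprocess.DEVNULL,   # no terminal input is expected
            start_new_session=True,     # detach from the terminal session (POSIX only)
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        log_path = os.path.join(tmp_dir, "runner.log")
        # Placeholder command; the real project would launch its runner script here
        proc = run_in_background(
            [sys.executable, "-c", "print('pipeline finished')"], log_path
        )
        proc.wait()  # only needed for this demo, so the log can be read back
        with open(log_path) as f:
            print(f.read())
```

The point of the sketch is that nothing is printed to the console while the job runs; progress and results are read from the log file afterwards.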

Regarding comment 2) in this review, the data folder was added inside src to organize all the scripts dealing with raw data in a single directory. This convention has been followed by many real-world projects. The famous cookiecutter project also creates the data directory inside the src folder for data-related scripts.

Regarding the comment made in this review, images are generated dynamically when the script runs and are added to the project report. We realized that the image was not being generated properly on the Windows operating system. Thanks for reporting this error; we have updated the usage instructions for Windows with this commit: UBC-MDS/abalone_age_classification@83b747a.

Thanks again for your constructive feedback!

UBC-MDS / data-analysis-review-2021