UBC-MDS / data-analysis-review-2023


Submission: GROUP 22: Red Wine White Wine Type Classifier #15

Open lichunubc opened 10 months ago

lichunubc commented 10 months ago

Submitting authors: <jinyz8888> <jcairn02> <chrisgqy> <lichunubc>

Repository: https://github.com/UBC-MDS/2023-DSCI522-Group22

Report link: https://github.com/UBC-MDS/2023-DSCI522-Group22/blob/main/report/wine_color_classification_report.ipynb

Disclaimer: Please note that the final report for our group project is currently presented as a Jupyter Notebook file and not as a published Jupyter Book HTML file. A locally rendered HTML version can be found under /report/_build/html. We are actively working to publish the rendered HTML version on the appropriate platform and consider this task a top priority for our group.

We sincerely apologize for any inconvenience this may cause and appreciate your understanding and patience as we finalize the publication process.

Abstract/executive summary: Our analysis aimed to develop a predictive model to distinguish between red and white wines based on various physicochemical properties. This study employed logistic regression, a model renowned for its balance between predictive power and interpretability. The regression result suggested that residual sugar and total sulfur dioxide had high positive coefficients, indicating a strong association with white wine, whereas density showed the most substantial negative impact, followed by alcohol and volatile acidity, suggesting these are key indicators of red wine.

Editor: @lichunubc

Reviewers: @farrandi, @MoNorouzi23, Arturo Rey Hagga, Paolo De Lagrave-Codina

farrandi commented 10 months ago

Data analysis review checklist

Reviewer: @farrandi

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

The link to the report does not point to a PDF or HTML file. You provided a link to the ipynb file, which can be viewed, but I suggest creating a GitHub Pages site and linking the HTML file as the final report.

You have the LICENSE.md file, but it does not include the Creative Commons license. (My group's file is also missing this, and it was not made clear how to add it, but I think it just means adding some more text to your LICENSE.md file.)

I tried cloning your repo and following the steps in your readme to run the analysis but was unable to run docker compose up because there was no compose.yml file in the root directory. Maybe you misplaced it?
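A quick check like the one below could be run from the repository root before `docker compose up`. It is only a sketch: the function name `find_compose_file` is made up for illustration, and the list of file names is the set that Docker Compose looks for by default.

```python
from pathlib import Path
from typing import Optional

# File names Docker Compose looks for by default, most preferred first.
COMPOSE_FILE_NAMES = (
    "compose.yaml",
    "compose.yml",
    "docker-compose.yaml",
    "docker-compose.yml",
)


def find_compose_file(repo_root: str) -> Optional[Path]:
    """Return the first default Compose file found in repo_root, else None."""
    root = Path(repo_root)
    for name in COMPOSE_FILE_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_compose_file(".")
    if found is None:
        print("No Compose file here; `docker compose up` will fail in this directory.")
    else:
        print(f"Found {found.name}")
```

If the file lives in a subfolder instead, the README could either point readers to that folder or the file could be moved to the root.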

The figures shown in the report are nice and clear. I like the EDAs that you did along with the colour scheme used to match red and white wine. However, a thing to note is that for the histograms it might be better to unstack the histograms as we are comparing values between the two groups and not the sum of the values. Another thing to note is that the figure to show the coefficients of the model would be clearer if you sorted the coefficients in either ascending or descending order. A final thing on the figures is that the figure labels do not seem to be rendering correctly (both in the ipynb preview and the local HTML file). Maybe you could look into that.
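To make the two plotting suggestions above concrete, here is a rough sketch in plain Python. The coefficient magnitudes and the alcohol values are made up for illustration (only the signs follow the abstract), and the helper names `sort_coefficients` and `grouped_bin_counts` are hypothetical, not taken from the repository.

```python
from bisect import bisect_right

# Hypothetical coefficients: signs follow the abstract, magnitudes are invented.
coefficients = {
    "alcohol": -1.0,
    "residual sugar": 2.0,
    "density": -3.0,
    "total sulfur dioxide": 1.5,
    "volatile acidity": -0.5,
}


def sort_coefficients(coefs, descending=True):
    """Return (feature, coefficient) pairs ordered by coefficient value."""
    return sorted(coefs.items(), key=lambda item: item[1], reverse=descending)


def grouped_bin_counts(values_by_group, bin_edges):
    """Count values per bin separately for each group, on shared bin edges.

    Bins are half-open, [left, right); values outside the edges are ignored.
    Keeping one list of counts per group is what "unstacked" means here: the
    groups are compared side by side instead of being summed.
    """
    n_bins = len(bin_edges) - 1
    counts = {}
    for group, values in values_by_group.items():
        group_counts = [0] * n_bins
        for value in values:
            index = bisect_right(bin_edges, value) - 1
            if 0 <= index < n_bins:
                group_counts[index] += 1
        counts[group] = group_counts
    return counts


# Invented alcohol values, only to show the shape of the output.
alcohol = {
    "red": [9.4, 9.8, 10.5, 11.2],
    "white": [8.8, 9.5, 10.1, 12.0],
}

if __name__ == "__main__":
    for feature, value in sort_coefficients(coefficients):
        print(f"{feature:>22}: {value:+.1f}")
    print(grouped_bin_counts(alcohol, [8, 9, 10, 11, 12, 13]))
```

Plotting the sorted pairs in that order gives a bar chart that reads from most white-associated to most red-associated. If the histograms are built with Altair, I believe setting `stack=None` on the y encoding (together with some opacity) overlays the two groups instead of stacking them, but I have not checked how the report's figures are generated.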

Nit:

Overall, I learned a great deal from your analysis and your test scores are really good (compared to mine and a lot of other groups). I think you did a great job determining which features were important in determining red or white wine.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

MoNorouzi23 commented 10 months ago

Data analysis review checklist

Reviewer: @MoNorouzi23

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2 hours

Review Comments:


Your project showcases a commendable example of well-organized classification, offering valuable insights. I'd like to highlight the positive aspects while suggesting areas for improvement.

Positive points:

Areas for improvement:


paolocodina commented 10 months ago

Data analysis review checklist

Reviewer: @paolocodina

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Good: The question and report are easy to follow and the analysis is clear and concise.

The goal of the analysis and method used is easy to understand and interpret.

I like how the document has a section for adding new dependencies in case individuals trying to run the project need it.

Room for improvement: The html file on my computer was not showing some of the plots.

I found the explanation for the tests to be vague in both the readme and the actual test scripts.

The container did not work when I tried to build it.
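On the point about vague test explanations, a docstring that states what is checked and why usually goes a long way. The sketch below shows the style I have in mind; `encode_wine_color` is a hypothetical helper invented for this example and is not a function from the repository.

```python
def encode_wine_color(label):
    """Hypothetical helper: map a wine color label to 0 (red) or 1 (white)."""
    mapping = {"red": 0, "white": 1}
    key = label.strip().lower()
    if key not in mapping:
        raise ValueError(f"Unknown wine color: {label!r}")
    return mapping[key]


def test_encode_wine_color_known_labels():
    """Known labels map to fixed integers regardless of case or padding.

    Why it matters: the target column needs one consistent encoding,
    otherwise the signs of the model coefficients cannot be interpreted.
    """
    assert encode_wine_color("red") == 0
    assert encode_wine_color(" White ") == 1


def test_encode_wine_color_rejects_unknown_label():
    """An unexpected label raises ValueError instead of being silently encoded."""
    try:
        encode_wine_color("pink")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unknown label")
```

A one-line summary of each test file in the README, in the same spirit, would cover the other half of the comment.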


AReyH commented 10 months ago

Data analysis review checklist

Reviewer: AReyH

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

  1. The introduction and problem statement were very well written and left no open questions.
  2. The EDA was extensive and thorough. The color palette the group chose was visually pleasing.
  3. The file names were inconsistent. In the src folder, some files are prefixed with helper_, test_ or tests_, while others follow a different naming convention.
  4. I could not build the container, since the repo is lacking the docker-compose.yml file.
  5. The report is missing but the team has noted that in their peer review.
