Opened by juliaeveritt, 7 months ago
Strength:

Suggestions:

It would be nice to include a snippet such as

```python
from ucimlrepo import fetch_ucirepo

data = fetch_ucirepo(id=109)
```

so that we are able to read the raw data without running the project's code.

This was derived from the JOSE review checklist and the ROpenSci review checklist.
Overall it is an awesome project. The model built by the team performs excellently, with an accuracy of 0.96 and an F1 score of 0.96. Below are some things I think would be nice to include:
The Dependencies section of the README clearly lists the dependencies required to run the project in a local environment using environment.yml. Since there is another option for running the project, a Docker container, it would be better to also cover the dependencies from the Dockerfile in this section.
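One way to keep the two setups consistent is to have the Dockerfile build from the same environment.yml. A sketch only; the base image, paths, and the environment name `wine-origin-prediction` are assumptions, not the project's actual Dockerfile:

```dockerfile
# Sketch: build the conda environment from the same environment.yml
# used for the local setup, so both setups share one dependency list.
FROM condaforge/miniforge3:latest

COPY environment.yml /tmp/environment.yml
RUN conda env create -f /tmp/environment.yml && conda clean -afy

# Run subsequent commands inside the created environment
# (the environment name here is assumed).
SHELL ["conda", "run", "-n", "wine-origin-prediction", "/bin/bash", "-c"]
```

With this layout, the README can point to environment.yml as the single source of dependencies for both the local and the container setup.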
The main report clearly includes all references related to the project, and the figures are properly generated in the report. It would be nice if the report also included a caption and a figure number for each figure, referenced within the analysis, which would make it easier for readers to follow. Also, hiding the code cells in the main report may reduce distraction from the main analysis the authors want to deliver.
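If the report is rendered with Quarto (an assumption on my part), both points can be handled with built-in options, e.g. in the document's YAML header:

```yaml
# Hide all code cells in the rendered report
execute:
  echo: false
```

and, per figure, cell options such as `#| label: fig-feature-dist` and `#| fig-cap: "Distribution of features by class"` give the figure a number and caption that can be cross-referenced in the text with `@fig-feature-dist`.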
It looks like the project includes hyperparameter tuning, since there is a line chart of accuracy vs. hyperparameter C for the training and validation sets. More comments on which value of C was chosen for the best model would help the audience better understand the model selected for this project.
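Reporting the chosen C could be as simple as printing the best parameters from a grid search. A minimal sketch, assuming a scikit-learn workflow with logistic regression; I'm using sklearn's built-in wine dataset and an example grid of C values, not the project's actual pipeline:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Scale features, then tune the regularization strength C via 5-fold CV
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)

# State the chosen hyperparameter explicitly in the report
print("Best C:", grid.best_params_["logisticregression__C"])
print("Cross-validation accuracy:", round(grid.best_score_, 3))
```

A sentence in the report summarizing these two numbers would make the model selection transparent to readers who skip the tuning plot.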
I wasn't able to reproduce the report in the container. I successfully started the container using `docker compose up` from setup option 1. However, when I tried to run the scripts listed in the data analysis section and the tests using pytest, I encountered a module-not-found error. I wonder whether the Dockerfile copies the dependencies from the environment.yml file properly, or whether this was a mistake on my part. I may need some help from the project team on how to run these scripts properly.
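To narrow down whether the error comes from the image or from how the scripts are invoked, a quick diagnostic run inside the container could check which packages the active interpreter can see. The package names below are examples, not the project's full dependency list:

```python
import importlib.util

# Report which of the expected packages the current interpreter can
# import; a MISSING entry suggests the environment was not activated
# or the dependency was not installed into the image.
for name in ["pandas", "sklearn", "pytest"]:
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If everything shows as MISSING, the likely cause is that the scripts are being run with a different Python than the one the environment.yml dependencies were installed into.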
Overall it is a great project!
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Generally, it is a great idea to use machine learning to provide insights for problems that require a high level of expertise. The software setup process was smooth and error-free, and I really enjoyed reading the report. Below are the specific things I would like to mention:
I think the illustrations are quite intuitive in revealing the distributions of the features. Formatting all features on one page facilitates comparison across different features. From my perspective, it would be better if the figures were on a larger scale so that the x- and y-axes are clearer, given that they vary from feature to feature.
The code chunks give me a general idea of the packages that are necessary for this report, but I think hiding them would better highlight the main ideas without interrupting the flow of the report.
It seems to me that the report does not mention data processing explicitly, so I assume the data is quite ready upon download. It would be better if there were more information about data quality, for example whether there are any missing values, along with summary statistics of the features such as the mean and standard deviation.
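Such a data-quality summary is a few lines with pandas. A sketch using sklearn's built-in copy of the wine data, since I don't have the project's exact loading code:

```python
import pandas as pd
from sklearn.datasets import load_wine

# Load the features as a DataFrame (dropping the class label column)
wine = load_wine(as_frame=True)
df = wine.frame.drop(columns="target")

# Missing values per feature
print("Missing values per feature:")
print(df.isna().sum())

# Mean and standard deviation of each feature
print("Feature summary:")
print(df.describe().loc[["mean", "std"]])
```

A small table like this in the report would let readers confirm at a glance that the data needed no imputation before modeling.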
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: @sean-m-mckay @hbandukw @yimengxia @juliaeveritt
Repository: https://github.com/UBC-MDS/wine-origin-prediction/tree/main
Report link: https://ubc-mds.github.io/wine-origin-prediction/wine_classification_report.html
Abstract/executive summary: https://github.com/UBC-MDS/wine-origin-prediction/blob/main/README.md
Editor: @ttimbers
Reviewer: