Open AlysenTownsley opened 11 months ago
1.5 hours
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
As shown in the image below, the src directory contains only one file. As a result, any test case that calls helpers from src/ would most likely fail at the import stage and never run.
The test files could follow a more consistent naming standard, such as test-word1_word2_word3.py
Consider referencing environment.yaml in the Dockerfile to improve reusability.
For example:
FROM quay.io/jupyter/minimal-notebook:2023-11-22
WORKDIR /home/jovyan
COPY environment.yaml .
RUN conda env update --file environment.yaml
Despite some minor naming and importing issues, this project stands out for comprehensive documentation, code quality, reproducibility, and thorough analysis reporting.
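One way to act on the naming suggestion above is a small check that flags test files which do not follow a single pattern. The sketch below assumes the tests are run with pytest, which the review does not state; pytest's default discovery patterns are `test_*.py` and `*_test.py`. The file names in the usage note are hypothetical.

```python
from fnmatch import fnmatch

# pytest's default file-name patterns for test discovery.
# Assumption: the project uses pytest with its default configuration.
PYTEST_DEFAULT_PATTERNS = ("test_*.py", "*_test.py")


def is_discoverable(filename: str) -> bool:
    """Return True if `filename` matches one of pytest's default patterns."""
    return any(fnmatch(filename, pattern) for pattern in PYTEST_DEFAULT_PATTERNS)


def nonconforming(filenames: list[str]) -> list[str]:
    """Return the file names that would not be collected by default."""
    return [name for name in filenames if not is_discoverable(name)]
```

Under these assumptions, a hypothetical name such as `test_data_split.py` passes the check, while a hyphenated name such as `test-data_split.py` does not, so an underscore after `test` may be the safer convention.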
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
The report is very thorough and explains the analysis in great detail. However, it features many technical details that may be too complex to add to the understanding of the average reader. For instance, the section at the end of the introduction outlining which software packages were used in the analysis may not be meaningful to many readers who aren't technically trained in data science. It may be worth considering removing that section to keep the report concise and understandable to all readers.
To improve reproducibility, it may be worth adding edge-case checks to the function modules, for example verifying that the functions are passed inputs of the correct data type and that those inputs are non-null. One useful check would be to raise an error if the dataframe read in by the data_split.py script does not have the expected number of columns.
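A minimal sketch of the column-count check suggested above. The function name `check_column_count` and the `expected` parameter are hypothetical and not taken from the repository; the function accepts any iterable of column names, so it could be passed `df.columns` from a pandas DataFrame.

```python
from collections.abc import Iterable


def check_column_count(columns: Iterable[str], expected: int) -> None:
    """Raise if the number of columns differs from what the caller expects.

    `columns` can be any iterable of column names, such as `df.columns`.
    """
    if columns is None:
        raise TypeError("columns must not be None")
    names = list(columns)
    if len(names) != expected:
        raise ValueError(
            f"expected {expected} columns, got {len(names)}: {names}"
        )


# Hypothetical usage inside a script such as data_split.py:
#   df = pd.read_csv(input_path)
#   check_column_count(df.columns, expected=EXPECTED_COLUMNS)
```

Failing early with a descriptive message makes a malformed input file easier to diagnose than an error raised later in the pipeline.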
The repository is structured in a largely clear and easy-to-navigate manner. However, I noticed that all but one of the scripts are stored in the scripts directory. It may be a good idea to keep all the scripts in the scripts folder, as this would reduce the chance of readers missing the read_view.py script (as I almost did).
1 hour
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Traceback (most recent call last):
  File "/home/jovyan/work/scripts/test_set_deployment.py", line 72, in <module>
    test_set_deployment()
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/scripts/test_set_deployment.py", line 58, in test_set_deployment
    X_test = pd.read_csv((x_test_folder + 'x_test.csv'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
             ^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '../results/tables/x_test.csv'
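The path in the error above is relative ('../results/tables/x_test.csv'), so it is resolved against the current working directory rather than the script's location. One possible cause, which the traceback alone cannot confirm, is that the script was run from a directory other than the one the authors intended; another is that an earlier pipeline step never produced the file. A minimal sketch of resolving the path against the script file instead, where the helper name `resolve_from_script` is hypothetical:

```python
from pathlib import Path


def resolve_from_script(script_file: str, relative: str) -> Path:
    """Resolve `relative` against the directory containing `script_file`,
    not against the current working directory."""
    return (Path(script_file).resolve().parent / relative).resolve()


# Hypothetical usage inside scripts/test_set_deployment.py:
#   x_test_path = resolve_from_script(__file__, "../results/tables/x_test.csv")
#   if not x_test_path.is_file():
#       raise FileNotFoundError(f"missing input, run earlier steps first: {x_test_path}")
```

With this approach the script would locate the same file regardless of where it is invoked from, and the explicit existence check gives a clearer message when the input has not been generated yet.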
Submitting authors: @alexzhang0825 @sungg888 @AlysenTownsley @nicolebid
Repository: https://github.com/UBC-MDS/Red-Wine-Quality-Prediction
Report link: https://ubc-mds.github.io/Red-Wine-Quality-Prediction/
Abstract/executive summary: In this project our group seeks to use machine learning algorithms to predict wine quality (on a scale of 0 to 10) from the physicochemical properties of the liquid. We use a train-test split and cross-validation to simulate the model encountering unseen data. We tune the hyperparameters of several classification models (logistic regression, decision tree, kNN, and SVM with an RBF kernel) to see which one has the highest accuracy, and then deploy the winner on the test set. The final test set accuracy is around 62 percent. Depending on the standard, this can be decent or poor. More importantly, the model failed to correctly identify quite a few wines of extreme quality (below 5 or above 6), suggesting that it is not very robust to outliers. We include a final discussion section on some potential causes of this performance, as well as proposed solutions for future analysis.
Editor: @ttimbers
Reviewer: Shizhe Zhang, Tony Shum, Jake Barnabe, Yiwei Zhang