jokittipong opened 7 months ago
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Pretty good report. It would be better to:
1) Remove the numbers at the beginning of the report.
2) Optimize the structure of the repository; for example, remove `.cache/matplotlib` if possible.
3) Optimize the model in the future.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
- Running `pytest` in the Docker Compose container from the root directory of the folder, as directed by the README, collected no tests. This may be due to the test files being named `test-*` instead of `test_*`. Further, I think `test_and_deploy.py` is being picked up by `pytest` because its name matches the test-file pattern, but it isn't a test script!
(base) jovyan@3c0faa4c6621:~/work$ pytest
============================= test session starts =============================
platform linux -- Python 3.11.6, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/jovyan/work
plugins: anyio-4.0.0
collected 0 items
============================== warnings summary ===============================
../../../opt/conda/lib/python3.11/site-packages/click/core.py:1155
/opt/conda/lib/python3.11/site-packages/click/core.py:1155: PytestCollectionWarning: cannot collect 'test_and_deploy' because it is not a function.
def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================= 1 warning in 0.77s ==============================
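pytest only collects modules whose file names match its `python_files` setting, which defaults to `test_*.py` and `*_test.py`. The sketch below approximates that matching with the standard library's `fnmatch` on bare file names (pytest's own matcher also handles paths, so this is an illustration, not an exact replica). It shows why a hyphenated name is skipped while `test_and_deploy.py` is picked up. The names `test-eda.py` and `test_eda.py` are hypothetical examples, not files from the repository.

```python
from fnmatch import fnmatch

# pytest's default `python_files` patterns (configurable in pytest.ini / pyproject.toml)
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")


def would_collect(filename: str, patterns=DEFAULT_PYTHON_FILES) -> bool:
    """Approximate whether pytest would treat `filename` as a test module.

    Matches only the bare file name against each glob pattern.
    """
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Hypothetical file names for illustration
for name in ["test-eda.py", "test_eda.py", "test_and_deploy.py"]:
    print(f"{name:20} collected: {would_collect(name)}")
```

Renaming the test files to `test_*.py`, and giving `test_and_deploy.py` a name outside these patterns (or pointing pytest at the test folder with the `testpaths` option), would address both symptoms.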
- I unfortunately wasn't able to run `eda.py` in Docker Compose; it threw the following error:
/opt/conda/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
if pd.api.types.is_categorical_dtype(vector):
Traceback (most recent call last):
File "/home/jovyan/work/script/eda.py", line 91, in
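The `FutureWarning` above is raised inside seaborn (`seaborn/_oldcore.py`), not by `eda.py`, so it is probably noise rather than the cause of the traceback. If project code ever needs the same check, the replacement suggested by the warning itself looks like the sketch below; the helper name `is_categorical` is hypothetical. The optional filter only hides seaborn's warning so the real error is easier to spot; it does not fix anything.

```python
import warnings

import pandas as pd


def is_categorical(series: pd.Series) -> bool:
    """Non-deprecated replacement for pd.api.types.is_categorical_dtype."""
    return isinstance(series.dtype, pd.CategoricalDtype)


# Optional: silence FutureWarnings issued from seaborn modules only.
warnings.filterwarnings("ignore", category=FutureWarning, module="seaborn")

print(is_categorical(pd.Series(["red", "white"], dtype="category")))  # True
print(is_categorical(pd.Series([7.0, 6.5])))  # False
```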
- When running `fit_polynomial_regression.py`, I noticed a large number of errors thrown by Ridge about matrix singularity. This may indicate strong multicollinearity or identical columns; removing duplicate features, or using lasso regression to select features, may help with this!
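A minimal sketch of both suggestions, assuming the features sit in a pandas DataFrame: drop exact-duplicate columns, flag highly correlated pairs, then swap Ridge for Lasso inside a polynomial pipeline so the L1 penalty can set redundant coefficients to exactly zero. The helper names, the 0.95 threshold, `degree=2`, and `alpha=0.1` are illustrative assumptions, not values tuned on the wine data, and the synthetic columns are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def drop_duplicate_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first of any columns with identical values."""
    return X.loc[:, ~X.T.duplicated()]


def high_correlation_pairs(X: pd.DataFrame, threshold: float = 0.95):
    """Return (col_a, col_b, |r|) for column pairs whose |Pearson r| exceeds threshold."""
    corr = X.corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], corr.iloc[i, j]))
    return pairs


def lasso_polynomial_pipeline(degree: int = 2, alpha: float = 0.1):
    """Polynomial features -> scaling -> Lasso (L1 penalty zeroes out some terms)."""
    return make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        Lasso(alpha=alpha, max_iter=10_000),
    )


# Synthetic data: "b_copy" duplicates "b", and "c" is almost identical to "a".
rng = np.random.default_rng(0)
n = 200
a, b = rng.normal(size=n), rng.normal(size=n)
X = pd.DataFrame({"a": a, "b": b, "b_copy": b, "c": a + 0.01 * rng.normal(size=n)})
y = 3 * a - 2 * b + 0.1 * rng.normal(size=n)

X_clean = drop_duplicate_columns(X)
print(list(X_clean.columns))  # ['a', 'b', 'c']
print(high_correlation_pairs(X_clean))

model = lasso_polynomial_pipeline().fit(X_clean, y)
coef = model[-1].coef_
print(f"{np.count_nonzero(coef)} of {coef.size} polynomial terms kept")
```

Lasso is used here in place of the report's Ridge model purely to illustrate feature selection; `LassoCV` or the existing randomized search could tune `alpha` instead of fixing it.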
Strengths
The report format is a little strange: the first section under the title displays 5 boxes with numbers. I'm not sure what their purpose is, but they do not look good, and I believe they were left in by accident. I would also include a little more information in the methods section explaining what polynomial regression is, perhaps with a formula or mathematical equation showing what the model is made of. Right now the report explains that polynomial regression is better than linear regression because it can pick up on more complex relationships; this is true, but it does not really tell a reader what a polynomial regression model is or the nuance behind it. I would also tie the results back to the overall report question and add a formal conclusion. The results section only references the model scores; there is no interpretation of the final model, nor of what the results mean for the problem at hand.
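As an example of the kind of formula the methods section could include, here is a degree-2 polynomial model with a ridge penalty. The degree is an assumption for illustration, since the tuned degree isn't stated in this review. Here $x_1, \dots, x_p$ are the physicochemical features, $\hat{y}$ is the predicted quality rating, $n$ is the number of training observations, and $\alpha$ is the regularization strength.

$$
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \sum_{j=1}^{p} \sum_{k=j}^{p} \beta_{jk}\, x_j x_k
$$

$$
\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \alpha \left( \sum_{j=1}^{p} \beta_j^2 + \sum_{j=1}^{p} \sum_{k=j}^{p} \beta_{jk}^2 \right)
$$

The intercept $\beta_0$ is left unpenalized, as is conventional (scikit-learn's `Ridge` also does not penalize the intercept). Larger $\alpha$ shrinks the coefficients toward zero, which is what keeps the many squared and interaction terms from overfitting.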
The report does not include much information on why the question being explored is important, or on the motivation for exploring and predicting ratings of Portuguese wine. You guys did a great job giving background on which aspects of wine contribute to ratings and what makes a good wine; however, the report as a whole lacks an overarching reason or drive. Maybe attach it to a business angle, exploring how these ratings could help sell wine, or how wine ratings could be maximized to maximize profit.
I was able to run the Docker file and start running your scripts to reproduce your analysis; however, every script I ran after ingesting the raw data errored out. I would also recommend fixing up the README file, specifically the instructions for running the scripts, because they are written as regular text rather than as inline code, which would be easier to follow. This is the command that led to the errors (522 Peer Review: Script Commands Failing):
(base) jovyan@73c381c0b26d:~/work$ python script/eda.py data/Processed/white_train.csv
It produced the head of the data set as well as the correlation matrix; however, afterwards I was left with errors: `[12 rows x 12 columns]
/opt/conda/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
if pd.api.types.is_categorical_dtype(vector):
Traceback (most recent call last):
File "/home/jovyan/work/script/eda.py", line 91, in
The exit codes of the workers are {SIGKILL(-9)}
(base) jovyan@73c381c0b26d:~/work$ `
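Please look into this, because the script is both erroring out and timing out. A `SIGKILL(-9)` exit code for the worker processes often means the operating system killed them, for example for using too much memory, so the container's memory limit may be worth checking.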
Submitting authors: <@jokittipong> <@sho-i98> <@Nicole-Tu97>
Repository: https://github.com/UBC-MDS/portugal_white_wine_quality_predictor_py
Report link: https://rawcdn.githack.com/UBC-MDS/portugal_white_wine_quality_predictor_py/8f098a7da456a3dcbe0863817da5203760776339/report/_build/_page/portugal_white_wine_quality_predictor_report/html/portugal_white_wine_quality_predictor_report.html
Abstract/executive summary: We built a regression model using polynomial regression with Ridge regularization, with hyperparameters tuned by randomized search, to predict the quality rating (on a scale of 0-10) of Portuguese white wine from its physicochemical properties. The model was trained on the Portuguese white wine data set, which has 4898 observations. In conclusion, the model does not perform well on either the training data or an unseen test data set: the test score is around 0.32, the average train score is 0.36, and the average test score is 0.33, and both the mean squared error (MSE) and root MSE are high.
We suspect the model cannot predict well because wine quality is judged subjectively and varies with each individual's taste preferences. Moreover, there is no standard for taste: for example, high or low acidity, alcohol, or sulfur levels do not on their own indicate whether a wine is of good quality (it can go either way!). As such, we believe this model is a starting point for further study: with more data, it could be used to analyze which combinations of physicochemical properties indicate wine quality. More research is needed to improve the model's performance, and the characteristics of incorrectly predicted samples need further investigation.
The data set used in this project contains white vinho verde wine samples from the north of Portugal and was created by P. Cortez, A. Cerdeira, Fernando Almeida, Telmo Matos, and J. Reis (2009). It was downloaded from the UC Irvine Machine Learning Repository (https://archive.ics.uci.edu/dataset/186/wine+quality). The data set records the physicochemical properties and quality rating of each wine, which we used to build the quality prediction model.
Editor: @jokittipong
Reviewers: <@alanpow> <@joeywwwu> <@srfrew> <@jinyz8888>