Submission: <GROUP 16: portugal_white_wine_quality_predictor>

Submitting authors: <@jokittipong> <@sho-i98> <@Nicole-Tu97>

Repository: https://github.com/UBC-MDS/portugal_white_wine_quality_predictor_py

Report link: https://rawcdn.githack.com/UBC-MDS/portugal_white_wine_quality_predictor_py/8f098a7da456a3dcbe0863817da5203760776339/report/_build/_page/portugal_white_wine_quality_predictor_report/html/portugal_white_wine_quality_predictor_report.html

Abstract/executive summary: We built a predictive model using Polynomial Regression with Ridge Regularization, with hyperparameters tuned by Randomized Search, to predict the quality rating (on a 0-10 scale) of Portuguese white wine from its physicochemical properties. The model was trained on the Portugal white wine data set, which has 4898 observations. In conclusion, the model does not perform well on either the training data or the unseen test data: the test score is around 0.32, the average train score is 0.36, and the average test score is 0.33, and both the MSE (Mean Squared Error) and the root MSE are high.

We suspect the model predicts poorly because wine quality is judged subjectively and varies with each taster's preferences. Moreover, there is no standard for taste: for example, a high or low acidity, alcohol, or sulfur level does not by itself indicate whether a wine is of good quality (it can go both ways!). As such, we believe this model is at, or close to, a starting point for further study, and more data could be collected to analyze which combinations of physicochemical properties indicate wine quality. More research is needed to improve the model's performance, and the characteristics of the incorrectly predicted patterns need further investigation.

The data set used in this project relates to white vinho verde wine samples from the north of Portugal and was created by P. Cortez, A. Cerdeira, Fernando Almeida, Telmo Matos, and J. Reis (2009). It was sourced from the UC Irvine Machine Learning Repository (https://archive.ics.uci.edu/dataset/186/wine+quality). The data set records the physicochemical properties of the wines together with their quality ratings, which we use to build the quality prediction model.

Editor: @jokittipong

Reviewer: <@alanpow> <@joeywwwu> <@srfrew> <@jinyz8888>

[x] I agree to abide by MDS's Code of Conduct during the review process and in maintaining my package should it be accepted.

Data analysis review checklist

Reviewer: jinyz8888

Conflict of interest

[X] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[X] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[X] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[X] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[X] Installation instructions: Is there a clearly stated list of dependencies?
[X] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[X] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[X] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[X] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[X] Style guidelines: Does the code adhere to well known language style guides?
[X] Modularity: Is the code suitably abstracted into scripts and functions?
[X] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[X] Data: Is the raw data archived somewhere? Is it accessible?
[X] Computational methods: Is all the source code required for the data analysis available?
[X] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[X] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[X] Authors: Does the report include a list of authors with their affiliations?
[X] What is the question: Do the authors clearly state the research question being asked?
[X] Importance: Do the authors clearly state the importance for this research question?
[X] Background: Do the authors provide sufficient background information so that readers can understand the report?
[X] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[X] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[X] Conclusions: Are the conclusions presented by the authors correct?
[X] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[X] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

Pretty good report. Suggested improvements: 1) Remove the numbers at the beginning of the report. 2) Tidy the structure of the repository, for example by removing .cache/matplotlib if possible. 3) Optimize the model in the future.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Data analysis review checklist

Reviewer: @srfrew

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[X] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[ ] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[ ] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[ ] Importance: Do the authors clearly state the importance for this research question?
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1

Review Comments:

I noticed that the LICENSE file had incorrect names attributing copyright on lines 20 and 37
Overall, I found the report to be very thorough and descriptive, identifying rationale and clearly displaying EDA results, model findings, and results. I liked how you had a comprehensive EDA and visualized your model as well! This seemed like a challenging problem with tons of features - great work!!
The results section could have had more detail on next steps or improvements given your conclusions. I also found a couple of typos in the writing. I didn't find a clear section on what motivates this report: was it seeking to compare objectivity between reviewers?

I wasn't able to run the tests using pytest in the Docker compose by running pytest in the root directory of the folder as directed by the README. This may be due to files being named as test-* instead of test_*. Further, I think test_and_deploy.py is being called by pytest due to the name matching, but this isn't a test script!


(base) jovyan@3c0faa4c6621:~/work$ pytest
============================= test session starts =============================
platform linux -- Python 3.11.6, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/jovyan/work
plugins: anyio-4.0.0
collected 0 items

============================== warnings summary ===============================
../../../opt/conda/lib/python3.11/site-packages/click/core.py:1155
  /opt/conda/lib/python3.11/site-packages/click/core.py:1155: PytestCollectionWarning: cannot collect 'test_and_deploy' because it is not a function.
    def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================= 1 warning in 0.77s ==============================
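The naming issue can be checked without running pytest. The following is a minimal sketch, assuming pytest's default `python_files` patterns (`test_*.py` and `*_test.py`) are in effect and have not been overridden in the project's configuration. The helper name and the first two file names are hypothetical and used only for illustration.

```python
from fnmatch import fnmatchcase

# pytest's default patterns for test modules (the `python_files` setting).
DEFAULT_PATTERNS = ("test_*.py", "*_test.py")


def is_collected_by_default(filename: str) -> bool:
    """Return True if the default file patterns would pick up this file name."""
    return any(fnmatchcase(filename, pattern) for pattern in DEFAULT_PATTERNS)


# Hypothetical file names, except test_and_deploy.py which is named in the review.
for name in ("test-eda.py", "test_eda.py", "test_and_deploy.py"):
    print(f"{name}: collected={is_collected_by_default(name)}")
```

Under these assumptions, a file with a hyphen after `test` is skipped, while `test_and_deploy.py` matches the pattern, so renaming it (or the test files) would likely resolve both symptoms.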

- Unfortunately, I wasn't able to run `eda.py` in Docker Compose; it threw the following error:

/opt/conda/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
Traceback (most recent call last):
  File "/home/jovyan/work/script/eda.py", line 91, in <module>
    eda_script()
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/script/eda.py", line 79, in eda_script
    sns.histplot(white_train[column], kde=True, color='pink')
  File "/opt/conda/lib/python3.11/site-packages/seaborn/distributions.py", line 1438, in histplot
    p.plot_univariate_histogram(
  File "/opt/conda/lib/python3.11/site-packages/seaborn/distributions.py", line 431, in plot_univariate_histogram
    all_data = self.comp_data.dropna()
               ^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/seaborn/_oldcore.py", line 1119, in comp_data
    with pd.option_context('mode.use_inf_as_null', True):
  File "/opt/conda/lib/python3.11/site-packages/pandas/_config/config.py", line 478, in __enter__
    self.undo = [(pat, _get_option(pat)) for pat, val in self.ops]
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/_config/config.py", line 478, in <listcomp>
    self.undo = [(pat, _get_option(pat)) for pat, val in self.ops]
                       ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/_config/config.py", line 146, in _get_option
    key = _get_single_key(pat, silent)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/_config/config.py", line 132, in _get_single_key
    raise OptionError(f"No such keys(s): {repr(pat)}")
pandas._config.config.OptionError: No such keys(s): 'mode.use_inf_as_null'
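The traceback ends inside seaborn, which asks pandas for an option named `mode.use_inf_as_null` that the installed pandas does not recognize. That pattern usually points to a seaborn/pandas version mismatch in the image, although this is my assumption rather than something I verified. A minimal sketch for recording what is actually installed in the container, using only the standard library (the helper name is hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The two packages that appear in the traceback above.
    for name, ver in installed_versions(["pandas", "seaborn"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Pinning both packages to exact versions in the environment file would then make the failure reproducible (or fixed) for everyone who builds the image.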


- When running `fit_polynomial_regression.py`, I noticed a large number of warnings from ridge about matrix singularity. This may indicate significant multicollinearity or identical columns. Removing duplicate features or using lasso regression to select features may help with this!
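One quick way to look for duplicated or nearly duplicated features is to flag column pairs with a very high absolute correlation. This is only a sketch: the 0.95 threshold, the helper names, and the toy values are my own assumptions and are not taken from the wine data set or from the project's scripts.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (non-constant columns only)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Toy values for illustration only; "density_copy" is a deliberate duplicate column.
toy = {
    "density": [0.990, 0.994, 0.998, 1.001],
    "density_copy": [0.990, 0.994, 0.998, 1.001],
    "alcohol": [12.8, 9.1, 11.0, 10.2],
}
print(highly_correlated_pairs(toy))
```

Note that polynomial expansion creates many strongly related columns by construction, so scaling the features before expanding them may matter as much as dropping duplicates.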

#### Attribution

This was derived from the [JOSE review checklist](https://openjournals.readthedocs.io/en/jose/review_checklist.html) and the ROpenSci review checklist.

Data analysis review checklist

Reviewer:

Conflict of interest

[x] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[x] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[x] Style guidelines: Does the code adhere to well known language style guides?
[x] Modularity: Is the code suitably abstracted into scripts and functions?
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[x] Data: Is the raw data archived somewhere? Is it accessible?
[x] Computational methods: Is all the source code required for the data analysis available?
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

[x] Authors: Does the report include a list of authors with their affiliations?
[x] What is the question: Do the authors clearly state the research question being asked?
[x] Importance: Do the authors clearly state the importance for this research question?
[ ] Background: Do the authors provide sufficient background information so that readers can understand the report?
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[x] Conclusions: Are the conclusions presented by the authors correct?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[x] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1h

Review Comments:

Strengths

The report is well-structured, providing a clear overview, introduction, methods, discussion, results, and references, which makes it easy to follow.
The use of Polynomial Regression with Ridge Regularization and Randomized Search for Hyperparameters is well-explained.
The project discusses the limitations of their model and suggests future research directions, demonstrating a critical understanding of their work.

Areas for Improvement

The model's performance is moderate (test score around 0.32); if possible, I suggest experimenting with ensemble methods that combine predictions from multiple models to improve accuracy and robustness.
The report acknowledges the subjectivity in wine quality assessment but does not provide any suggestions or proposals on how to address this challenge in their modeling approach.
If possible, work with wine experts or sommeliers to gain insights that could influence feature selection or model interpretation. Their expertise may provide valuable context that is not apparent from the data alone.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Data analysis review checklist

Reviewer: Alanpow

Conflict of interest

[X] As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

[X] I confirm that I read and will adhere to the MDS code of conduct.

General checks

[X] Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
[X] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

[X] Installation instructions: Is there a clearly stated list of dependencies?
[X] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[X] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[X] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

[X] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
[X] Style guidelines: Does the code adhere to well known language style guides?
[X] Modularity: Is the code suitably abstracted into scripts and functions?
[ ] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

[X] Data: Is the raw data archived somewhere? Is it accessible?
[X] Computational methods: Is all the source code required for the data analysis available? YES BUT IN SCRIPTS NOT SOURCE
[X] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
[ ] Automation: Can someone other than the authors easily reproduce the entire data analysis? THE SCRIPTS TO CREATE THE ANALYSIS ERROR OUT

Analysis report

[X] Authors: Does the report include a list of authors with their affiliations?
[X] What is the question: Do the authors clearly state the research question being asked?
[ ] Importance: Do the authors clearly state the importance for this research question?
[X] Background: Do the authors provide sufficient background information so that readers can understand the report?
[X] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
[ ] Results: Do the authors clearly communicate their findings through writing, tables and figures?
[ ] Conclusions: Are the conclusions presented by the authors correct?
[X] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
[X] Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5 Hours

Review Comments:

The report format is a little strange: the first section under the title displays 5 boxes with numbers. I am not sure what their purpose is, but it does not look good and I believe it was left in by accident. I would also include a little more information in the methods section explaining what polynomial regression is, maybe with a formula or mathematical equation to show what the model is made of. Right now it is explained that it is better than linear regression because it can pick up on more complex relationships; this is true, but it does not really inform a reader what a polynomial regression model is and the nuance behind it. I would also tie the results of the report back to the overall report question and add a formal conclusion. It looks like the results section only references the model scores, with no interpretation of the final model or of the results for the problem at hand.
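As an illustration of the kind of formula meant here, a degree-2 polynomial regression with a ridge penalty could be written as follows. The degree, the symbols, and the penalty weight $\alpha$ are notation assumed for this sketch and are not taken from the report; the intercept $\beta_0$ is usually left out of the penalty.

```latex
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \sum_{j=1}^{p} \sum_{k=j}^{p} \beta_{jk}\, x_j x_k,
\qquad
\hat{\beta} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \alpha \lVert \beta \rVert_2^2
```

Here $p$ is the number of physicochemical features, $n$ the number of training wines, and larger values of $\alpha$ shrink the coefficients more strongly.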
The report does not include much information about why the question being explored is important or what the motivation is for wanting to explore or predict ratings of Portuguese wine. You did a great job giving background information on which aspects of wine contribute to ratings and what makes a good wine; however, the report as a whole still needs some reasoning or drive behind it. Maybe attach it to a business angle to explore how these ratings could help sell wine, or how wine ratings could be maximized to maximize profit.
I was able to run the docker file and start running your scripts to reproduce your analysis; however, every script I ran after ingesting the raw data errored out. I would also recommend fixing up the README file, specifically the instructions for running the scripts, because they are written as regular text rather than as inline code, which is easier to follow. These are the scripts which led to the errors:

522 Peer Review: Script Commands Failing:

(base) jovyan@73c381c0b26d:~/work$ python script/eda.py data/Processed/white_train.csv

produced the head of the data set as well as the correlation matrix; however, afterwards I was left with errors:

`[12 rows x 12 columns]
/opt/conda/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
Traceback (most recent call last):
  File "/home/jovyan/work/script/eda.py", line 91, in <module>
    eda_script()
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/script/eda.py", line 79, in eda_script
    sns.histplot(white_train[column], kde=True, color='pink')
  File "/opt/conda/lib/python3.11/site-packages/seaborn/distributions.py", line 1438, in histplot
    p.plot_univariate_histogram(
  File "/opt/conda/lib/python3.11/site-packages/seaborn/distributions.py", line 431, in plot_univariate_histogram
    all_data = self.comp_data.dropna()
               ^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/seaborn/_oldcore.py", line 1119, in comp_data
    with pd.option_context('mode.use_inf_as_null', True):
  File "/opt/conda/lib/python3.11/site-packages/pandas/_config/config.py", line 478, in __enter__
    self.undo = [(pat, _get_option(pat)) for pat, val in self.ops]
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/_config/config.py", line 478, in <listcomp>
    self.undo = [(pat, _get_option(pat)) for pat, val in self.ops]
                       ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/_config/config.py", line 146, in _get_option
    key = _get_single_key(pat, silent)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pandas/_config/config.py", line 132, in _get_single_key
    raise OptionError(f"No such keys(s): {repr(pat)}")
pandas._config.config.OptionError: No such keys(s): 'mode.use_inf_as_null'
(base) jovyan@73c381c0b26d:~/work$ python script/fit_polynomial_regression.py data/Processed/white_train.csv data/Processed/white_test.csv
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=7.94006e-17): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=7.61806e-17): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=7.86797e-17): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=7.68219e-17): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=9.65767e-17): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=5.87782e-22): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:239: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
  warnings.warn(
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:239: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
  warnings.warn(
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:239: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
  warnings.warn(
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:239: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
  warnings.warn(
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:239: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
  warnings.warn(
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=5.85444e-22): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=1.16842e-21): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
/opt/conda/lib/python3.11/site-packages/sklearn/linear_model/_ridge.py:200: LinAlgWarning: Ill-conditioned matrix (rcond=6.18235e-22): result may not be accurate.
  return linalg.solve(A, Xy, assume_a="pos", overwrite_a=True).T
Traceback (most recent call last):
  File "/home/jovyan/work/script/fit_polynomial_regression.py", line 80, in <module>
    polynomial_regression()
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/script/fit_polynomial_regression.py", line 67, in polynomial_regression
    random_search.fit(x_train_w, y_train_w)
  File "/opt/conda/lib/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 898, in fit
    self._run_search(evaluate_candidates)
  File "/opt/conda/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 1809, in _run_search
    evaluate_candidates(
  File "/opt/conda/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 845, in evaluate_candidates
    out = parallel(
          ^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/joblib/parallel.py", line 1952, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/joblib/parallel.py", line 1595, in _get_outputs
    yield from self._retrieve()
  File "/opt/conda/lib/python3.11/site-packages/joblib/parallel.py", line 1699, in _retrieve
    self._raise_error_fast()
  File "/opt/conda/lib/python3.11/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/opt/conda/lib/python3.11/site-packages/joblib/parallel.py", line 736, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/joblib/parallel.py", line 754, in _return_or_raise
    raise self._result
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGKILL(-9)}
(base) jovyan@73c381c0b26d:~/work$ `

Look into this because it is erroring out as well as timing out.
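The `TerminatedWorkerError` message itself names excessive memory usage as a possible cause, and polynomial expansion grows quickly with the degree. The sketch below gives a rough size estimate under my own assumptions: 11 input features (the 12-column frame minus the target), 4898 rows as an upper bound on the training split, 8-byte floats, and a full expansion that includes the constant term. The degrees actually searched by the script are not known to me, and the function names are hypothetical.

```python
from math import comb


def n_polynomial_terms(n_features: int, degree: int) -> int:
    """Count monomials of total degree <= `degree`, including the constant term."""
    return comb(n_features + degree, degree)


def dense_matrix_mib(n_rows: int, n_columns: int, bytes_per_value: int = 8) -> float:
    """Size in MiB of a dense float matrix with the given shape."""
    return n_rows * n_columns * bytes_per_value / 2**20


N_FEATURES = 11  # assumption: 12 columns in the frame minus the quality target
N_ROWS = 4898    # assumption: full data set size, an upper bound for the train split

for degree in range(1, 7):
    columns = n_polynomial_terms(N_FEATURES, degree)
    size = dense_matrix_mib(N_ROWS, columns)
    print(f"degree={degree}: {columns} columns, about {size:.1f} MiB per copy")
```

Each parallel worker and each cross-validation fold may hold its own copy of the expanded matrix, so capping the polynomial degree or the number of parallel jobs in the search may be enough to stay within the container's memory limit.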

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

UBC-MDS / data-analysis-review-2023