UBC-MDS / data-analysis-review-2023


Submission: Group_09: RaincouverPrediction #7

dorisyycai opened 7 months ago

dorisyycai commented 7 months ago

Submitting authors: @wqxxzd @dorisyycai @yhan178 @sivakornchong

Repository: https://github.com/UBC-MDS/RaincouverPrediction

Report link: https://ubc-mds.github.io/RaincouverPrediction/raincouver_prediction_report3.html

Abstract/executive summary: Our project investigates the prediction of daily precipitation in Vancouver using machine learning methods. Using a dataset spanning 1990 to 2023, we explored the predictive power of key environmental and climate features such as temperature, wind speed, and evapotranspiration. Our results suggest that the best classification model is a Support Vector Machine with a Radial Basis Function kernel (SVM RBF) with the hyperparameter C=10.0. The model achieved an F1 score of 0.87 on the positive class (precipitation is present) on unseen data, suggesting strong predictive performance for days with precipitation. We also explored feature importance, finding ET₀ reference evapotranspiration and the cosine transformation of months to be robust predictors. Hyperparameter optimization did not improve our current model, indicating a potential need for feature engineering or for incorporating more features. Our project presents a reliable model for predicting precipitation with potential practical applications in various fields.
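The summary lists the cosine transformation of months as a robust predictor. The report's exact formula is not shown here, so the following is only a sketch of one common cyclical encoding, assuming months numbered 1 to 12; the helper name `encode_month` is hypothetical. It places each month on the unit circle so that December and January end up close together, which raw month numbers (12 vs. 1) do not capture.

```python
import math


def encode_month(month: int) -> tuple[float, float]:
    """Map a calendar month (1-12) to (cos, sin) coordinates on the unit circle.

    Hypothetical helper: the report mentions a cosine transformation of
    months, but the exact formula used there is an assumption here.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # January sits at angle 0; each month advances by 1/12 of a full turn.
    angle = 2 * math.pi * (month - 1) / 12
    return math.cos(angle), math.sin(angle)


# December and January are neighbours on the circle, July is opposite January.
jan, dec, jul = encode_month(1), encode_month(12), encode_month(7)
print(math.dist(jan, dec) < math.dist(jan, jul))  # True
```

The sine component is included because the cosine alone gives, for example, February and December the same value; whether the report also uses a sine term is not stated in the summary.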

Editor: @ttimbers

Reviewers: Sharon Voon, Anu Banga, Jenny Lee, Alysen Townsley

s-voon commented 7 months ago

Data analysis review checklist

Reviewer: s-voon

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

1) Looking very good overall. There are just a few typos in the report that could be avoided, such as cliamte (Summary), curren (Summary), and cylical (Section 2.3).
2) Maybe explain why you chose the F1 score as the metric, or what the F1 score represents? In Fig 5, you included values for recall, precision, and support but did not mention any of them in your report except for the F1 score, which in my opinion makes those columns redundant.
3) You included an environment file in your root repo, but there are no instructions for how to use it. I would suggest moving it to an archive folder (?), as the dependencies listed in the environment file seem to exclude click, jupyter-book, and so on.
Well done~~
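On point 2: the F1 score is the harmonic mean of precision and recall for one class (here, presumably the positive "precipitation" class), so it is high only when both are high. A minimal standard-library sketch is below; the function name and the labels are made up for illustration and are not taken from the project's code or data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = precipitation, 0 = no precipitation.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(precision_recall_f1(y_true, y_pred))
```

If Fig 5 was produced with scikit-learn's `classification_report`, its `support` column is simply the number of true samples in each class, which is one reason it adds little to an F1-focused discussion.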

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

AlysenTownsley commented 7 months ago

Data analysis review checklist

Reviewer: @AlysenTownsley

Conflict of interest

Code of Conduct

Comments are general comments; FIX marks an issue that you should (if you are so inclined) fix in the Milestone 4 submission.

General checks

Is the repository well organized and easy to navigate?

FIX: This project uses the MIT license, which covers code only. A Creative Commons license should be added as well to cover the report and other writing in the repo.

Documentation

1) Contribute to the software

2) Report issues or problems with the software

3) Seek support

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

AnuBanga commented 7 months ago

Data analysis review checklist

Reviewer: @AnuBanga

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

Hi team, your analysis was engaging and a pleasure to review. The report is comprehensive and clear, offering enough information to follow the analysis, with clear justifications for your methods.

Here are a few suggestions you might want to consider:

  1. There are similar files for different milestones in the notebooks and reports folders, so I'm uncertain which ones to review. Would it be possible to tidy up the structure by removing unnecessary or redundant files?
  2. There are spelling mistakes in the project overview section, e.g., cliamte, curren, incoportating, and preject.
  3. Resize the images in Figures 3 and 5: the x-axis in Figure 3 is truncated, and Figure 5 also needs resizing.
  4. Consider adjusting the color scheme in the heatmap to enhance readability.
  5. After hyperparameter optimization, precision, recall, F1 score, and support were mentioned, but they weren't displayed when the model was selected based on the F1 score. Consider including a table or list of these scoring metrics before finalizing the choice.
  6. README.md: add more references.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

jlee2843 commented 7 months ago

Data analysis review checklist

Reviewer: @jlee2843 Jenny Lee

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour

Review Comments:

Potential improvements for consideration:
Feedback for appreciation:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.