Open dorisyycai opened 7 months ago
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
1) Looking very good overall. There are just a few avoidable typos in the report, e.g. "cliamte" (Summary), "curren" (Summary), and "cylical" (Section 2.3).
2) Consider explaining why you chose the F1 score as your metric, or what the F1 score represents. In Fig. 5 you included values for recall, precision, and support, but never mention them in the report except for the F1 score, which makes those columns redundant in my opinion.
3) You included an environment file in the root of your repo, but there are no instructions for how to use it, so I would suggest moving it to an archive folder (?), as the dependencies in the environment file seem to exclude click, jupyter-book, and so on. Well done!
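On point 2, the F1 score is the harmonic mean of precision and recall, which is also why reporting all three side by side can feel redundant. A minimal sketch with made-up confusion-matrix counts (illustrative numbers, not the project's actual results):

```python
# F1 balances false positives and false negatives, which matters when
# "rain" days are the minority class. Hypothetical counts only:
tp, fp, fn = 80, 10, 15

precision = tp / (tp + fp)  # of predicted-rain days, how many were rain
recall = tp / (tp + fn)     # of actual rain days, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))
# -> 0.889 0.842 0.865
```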
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Comment marks a general comment; FIX marks an issue which you should (if you are so inclined) fix in the Milestone 4 submission.
[x] Repository: Is the source code for this data analysis available?
Comment: The source code is available – scripts are in the scripts directory and the .ipynb file is in the reports directory.
FIX: Is the notebooks directory required? The report is rendered by the Jupyter Book in reports, so perhaps this folder can be removed?
Is the repository well organized and easy to navigate?
Comment: The repo is well organized in a logical way, i.e. data, docs, notebooks, reports, results, src, test. The expected standalone files are present, i.e. .gitignore, CONTRIBUTING, Dockerfile, LICENSE, README.md, code of conduct, docker-compose, and the environment.yaml file.
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
FIX: This project uses the MIT license which protects code only. A Creative Commons license should be used as well to cover the report / writing section of the repo.
[x] Installation instructions: Is there a clearly stated list of dependencies?
Comment: Looks good! All dependencies are listed in the Dockerfile.
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
Comment: Yes, the authors list how to use the software in the Usage section. The set-up and analysis instructions are clear.
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
Comment: Yes, the functionality is documented and clear.
[x] Community guidelines: Are there clear guidelines for third parties wishing to:
1) Contribute to the software
2) Report issues or problems with the software
3) Seek support
FIX: It is not clear who to reach out to in the case that users have questions or need help. It doesn’t seem like Tiffany has this listed in her example repo either, but perhaps you could list a contact within the README or CONTRIBUTING file.
FIX: Your code_of_conduct.md references Tiffany’s email for issues. This should be changed to one of the project team members.
[x] Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
Comment: Code is well documented with comments.
FIX: A full docstring (examples, input data types, etc.) might help users to understand how to use the scripts more clearly. This is relevant for the classification.py, drop_split_preprocess.py, and eda.py scripts.
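To illustrate the kind of docstring meant here, a numpydoc-style sketch for a hypothetical helper (the function name, parameters, and behavior are invented for illustration, not taken from the team's scripts):

```python
def drop_and_split(rows, drop_cols, test_frac=0.2):
    """Drop unused columns and split records into train/test sets.

    Parameters
    ----------
    rows : list of dict
        Weather observations, one dict per day.
    drop_cols : list of str
        Keys to remove from every record before splitting.
    test_frac : float, optional
        Fraction of records held out for testing (default 0.2).

    Returns
    -------
    tuple of list of dict
        (train, test) splits with drop_cols removed.

    Examples
    --------
    >>> train, test = drop_and_split(days, ["station_id"], test_frac=0.25)
    """
    # Remove the unwanted keys, then split at the train/test boundary.
    kept = [{k: v for k, v in r.items() if k not in drop_cols} for r in rows]
    cut = int(len(kept) * (1 - test_frac))
    return kept[:cut], kept[cut:]
```

The Parameters/Returns/Examples sections are what make a script self-documenting for a new user.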
[x] Style guidelines: Does the code adhere to well known language style guides?
Comment: As mentioned above, I recommend adding full docstrings.
[x] Modularity: Is the code suitably abstracted into scripts and functions?
Comment: Yes. No code left in the report.
[x] Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?
Comment: Tests can be run from the terminal and are of excellent quality! Great work.
[x] Data: Is the raw data archived somewhere? Is it accessible?
Comment: Yes, in the data folder.
[x] Computational methods: Is all the source code required for the data analysis available?
Comment: Yes, all code is available and reproducible.
[x] Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
Comment: Required software is listed in the Dockerfile, which is referenced in the README. All required software can be loaded with docker compose up.
[x] Automation: Can someone other than the authors easily reproduce the entire data analysis?
Comment: I was able to enter the Docker container and reproduce the results from the JupyterLab terminal. Great work! Instructions are easy to follow; tests, scripts, and the Jupyter Book are working. Do we also need instructions to reproduce the analysis in a virtual environment? Perhaps something to consider adding.
[x] Authors: Does the report include a list of authors with their affiliations?
Comment: Yes, authors are listed, but their affiliations (UBC) are not.
[x] What is the question: Do the authors clearly state the research question being asked?
Comment: Yes: "Our project investigates the prediction of daily precipitation in Vancouver using machine learning methods. Using a dataset spanning from 1990 to 2023, we explored the predictive power of some key environmental and climate features such as temperature, wind speed, and evapotranspiration."
[x] Importance: Do the authors clearly state the importance for this research question?
Comment: Yes, agriculture, water management, etc. Great topic!
[x] Background: Do the authors provide sufficient background information so that readers can understand the report?
Comment: Yes. This is a topic relatable to everyday people and the report is understandable.
[x] Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
Comment: Yes, e.g. transforming the cyclical weather data into sine/cosine features.
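For readers unfamiliar with this trick, a minimal sketch of a sine/cosine month encoding (illustrative only, not the authors' exact code): mapping months onto the unit circle makes December and January neighbors in feature space, unlike the raw 1–12 integers.

```python
import math

def encode_month(month):
    """Map month 1-12 onto the unit circle so adjacent months,
    including December -> January, stay close in feature space."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)

# December (330 degrees) lands right next to January (0 degrees):
jan, dec, jul = encode_month(1), encode_month(12), encode_month(7)
```

With the raw integer encoding, December (12) and January (1) are maximally far apart even though they are climatically adjacent; the circular encoding removes that artificial seam.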
[x] Results: Do the authors clearly communicate their findings through writing, tables and figures?
Comment: Yes, in tabular form.
[x] Conclusions: Are the conclusions presented by the authors correct?
Comment: Yes, the conclusions seem to be correct.
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
FIX: All references should have a DOI:
[NTHJ01] https://doi.org/10.1002/joc.680.
[OGarciaSSCM14] https://doi.org/10.1016/j.atmosres.2014.01.012.
[x] Writing quality: Is the writing of good quality, concise, engaging?
FIX: There are a few grammatical errors in the report which could be fixed, e.g. “Hyperparameter optimization did not make improvement to our curren model, indicating the potential need for feature engineering or incoportating more features.” and “The performace of each model is plotted below”.
FIX: Your Fig. 3 has its x-axis labels a bit cut off; I suggest re-sizing the image slightly. Your Fig. 5 has quite a lot of white-space padding; I suggest resizing that image as well.
Hi team, your analysis was engaging and a pleasure to review. The report is comprehensive and clear, offering sufficient information to grasp the analysis, supported by clear justifications for your methodologies.
Here are a few suggestions you might want to consider:
Why was the f1 score used as your accuracy metric? Have you considered using any other scores?
Submitting authors: @wqxxzd @dorisyycai @yhan178 @sivakornchong
Repository: https://github.com/UBC-MDS/RaincouverPrediction
Report link: https://ubc-mds.github.io/RaincouverPrediction/raincouver_prediction_report3.html
Abstract/executive summary: Our project investigates the prediction of daily precipitation in Vancouver using machine learning methods. Using a dataset spanning from 1990 to 2023, we explored the predictive power of some key environmental and cliamte features such as temperature, wind speed, and evapotranspiration. Our results suggest the best classification model is Support Vector Machine with Radial Basis Function (SVM RBF) model with the hyperparameter C=10.0. The model achieved a notable F1 score of 0.87 on the positive class (precipitation is present) when generalized to the unseen data, suggesting a high accuracy in precipitation prediction. We also explored feature importance, showing ET₀ reference evapotranspiration and the cosine transformation of months as robust predictors. Hyperparameter optimization did not make improvement to our curren model, indicating the potential need for feature engineering or incoportating more features. Our preject presents a reliable model for predicting precipitation with potential practical applications in various fields.
Editor: @ttimbers Reviewers: Sharon Voon, Anu Banga, Jenny Lee, Alysen Townsley