DSCI-310-2024 / data-analysis-review-2024


Submission: Group 15: Wildfire Predictor #15

Open ttimbers opened 6 months ago

ttimbers commented 6 months ago

Submitting authors: Rahul Brar, Fiona Chang, Lillian Milroy, Darwin Zhang

Repository: https://github.com/DSCI-310-2024/dsci310-group-wildfire-predictor/releases/tag/Milestone-3

Abstract/executive summary:

In this analysis, we train a linear regression model capable of predicting wildfire intensity, measured by the geographic area affected by fires. The trained model performed well when making predictions on unseen data, exhibiting an RMSE of 72.954 and an R-squared score of 0.948.

We used data about Australian wildfires collected using thermal imaging technology and processed by IBM (Hamann and Schmude, 2021). The data was sourced from GitHub, and the specific CSV we used can be accessed here (Krook 2021). Each row in the dataset represents a day's worth of information about the number, spread, and intensity of fires within one of seven regions in Australia, dating back to 2005.

Editor: @ttimbers

Reviewers: Amar Gill, Riddhi Battu, Lucas Liu, Sid Ahuja

agill59 commented 6 months ago

Data analysis review checklist (WIP)

Reviewer: agill59

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1

Review Comments:

The environment file contains python.app, which was not available on the channels I had access to; I had to comment it out to create the environment.
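For reference, the build can usually be unblocked by commenting that entry out. A minimal sketch of the relevant environment.yml excerpt (the surrounding dependencies are illustrative, not the project's actual list):

```yaml
dependencies:
  - python=3.11
  # - python.app  # macOS-only conda package; unavailable on some channels
  - pandas
  - scikit-learn
```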

The data folder could be cleaned up (it contains empty directories).

The test suite in test_relevant_features.py could be more robust.
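To make the suggestion concrete, here is a sketch of what a more robust test might look like, assuming the module under test exposes a correlation-based feature selector. The function name `relevant_features`, its signature, and the 0.5 threshold are all hypothetical stand-ins, not the project's actual API:

```python
import pandas as pd

# Stand-in for the project's feature-selection helper; the real function
# in src/ may differ -- this exists only to make the test sketch runnable.
def relevant_features(df, target, threshold=0.5):
    """Return columns whose absolute correlation with `target` meets the threshold."""
    corr = df.corr()[target].drop(target).abs()
    return sorted(corr[corr >= threshold].index)

def test_returns_only_correlated_columns():
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],      # perfectly correlated with target
        "noise": [1, 5, 2, 4],  # weakly correlated
        "y": [2, 4, 6, 8],
    })
    assert relevant_features(df, target="y") == ["x"]

def test_empty_result_when_nothing_correlates():
    df = pd.DataFrame({"noise": [1.0, -1.0, 1.0, -1.0], "y": [2, 4, 6, 8]})
    assert relevant_features(df, target="y") == []
```

Parametrizing over thresholds and adding edge cases (empty frames, all-NaN columns) would extend this further.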

The Docker commands given in the README do not work.


Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

sidahuja1 commented 5 months ago

Data analysis review checklist

Reviewer: sidahuja1

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1

Review Comments:

While the documentation for the environment and Docker setup was clear, I wasn't able to run make all or make clean when using docker-compose.

Documentation for the tests folder could also be added to the root README file.

test_relevant_features.py could be more comprehensive.

Some functions in src don't have documentation describing what they do

I tried rendering the qmd notebook via RStudio and could not see the images. It might be a good idea to include a rendered HTML or PDF in the reports folder.

The table in the report requires a subtitle.


SugarLucas commented 5 months ago

Data analysis review checklist (wildfire-predictor)

Reviewer: SugarLucas

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

  1. References should be included in the report.qmd file, even though they were provided in the ipynb file under the src folder.
  2. Pytest was not included in the environment.yml file or the Dockerfile, which causes trouble when running the tests.
  3. Some functions inside the src folder do not have proper documentation describing their parameters, return values, and examples.
  4. It would be nice to add usage examples for the functions in the src directory to the prepocessing.py and download_data.py scripts.
  5. In the README, the docker compose run make all command shows an error on my local machine, though the make clean command runs.
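To illustrate the docstring style being requested in items 3 and 4, here is a NumPy-style example on a made-up preprocessing helper (the function and its behaviour are hypothetical, not taken from src):

```python
import pandas as pd

def drop_missing_rows(df, columns):
    """Drop rows with missing values in the given columns.

    Parameters
    ----------
    df : pandas.DataFrame
        Raw wildfire data.
    columns : list of str
        Columns that must be non-missing for a row to be kept.

    Returns
    -------
    pandas.DataFrame
        A copy of ``df`` with the offending rows removed.

    Examples
    --------
    >>> df = pd.DataFrame({"area": [1.0, None], "region": ["NSW", "VIC"]})
    >>> drop_missing_rows(df, ["area"]).shape[0]
    1
    """
    return df.dropna(subset=columns).copy()
```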


riddhibattu commented 5 months ago

Data analysis review checklist

Reviewer: riddhibattu

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

General Observations
Your project on wildfire prediction presents a significant and timely analysis. The choice of dataset and the methodology applied demonstrate a thoughtful approach to an important environmental issue. Below, I offer constructive feedback aimed at enhancing the clarity, reproducibility, and overall impact of your work.

Technical and Documentation Improvements

  1. Jupyter Notebook Execution: The README.md file links to the wildfire-prediction.ipynb notebook, which displays an error related to the 'os' module and exhibits non-sequential execution (e.g., jumping from In [1] to In [28]). It is critical to restart and run all cells sequentially to ensure reproducibility and coherence for readers. Despite this, the analysis proceeds as expected when run manually via Jupyter Lab.

  2. Reference Documentation: I noticed several references (6 in total) lacking DOIs. Where DOIs are unavailable, including direct links to the references could enhance the report's credibility and utility.

  3. Report Accessibility: Providing the final report in PDF or HTML format, in addition to the Jupyter notebook, would greatly improve accessibility and readability.

  4. Build Commands: The make clean and make all commands did not execute successfully as per the instructions in the README.md. This issue might hinder the reproducibility of the analysis environment.

  5. Code Optimization: There are instances of unused package imports within the code. Streamlining these imports to include only necessary packages would enhance the code's efficiency and readability.

  6. Data Presentation: For the correlation matrix, consider using more descriptive names rather than abbreviations with underscores to improve readability and interpretation.

  7. Visualization Clarity: The correlation_matrix.png is partially cut off. Adjusting the image's dimensions could ensure the entire matrix is visible and interpretable.

  8. Quarto Document Rendering: Manual rendering of the QMD to PDF revealed issues with image display. Ensuring images render correctly in all document formats would greatly enhance the presentation quality.

  9. Navigability: Adding hyperlinks to tables within the Quarto document would improve navigability and reader understanding, especially when referencing specific data.
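On item 6, one low-effort option is to rename the abbreviated columns just before computing the matrix, so the plot shows descriptive labels while the underlying data is untouched. The abbreviations and labels below are hypothetical, not the dataset's actual headers:

```python
import pandas as pd

# Hypothetical abbreviation -> descriptive label mapping
readable = {
    "est_fire_area": "Estimated fire area",
    "mean_frp": "Mean fire radiative power",
    "count": "Fire count",
}

df = pd.DataFrame({
    "est_fire_area": [1.0, 2.0, 3.5],
    "mean_frp": [10.0, 22.0, 31.0],
    "count": [5, 9, 14],
})

# Rename only for display; both axes of the matrix pick up the new labels
corr = df.rename(columns=readable).corr()
```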

Errors and Solutions

Encountered errors related to file not found (404: Not Found) for several figures and the report PDF. These errors suggest issues with file paths or rendering processes. Ensuring accurate path references and successful rendering in both HTML and PDF formats would resolve these visibility issues.

  1. Test Data Clarification: An ambiguously named empty.zip was found within the tests directory. Renaming this to something more descriptive, such as test_data.zip, would clarify its purpose.

  2. Testing Documentation: Including specific instructions on how to run the tests would aid in validating the project's reliability and functionality.

Technical Specifications

  1. Dockerfile Versioning: The Dockerfile lacks specific versioning for 'make'. Specifying version numbers could prevent compatibility issues and ensure consistent environment replication.

  2. Environment Management: Similar to the Dockerfile, the environment.yml file would benefit from including specific package versions to ensure consistent, reproducible analysis environments across different setups.
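For instance, the conda entries could be pinned to exact versions (the packages and version numbers below are illustrative only, not the project's actual pins):

```yaml
dependencies:
  - python=3.11.8
  - pandas=2.2.1
  - scikit-learn=1.4.2
  - pytest=8.0.2
```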

Closing Thoughts

Overall, your project demonstrates a commendable effort in addressing a critical environmental concern. The analysis is well-conceived, and with the suggested improvements, its impact and accessibility could be significantly enhanced. I look forward to seeing the continued development of this important work.
