UBC-MDS / data-analysis-review-2021


Submission: Group 2 Forest Fire Prediction #30

Open voremargot opened 2 years ago

voremargot commented 2 years ago

Submitting Authors: @voremargot, Hatef Rahmani, @gauthampughaz, @Anahita97

Repo Link: https://github.com/UBC-MDS/forest-fire-area-prediction-group-2 Report Link: https://github.com/UBC-MDS/forest-fire-area-prediction-group-2/blob/dev/reports/Final_report.md

Summary: We have created a simple model to predict the size of forest fires using weather and soil moisture properties. We explore a data set from northeastern Portugal that contains spatial features, temporal features, soil moisture indices, and weather features to predict the size of wildfires within the Montesinho natural park. We fit a Support Vector Regression (SVR) model using the soil moisture variables, temperature, relative humidity, wind, spatial coordinates, and season. After removing outliers using Cook's Distance, we optimize our model using mean absolute error (MAE) and root mean square error (RMSE). Our optimized model, with C = 1.88 and γ = 0.48, produces an MAE of 8.686 and an RMSE of 28.46 on the unseen test data set, which is reasonable given that our area-burned values range from 0 to 1,090 ha.
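The workflow described above (an RBF-kernel SVR scored with MAE and RMSE) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the hyperparameters C = 1.88 and γ = 0.48 come from the summary, but the features and data below are synthetic stand-ins, and the Cook's Distance outlier-removal step is omitted.

```python
# Sketch of the modeling approach: RBF-kernel Support Vector Regression
# evaluated with MAE and RMSE. Hyperparameters C and gamma are the tuned
# values reported in the summary; the data here is synthetic, not the
# Montesinho data set.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
n = 200
# Hypothetical stand-ins for features like temperature, humidity, wind
X = rng.normal(size=(n, 3))
y = np.abs(2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling matters for SVR, so wrap the estimator in a pipeline
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.88, gamma=0.48))
model.fit(X_train, y_train)

pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
rmse = mean_squared_error(y_test, pred) ** 0.5
print(f"MAE: {mae:.3f}  RMSE: {rmse:.3f}")
```

As in the report, RMSE will be at least as large as MAE, since squaring weights large errors more heavily.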

Editor: @flor14 Reviewers: @Luming-ubc, @mahsasarafrazi, Aldo de Almeida Saltao Barros, Daniel King

Luming-ubc commented 2 years ago

Data analysis review checklist

Reviewer: @Luming-ubc

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2 hours

Review Comments:


Points done well:

Points that could be improved:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

mahsasarafrazi commented 2 years ago

Data analysis review checklist

Reviewer: @mahsasarafrazi

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

  1. Nice and tidy repo; all the files are accessible and each part is organized well.

  2. The scripts, EDA, and coding are well designed and fully described.

  3. In the README, instead of having one section titled "Forest Fire Area Prediction", you could split it into separate, shorter sections such as "About the project", "Background on forest fires", and "Predictive question and sub-questions".

  4. Since you have an environment.yaml file, in the "Dependencies" section it would be better not to list all the packages; just explain how to set up the environment, since the rest is covered by the environment file. This would also make the summary of your report shorter.

  5. In your analysis, it would be helpful to have a brief section on how the results could be improved, with your suggestions and findings. There may be other models that work better which you could not use because of constraints, so anyone reproducing your results could try another model.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

aldojasb commented 2 years ago

Data analysis review checklist

Reviewer: @aldojasb

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Congrats to the team; the project overall is very tidy and easy to understand. I have just a couple of minor suggestions for this version:

  1. You could use a more powerful introduction to engage the audience with the importance of your work. Something like: "this project has the potential to protect XXX lives once we have a good predictor."
  2. I would use fewer charts in Figure 3. It's a bit confusing; you could select just a few features for this chart.
  3. The README file could be a little more succinct. But that's just personal taste; I prefer small, to-the-point README files.
  4. If possible, try to explain the features you are using in a very easy (and not so academic) way. I think that will be interesting for people who don't have an earth science background.
  5. Have you thought about using other models to compare with SVR? That would make your analysis more reliable and show that your model is really powerful.

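One way to act on this last suggestion would be a cross-validated comparison of SVR against a couple of other regressors. The sketch below uses synthetic data and arbitrarily chosen baseline models (Ridge and a random forest) purely for illustration; in the real project the training features and targets would be used instead.

```python
# Hypothetical model-comparison sketch: score several regressors with
# 5-fold cross-validated MAE on the same (here, synthetic) data.
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(150, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.3, size=150)

models = {
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "Ridge": make_pipeline(StandardScaler(), Ridge()),
    "RandomForest": RandomForestRegressor(random_state=0),
}
results = {}
for name, model in models.items():
    # scoring is negated so that higher is better; flip the sign back
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_absolute_error")
    results[name] = -scores.mean()
    print(f"{name}: mean CV MAE = {results[name]:.3f}")
```

Reporting all models' cross-validation scores side by side would support the claim that SVR is the strongest choice.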
danfke commented 2 years ago

Data analysis review checklist

Reviewer: @danfke

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:


Overall, the project is very interesting and the analysis clearly had a lot of thought put into it and was well executed!

Particularly well:

- The README and final report are very well written. There is a clear and admirable motive behind answering your research question, and a good explanation of the steps undertaken in the project and the results obtained.
- The use of statistical techniques to remove outliers is impressive.
- The charts are beautiful. I just have a slight concern with Figure 2, raised below.
- The final paragraph of the final report shows well-thought-out reflections on potential improvements and current shortcomings.

Could be improved:

- Very minor detail: the last sentence of the opening section of the README says "has almost zero correlation" when it should be "have almost zero correlation".
- The link to the final report in the README does not work.
- LaTeX doesn't render properly in the final report's analysis.
- The figure captions don't appear for any of the figures in the Results and Discussion section of the final report. I have the same problem in my report; Tiffany recommends either using PDF or HTML and creating a GitHub Pages site.
- It is hard to understand what is going on in Figure 2, specifically the overlapping box plots in the bottom chart. What do the colors represent? Would this be better as a stacked bar chart? Are the color separations necessary, or could each season just be a single color?
- It might be too strong a statement to say that hyperparameter tuning improved the models without mentioning that the training and validation errors are all within one standard deviation of each other before and after tuning. If the standard deviations are ignored, the training-validation gap for the MAE-optimized model actually increases slightly after tuning.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Anahita97 commented 2 years ago

Thanks, everyone, for your suggestions; we have tried our best to incorporate your feedback into our project.