Open: voremargot opened 2 years ago
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Points being done well:
- Nice and tidy repo: all the files are accessible and each part is clearly organized.
- The scripts, EDA, and code are well designed and fully described.
Points could be improved:
- In the README, instead of having a single section called "Forest Fire Area Prediction", you could split it into separate, shorter sections such as "About the project", "Background of forest fire", and "Predictive question and sub-questions".
- Since you have an environment.YAML file, the "Dependencies" section does not need to list every package. It is enough to explain how to set up the environment and leave the rest to the environment file, which also makes the summary of your report shorter.
- In your analysis, it would help to have a very brief section on "How to improve the results" with your suggestions and findings. There may be other models that work better but that you could not use because of the constraints you had, so anybody who wants to reproduce your result can try another model.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Congrats to the team: overall, the project is very tidy and easy to understand. I have just a couple of minor suggestions for this version:
Overall, the project is very interesting and the analysis clearly had a lot of thought put into it and was well executed!
Particularly well:
- The README and final report are very well written. There is a clear and admirable motive towards answering your research question, explanation of the steps that were undertaken in the project and the results obtained.
- The use of statistical techniques to remove outliers is impressive.
- The charts are beautiful. I just have a slight concern with figure 2, raised below.
- The final paragraph of the final report shows well thought-out reflections on potential improvements and current shortcomings.
Could be improved:
- Very minor detail: The last sentence of the beginning section of the README says “has almost zero correlation” when it should be “have almost zero correlation”.
- The link to the final report in the README does not work.
- LaTeX doesn’t render properly in the final report’s analysis.
- The figure captions for all of the figures don’t appear in the Results and Discussion section of the final report. I have the same problem in my report; Tiffany recommends rendering to either PDF or HTML and creating a GitHub Pages site.
- It is hard to understand what is going on in Figure 2, specifically the fact that there are overlapping box plots in the bottom chart. What do the colors represent? Would this be better as a stacked bar chart? Are the color separations necessary, or can each season just be a single color?
- It might be too strong of a statement to say that hyperparameter tuning improved the models without mentioning the fact that the training and validation errors are all within one standard deviation of each other before and after tuning. If the standard deviations are ignored, it appears that the training-validation gap for the MAE-optimized model actually increases slightly after tuning.
Thanks everyone for your suggestions, we have tried our best to incorporate your feedback into our project.
"The report mentioned using Cook's distance method to identify the outliers. However, it does not briefly explain what Cook's distance is. I think it would be helpful to include a link or reference to explain this concept."
We have added a link explaining Cook's distance. This has been addressed here.
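For readers unfamiliar with the concept, here is a minimal sketch of how Cook's distance flags influential points. It assumes a simple one-predictor linear regression, made-up data, and the common rule-of-thumb cutoff of 4/n; the report's actual regression and threshold are not stated in this thread, so none of these should be read as the project's method.

```python
from statistics import mean


def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression y ~ a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Ordinary least squares fit.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Residual variance estimate and leverage of each point.
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # D_i = e_i^2 / (p * mse) * h_i / (1 - h_i)^2
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: the last observation is far off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 30.0]

distances = cooks_distance(x, y)
threshold = 4 / len(x)  # assumed rule of thumb, not taken from the report
flagged = [i for i, d in enumerate(distances) if d > threshold]
print(flagged)
```

A point gets a large distance when it has both a large residual and high leverage, which is why only the extreme last observation is expected to exceed the cutoff in this toy data.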
"LaTeX doesn’t render properly in the final report’s analysis."
This was caused by rendering the .Rmd to a .md file. We have fixed it by rendering our final report to an .html file instead. This has been addressed here.
"The figure captions for all of the figures don’t appear in the Results and Discussion section of the final report. I have the same problem in my report; Tiffany recommends rendering to either PDF or HTML and creating a GitHub Pages site."
In a .md file, figure captions only show when hovering over a plot. This has been solved by rendering to an HTML file. This has been addressed here.
"It might be too strong of a statement to say that hyperparameter tuning improved the models without mentioning the fact that the training and validation errors are all within one standard deviation of each other before and after tuning. If the standard deviations are ignored, it appears that the training-validation gap for the MAE-optimized model actually increases slightly after tuning."
As mentioned, once the standard deviations are taken into account, the model does not improve substantially after hyperparameter tuning. We have therefore made sure to state that the model does not improve much after hyperparameter tuning. This has been addressed here.
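The reviewer's point about standard deviations can be sketched as a quick check on cross-validation scores. The fold scores below are hypothetical, and treating overlapping mean ± one standard deviation intervals as "no clear improvement" is a rough heuristic assumed here for illustration, not a formal statistical test and not the project's actual procedure.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return the mean and sample standard deviation of per-fold errors."""
    return mean(scores), stdev(scores)


def intervals_overlap(before, after):
    """True if the mean ± one standard deviation intervals overlap."""
    mean_before, sd_before = summarize(before)
    mean_after, sd_after = summarize(after)
    return abs(mean_before - mean_after) <= sd_before + sd_after


# Hypothetical validation MAE per fold, before and after tuning.
mae_before = [9.1, 7.8, 10.4, 8.6, 9.5]
mae_after = [8.9, 7.5, 10.1, 8.2, 9.4]

if intervals_overlap(mae_before, mae_after):
    print("Change is within the fold-to-fold spread; improvement is unclear.")
else:
    print("Change exceeds the fold-to-fold spread.")
```

With scores like these, the mean drops slightly but the spread across folds is much larger than the drop, which is the situation the comment describes.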
One of our reviewers did not check off the item: "Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?" We have written tests so that the function of the software can be verified. This has been addressed here.
Submitting Authors: @voremargot, Hatef Rahmani, @gauthampughaz, @Anahita97
Repo Link: https://github.com/UBC-MDS/forest-fire-area-prediction-group-2
Report Link: https://github.com/UBC-MDS/forest-fire-area-prediction-group-2/blob/dev/reports/Final_report.md
Summary: We have created a simple model to predict the size of forest fires using weather and soil moisture properties. We explore a data set from northeastern Portugal that contains spatial features, temporal features, soil moisture indices, and weather features to predict the size of wildfires within the Montesinho natural park. We create a Support Vector Regression (SVR) model using the soil moisture variables, temperature, relative humidity, wind, spatial coordinates, and season. After removing outliers using the Cook’s distance method, we optimize our model using mean absolute error (MAE) and root mean square error (RMSE). Our optimized model, with C = 1.88 and γ = 0.48, produces an MAE of 8.686 and an RMSE of 28.46 on the unseen test data set, which is good given that our area burned values range from 0 to 1,090 ha.
Editor: @flor14
Reviewers: @Luming-ubc, @mahsasarafrazi, Aldo de Almeida Saltao Barros, Daniel King
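As a small illustration of the two metrics named in the summary, the sketch below computes MAE and RMSE by hand. The burned-area values and predictions are invented for the example and are not from the project's data; the function names are likewise only illustrative.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average of the absolute prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared prediction error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))


# Hypothetical burned areas in ha: mostly small fires plus one large one.
actual = [0.0, 0.0, 2.0, 10.0, 100.0]
predicted = [1.0, 1.0, 1.0, 4.0, 20.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE = {mae:.2f} ha, RMSE = {rmse:.2f} ha")
```

Because RMSE squares each error before averaging, a single badly predicted large fire pulls it well above the MAE, which is one plausible reading of the gap between the two numbers reported in the summary.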