DSCI-310-2024 / data-analysis-review-2024


Submission: Group 1: Predicting Fatalities from Tornado Data #1

ttimbers opened 6 months ago

ttimbers commented 6 months ago

Submitting authors: Erika Delorme, Marcela Flaherty, Riddha Tuladhar, Edwin Yeung

Repository: https://github.com/DSCI-310-2024/DSCI-310-Group-1-Predict-Fatalities-From-Tornado-Data/tree/0.0.3

Abstract/executive summary:

In our project, we attempt to build a multiple linear regression model that predicts the number of fatalities from tornadoes using two features: the width (yards) and the length (miles) of the tornado. We tested the model with and without outliers and compared the resulting coefficients and RMSPE (root mean squared prediction error) scores. Both models had small positive coefficients, suggesting a minimal yet positive effect of width and length on tornado fatalities, and both had low RMSPE scores, suggesting little error in their predictions. The model without outliers had a lower RMSPE score. This is partly explained by the removal of the outliers themselves: the model then makes predictions over a smaller range of values, which reduces the error. Despite its limitations, we believe our model can still be useful for predicting tornado fatalities with little error. However, before it is deployed, the model should be improved to increase the size of its coefficients and its predictive power. In the future, we may explore other features for predicting fatalities, predict the number of injuries from the same features, or predict the number of casualties (injuries and fatalities) from the same and additional features.

The data set used in this project is from the US NOAA's National Weather Service Storm Prediction Center Severe Weather Maps, Graphics, and Data Page. The tidied version was sourced from TidyTuesday and can be found here. Each row represents one tornado and records features including its width, length, date, time, US state, magnitude, financial losses, number of fatalities, and number of injuries.
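To make the method and the RMSPE metric concrete, here is a minimal sketch of a two-feature linear regression evaluated with RMSPE, where RMSPE = sqrt((1/n) * sum over i of (y_i - ŷ_i)^2). It is not the authors' code (their analysis appears to be written in R): it uses Python with NumPy, ordinary least squares via `np.linalg.lstsq`, and hypothetical helper names (`rmspe`, `fit_linear_model`, `predict`). The small data set is made up for illustration and is not the NOAA data.

```python
import numpy as np


def rmspe(y_true, y_pred):
    """Root mean squared prediction error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept.

    Returns [intercept, coef_width, coef_length] for a two-column X.
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    """Apply the fitted intercept and slopes to new rows of X."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Hypothetical toy data (NOT the NOAA data): columns are width (yards), length (miles).
X_train = np.array([[100, 2.0], [300, 5.0], [50, 1.0], [800, 20.0], [200, 4.0]])
y_train = np.array([0, 1, 0, 4, 1])  # fatalities per tornado (made up)
X_test = np.array([[150, 3.0], [600, 12.0]])
y_test = np.array([0, 2])

coef = fit_linear_model(X_train, y_train)
print("coefficients (intercept, width, length):", coef)
print("test RMSPE:", rmspe(y_test, predict(coef, X_test)))
```

Because RMSPE is in the same units as the response (here, fatalities per tornado), a value of 0.5 would mean predictions are off by roughly half a fatality on a typical test tornado. It is also sensitive to the range of the response, which is consistent with the abstract's observation that removing outliers lowers the score.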

Editor: @ttimbers

Reviewers: Andrea Jackman, James He, Neha Menon

jamesh14 commented 6 months ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 3

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

Just out of curiosity: I wonder whether linear regression is the most suitable method for predicting tornado fatalities compared with other prediction models, such as decision trees or an SVM with an RBF kernel. Factors such as population density and infrastructure across states can also influence the number of fatalities, which could help explain the outliers when predicting with length and width.

This is a great repository, and I thoroughly enjoyed reading your analysis. It's evident that a lot of effort and time has been put into structuring the repository and setting up instructions to make it easier for users to follow. Well done!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

ajackman2 commented 5 months ago

Data analysis review checklist

Reviewer: ajackman2

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

Overall, the report is well laid out and explained. The repository is very well-organized and labeled, good job! You obviously worked very hard on the report and made a great final product. Below I have made some comments about things I think you can improve.

In the abstract, I think it would help to explain RMSPE in a little more depth so that readers unfamiliar with it can understand what your RMSPE values mean and, in turn, what your models are doing.

It would be nice to link to the license in the README.md file, so users can easily navigate to the license if they wish to view it.

I noticed a file named 'vdiffr.Rout.fail' in the 'tests' folder. I'm not sure what this file is or what its purpose is.

I don't see a PDF version of the final report rendered in the docs folder. The steps for creating one seem to be missing from your Makefile and .qmd file.

I also tried to follow the link in your report to 'tornado_fatalities_predictory.ipynb', but it is no longer valid because you have split the code into multiple scripts. Consider changing this link to point to the 'src' folder.

When I run your tests, one of them fails: "Failure (test-accuracy_plot.R:6:3): refactoring our code should not change our plot Snapshot of testcase to 'accuracy_plot/accuracy-plot.svg' has changed"

nehamenon704 commented 5 months ago

Data analysis review checklist

Reviewer: nehamenon704

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

Overall, I found the topic very interesting and unique, and the analysis was conducted very well! I was able to follow along with all the steps and navigate through the repository easily. In addition, putting author and date information in the function files was very helpful: it provides an accessible record of when each file was written, which is arguably easier than parsing through the commit history.

Here is my feedback on the parts that could be improved:

Again, a really good effort and coherent analysis, good job!
