ttimbers opened 6 months ago
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
`README.md` is original and easy to read, but there is one issue: the missing link to the final report. I think this is because `analysis.ipynb` was deleted from the repository.
Since the `analysis.ipynb` file is missing, I had some trouble finding the group's research topic and question. I was only able to locate them after cloning the repository and opening `tornado_fatalities_predictory.html` to access the report that way.
`docker-compose.yml` is very detailed, but there are no instructions for using `environment.yml`, even though the `environment.yml` file still exists at the root of the repository. If someone wants to create an environment for this project, they will run into trouble if they don't know how to create one with `renv`.
The analysis scripts are `01_download_data.r`, `02_clean_preprocess_data.R`, and `03_eda.R`; note that `01_download_data.r` ends with a lowercase `.r` rather than the uppercase `.R` used by the other scripts.
There is a `tornado_fatalities_predictor.html` but no `tornado_fatalities_predictor.pdf`.
Just out of curiosity: I wonder whether SLR is the most suitable method for predicting tornado fatalities compared to other prediction models, e.g. decision trees or an SVM with an RBF kernel. Factors such as population density and infrastructure across states can also influence the number of fatalities, which could potentially explain the outliers when predicting with length and width.
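On the missing `environment.yml` instructions: a minimal sketch of what the README could add, assuming `environment.yml` is a standard conda specification (the environment name below is a placeholder, not taken from the repository):

```shell
# Create the environment from the spec file at the repository root
conda env create -f environment.yml

# Activate it (replace <env-name> with the "name:" field in environment.yml)
conda activate <env-name>
```

If the project instead standardizes on `renv`, the README should say so explicitly and either document the `renv` workflow or remove the unused `environment.yml`.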
This is a great repository, and I thoroughly enjoyed reading your analysis. It's evident that a lot of effort and time has been put into structuring the repository and setting up instructions to make it easier for users to follow. Well done!
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Overall, the report is well laid out and explained. The repository is very well-organized and labeled, good job! You obviously worked very hard on the report and made a great final product. Below I have made some comments about things I think you can improve.
In the abstract, I think it would be beneficial to explain RMSPE in a little more depth, so that anyone unfamiliar with it can understand what your RMSPE values mean. This will help readers of the report better understand what your models are doing.
It would be nice to link to the license in the README.md file, so users can easily navigate to the license if they wish to view it.
I noticed that in the 'tests' folder there is a file 'vdiffr.Rout.fail'; I'm not sure what this file is or what its purpose is.
I don't see a PDF version of the final report rendered in the docs folder. The steps for creating a PDF version seem to be missing from your Makefile and qmd.
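One hedged way to produce that PDF, assuming the report is a Quarto `.qmd` and a LaTeX distribution such as TinyTeX is available in the environment (the file path here is a guess and should be adjusted to the repository's layout):

```shell
# Render a PDF version of the report into docs/ alongside the existing HTML
quarto render tornado_fatalities_predictor.qmd --to pdf --output-dir docs
```

A corresponding Makefile target could simply wrap this command so the PDF is rebuilt with the rest of the analysis.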
I also tried to follow the link in your report to 'tornado_fatalities_predictory.ipynb', and it is no longer a valid link, as you have split the code into multiple scripts. Consider changing this link to point to the 'src' folder.
When I run your tests, one of them fails with: "Failure (test-accuracy_plot.R:6:3): refactoring our code should not change our plot. Snapshot of testcase to 'accuracy_plot/accuracy-plot.svg' has changed."
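If the plot change was intentional (i.e. the refactor deliberately altered the figure), one way to refresh the stored snapshot is testthat's snapshot-acceptance helper; this assumes the project uses testthat 3e snapshot tests, as the failure message suggests:

```shell
# Accept the new snapshot for the accuracy_plot test case,
# replacing the stored reference SVG with the newly generated one
Rscript -e 'testthat::snapshot_accept("accuracy_plot")'
```

Comparing the old and new SVGs first with `testthat::snapshot_review()` is safer than accepting blindly, since a silently changed plot could also indicate a real regression.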
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Overall, I found the topic very interesting and unique, and the analysis was conducted very well! I was able to follow along with all steps and navigate through the repository easily. In addition, putting author and date information on function files was very helpful; it provides an accessible record of when each file was written, arguably easier than parsing through the commit history.
Here are my items of feedback on parts that could be improved:
Location of final report: After running the analysis, I expected the report to be in the results directory, but found it in the docs directory. I would recommend either merging these two directories or renaming the docs directory to something like reports. This would enhance repository organization and avoid confusion for anyone running your analysis.
Tag the version of Quarto: While going through the repository and the Dockerfile, I noticed that the version of Quarto was not pinned. To ensure that the container can be built and run on other people's systems, I would recommend pinning this version.
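A hedged sketch of pinning Quarto in the Dockerfile, written here as the shell commands a `RUN` instruction would execute (the version number is illustrative only; the URL follows the quarto-cli GitHub releases naming scheme):

```shell
# Install an exact Quarto release rather than whatever is "latest"
QUARTO_VERSION=1.4.550   # illustrative version, not taken from the repository
curl -fsSL -o quarto.deb \
  "https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-amd64.deb"
apt-get install -y ./quarto.deb
```

Keeping the version in a build argument (e.g. `ARG QUARTO_VERSION`) makes it easy to bump deliberately while still guaranteeing reproducible builds.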
Usage instructions: These instructions are very well written, and I like the level of detail. However, to ensure that all the information is provided, I would recommend including the code needed to clone the repository (git clone …). All the other instructions have the required code below them; adding it here would help with consistency.
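For example, the usage section could begin with (URL taken from the repository link in this review; release tag omitted):

```shell
# Clone the project and move into it before following the remaining instructions
git clone https://github.com/DSCI-310-2024/DSCI-310-Group-1-Predict-Fatalities-From-Tornado-Data.git
cd DSCI-310-Group-1-Predict-Fatalities-From-Tornado-Data
```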
Again, a really good effort and coherent analysis, good job!
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: Erika Delorme, Marcela Flaherty, Riddha Tuladhar, Edwin Yeung
Repository: https://github.com/DSCI-310-2024/DSCI-310-Group-1-Predict-Fatalities-From-Tornado-Data/tree/0.0.3
Abstract/executive summary:
In our project, we attempt to build a multiple linear regression model that will predict the number of fatalities from tornadoes using the features width (yards) and length (miles) of the tornado. We tested our multilinear regression model with and without outliers and compared differences in coefficients and RMSPE scores. Both models had low positive coefficients, suggesting a minimal yet positive impact on the prediction of tornado fatalities, and both had low RMSPE scores, suggesting a low amount of error in their predictions. The model without outliers had a lower RMSPE score, which is partly explained by the lack of outliers and thus making predictions on a smaller range, which reduces the error. Despite the limitations of our model, we believe that it can still have some utility in predicting tornado fatalities with little error. However, the model should be refined before being deployed, to increase the size of the coefficients and its predictive power. In the future, we may consider exploring other features in predicting fatalities, predicting the number of injuries from the same features, or even predicting the number of casualties (injuries and fatalities) from the same and additional features.
The data set that was used in this project is from the US NOAA's National Weather Service Storm Prediction Center Severe Weather Maps, Graphics, and Data Page. It was tidied and sourced from TidyTuesday and can be found here. Each row represents a tornado, along with various features, including width, length, date, time, state in the US, magnitude, financial losses, number of fatalities, number of injuries, etc.
Editor: @ttimbers
Reviewers: Andrea Jackman, James He, Neha Menon