ttimbers opened 2 years ago
In short, good work! The entire project fits the framework stated in the outline, and following the instructions in the README allowed me to successfully reproduce the analysis. The report is well structured and supported by sufficient empirical evidence and references.
I had a couple of thoughts and suggestions while reading the report that might be worth considering:
- The rendered html report sits in the `notebook` directory; duplicating a copy of the html file in the `results` directory could make it friendlier for someone else to find what they need. I achieved this by adding a line of `cp [path_of_html] results/[report_name]` to the Makefile; I hope that helps.
- Given the purpose of Figure 2, weakly correlated pairs such as ISI and RH (correlation of -0.150) or ISI and DC (0.216), or even the density of the variables, could be redundant. In this case, a correlation matrix may not be the best solution. Instead, I would recommend drawing a scatterplot for each of the 5 pairs of "strongly correlated" variables. If possible, adding a regression line can make the plot fancier (but it is not necessary).

This was derived from the [JOSE review checklist](https://openjournals.readthedocs.io/en/jose/review_checklist.html) and the ROpenSci review checklist.
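As a minimal sketch of the scatterplot-with-regression-line suggestion above: ISI and temp are columns in the forest fires dataset, but the values below are simulated for illustration, and the pairing is hypothetical.

```r
library(ggplot2)

# Simulated stand-in data for one "strongly correlated" pair.
set.seed(310)
df <- data.frame(ISI = runif(100, 0, 20))
df$temp <- 10 + 0.8 * df$ISI + rnorm(100)

# Scatterplot for the pair, with an optional linear regression line.
p <- ggplot(df, aes(x = ISI, y = temp)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE)
p
```

The same few lines could be repeated (or wrapped in a small function) for each of the 5 pairs.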
Great job! The repository was easy to explore, and the instructions on the landing page let me reproduce the analysis with `make` and explore the code in the Docker environment. Some changes I might suggest:
I usually like my .R scripts to be callable from other code, so it might help in the future if each script had a main function that executes the actual work (e.g., `main()` calls `data_splitting()`). That way a script can be executed from the command line, but can also be sourced by other code that wants to reuse the inner function (another script being able to call `data_splitting(url, out_dir)`).
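A minimal sketch of that pattern in R — the names `data_splitting`, `url`, and `out_dir` come from the suggestion above; the function body is hypothetical:

```r
# data_splitting.R -- callable from the command line AND source()-able.

# Hypothetical worker: the real script would read from `url`, split the
# data, and write the pieces to `out_dir`.
data_splitting <- function(url, out_dir) {
  file.path(out_dir, "train.csv")  # illustrative return value
}

main <- function(args) {
  data_splitting(args[1], args[2])
}

# Run main() only when executed via `Rscript data_splitting.R <url> <out_dir>`;
# when another script source()s this file, sys.nframe() > 0, so main() is
# skipped and data_splitting() is simply made available.
if (sys.nframe() == 0L && !interactive()) {
  main(commandArgs(trailingOnly = TRUE))
}
```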
In the report, the analysis jumps suddenly from EDA to model evaluation. Reading through it, going from exploring the variables to finding the best k is a bit confusing, because which model is used, and why, is not introduced at that point. You discuss it further in the Discussion section, but it might be worth mentioning in between, as I feel it helps the "narration" of the analysis flow better.
Some of the references in the analysis report were hard to tie to the actual analysis. Mentioning them during the analysis, at the point where they are used, might make it easier to relate them to the project when reading through them after finishing the report.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
When trying to reproduce the project, I ran into an error stating "the input device is not a TTY. If you are using mintty, try prefixing the command with 'winpty'" when running the `docker run` command. It worked once I added `winpty` in front of the `docker run -it --rm -p 8888:8888 -v /$(pwd):/opt/notebooks a0kay/dsci-310-group-11 make -C /opt/notebooks` command, but it would be useful to add a note to the README explaining that users should prefix the command with `winpty` if they run into the same issue.
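For example, the README note could look something like this (the image name and mount path are copied from the command above):

```shell
# Windows / Git Bash (mintty) users: if Docker reports
# "the input device is not a TTY", prefix the command with winpty:
winpty docker run -it --rm -p 8888:8888 \
  -v /$(pwd):/opt/notebooks a0kay/dsci-310-group-11 make -C /opt/notebooks
```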
The test scripts do not have any documentation. All the scripts that contain functions do, but adding documentation to the test scripts as well would help readers understand what each test does and what its purpose is.
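As a sketch of what documenting a test could look like, using testthat; the helper `split_frac` is hypothetical, standing in for one of the project's functions under test:

```r
library(testthat)

# Hypothetical helper standing in for a project function under test.
split_frac <- function(n, frac) {
  train <- seq_len(round(n * frac))
  list(train = train, test = setdiff(seq_len(n), train))
}

# Purpose: a 70/30 split should cover every row exactly once,
# with no overlap between the train and test indices.
test_that("split covers all rows exactly once", {
  s <- split_frac(10, 0.7)
  expect_equal(sort(c(s$train, s$test)), 1:10)
  expect_length(intersect(s$train, s$test), 0)
})
```

The comment above each `test_that()` block states the intent, so a reader can tell what a failure means without reverse-engineering the assertions.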
The writing overall was very interesting, and I enjoyed looking at the analysis. The one thing that was a little hard to read was the table in the Methods and Results section: the numbers and column names sit very close together, which makes the table hard to read.
Overall I thought the project was very well done. The only small mistake I ran into when reproducing it was a very easy fix. The research question was very interesting, and very good background was provided on the research topic. Very well done on the project, and great job by the team.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Overall, this data analysis is really impressive. The README file is clear, and the summary captures what you have done in the analysis, which I like very much. Here are several suggestions after running the files:
I tried to run the files by following the instructions in the README through Docker, but on my end it doesn't seem to work. You may want to double-check the Docker instructions so that reviewers can run the files.
For the Dataset Information section, I think you don't need to list all of the variables. Instead, you could point out which variables you consider useful and will use in the analysis; when all the variables are listed, it is hard for the reader to read and remember them all.
For the dataset folder, all the datasets can be viewed directly and easily. Each data file looks clean and has a clear name distinguishing it from the others. The datasets are easy to archive and access, which is really helpful for viewers who have questions about the analysis.
For the data analysis part, it is really nice to have a ggplot graph. However, I suggest that the collaborators first explain why they chose those variables based on the ggplot graph, and then wrangle the data afterwards, so that viewers and readers can gain a comprehensive understanding.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: @samzzzzzh @a-kong @Jaskaran1116
Repository: https://github.com/DSCI-310/DSCI-310-Group-11
Abstract/executive summary: A wildfire is an uncontrolled fire that starts in wildland vegetation and spreads quickly through the landscape. A natural occurrence, such as a lightning strike, or a human-made spark can easily ignite a wildfire and destroy millions of properties. However, the extent to which a wildfire spreads is frequently determined by weather conditions: wind, heat, and a lack of rain may dry out trees, bushes, fallen leaves, and limbs, making them excellent fuel for a fire. In this project, we wish to predict the burned area of forests from several environmental factors with a k-NN regression model. By establishing a transparent link between these factors and the burned area, it is possible to identify potential risk factors and take appropriate safeguards to prevent the emergence of forest fires and the disasters they generate.
Editor: @ttimbers
Reviewer: @gzzen @alexkhadr @mcloses @snowwang99