Open danfke opened 2 years ago
Time spent: 1.5 hours
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Particularly Well:
I thought the data set selected was interesting and ambitious. Many groups stuck to the UCI repository, so branching out and finding something different is a real plus. I also liked that they used a technique we haven't covered in class (undersampling).
Their report was clear and easy to read and understand.
Their code is also well written, neat, and easy to follow.
Improvements:
The inclusion of an environment file is a nice touch, but using the `--from-history` flag when generating it would allow people to use it in different OS environments. As far as I can tell, this one was exported from a Windows environment. Directions in the Usage section on how to install from it would be helpful as well.
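For reference, a minimal sketch of the export command the `--from-history` suggestion refers to (the output filename is just an example):

```shell
# Export only the packages that were explicitly requested, without
# OS-specific build strings, so the file is portable across platforms:
conda env export --from-history > environment.yaml
```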
It appears the source data is no longer available (at least temporarily). Adding it to the repo or providing a source and making note of that would be helpful.
Not having a data directory skeleton means the scripts failed, so including the directories (even empty) would be a good idea. I see that the script should try to create them, but it didn't work for me until I created the path manually, maybe because it is two levels deep?
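One way to make nested output paths work regardless of depth is `pathlib` with `parents=True`; a minimal sketch (the `results/figures` path is a hypothetical stand-in for the project's actual output directories):

```python
from pathlib import Path

# Hypothetical output path two levels deep, mirroring the nested
# directories the scripts write to.
out_dir = Path("results") / "figures"

# parents=True creates any missing intermediate directories;
# exist_ok=True makes the call a no-op when the path already exists.
out_dir.mkdir(parents=True, exist_ok=True)

print(out_dir.exists())  # True
```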
The environment is missing something for the EDA on Windows. I know this because our project is too! In order to output PNGs from Altair on Windows, you need to run `npm install -g vega vega-cli vega-lite canvas`, and it doesn't seem to get captured properly when you export an environment.yaml file (see lecture 2 from DSCI 531 for details; scroll down to the warning): https://pages.github.ubc.ca/mds-2021-22/DSCI_531_viz-1_students/lectures/2-data-types_graphical-marks_visual-encondings.html#global-development-data
I had this error and can confirm that this fixed it for your project.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Positives:
The use of RFECV to reduce the total number of features and the use of undersampling to address the class imbalance are commendable.
Improvements:
Wrapping the code under `if __name__ == "__main__":` in a main function might be a bit more readable and consistent, since all of your other scripts have a main function.
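A minimal sketch of the pattern the reviewer suggests (the body of `main` is a placeholder for the script's real logic):

```python
def main():
    """Entry point: all top-level script logic lives here."""
    # ... data loading, analysis, and output writing would go here ...
    return "done"  # placeholder return value for illustration


if __name__ == "__main__":
    main()
```

This keeps the script importable for testing, since importing the module no longer runs the analysis as a side effect.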
Using `assert` to check whether the file actually exists would prevent someone from accidentally removing code that generates output files.
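A minimal sketch of that kind of check, assuming a helper name and a temporary file as a stand-in for a generated figure or model file (both are hypothetical, not from the project):

```python
import os
import tempfile


def check_output_exists(path):
    """Assert that an expected output file was actually written."""
    assert os.path.exists(path), f"Expected output file not found: {path}"


# Demo: a temporary file stands in for a script-generated output.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"placeholder output")

check_output_exists(tmp.name)
print("output file check passed")
```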
Thank you everyone for the great feedback! We've addressed many of the points provided, including the following:
In regards to comment number 2 of @gfairbro's review (an issue which was also mentioned in subsequent reviews), we addressed this by linking to another URL containing the same data, as per Tiffany's directions: 503c540
In regards to comment number 4 of @gfairbro's review, we added a line in the Usage section of our README explaining that Windows users may have to run an additional command before running the Makefile in order to properly render PNGs: 630613e
We also added it to our Dockerfile so that it runs properly: 4d287a5
In regards to comment number 3 of @gauthampughaz's review, we added assert statements to all of the scripts that did not already have some form of testing: c1731a3, 677fad9, 45414ce, 3465f96
In regards to comment number 3 of @jo4356's review, we addressed this here: 75d1262
Submitting authors: @iamMoid @SiqiTao @gn385x @danfke
Repository: https://github.com/UBC-MDS/Collision_Prediction Report link: https://github.com/UBC-MDS/Collision_Prediction/blob/main/doc/collision_prediction_report.md Abstract/executive summary:
In this project we attempt to build a classification model using the logistic regression algorithm and data obtained from police-reported motor vehicle collisions on public roads in Canada to predict whether a motor vehicle collision would result in a fatality or not. The final model performed poorly on both the training set and the test set, returning a high recall of 0.698, but a very low precision of 0.048, resulting in a low f1-score of 0.09. The impact of the low precision can be seen in the results of the prediction of the test set, where the model incorrectly predicts fatalities around 20 times more than it correctly predicts fatalities.
The data set that was used in this project came from the National Collision Database, published by Transport Canada. The National Collision Database contains data on all of the police-reported motor vehicle collisions on public roads in Canada from 1999 to the most recent available data from 2017. We ran our analysis using the data collected from collisions that occurred in 2017. This data set contains information licensed under the Open Government Licence – Canada.
Editor: @flor14 Reviewer: PUGHAZHENDHI_GAUTHAM, Ahn_Kyle, Fairbrother_Gabriel, Wang_Joyce