DSCI-310-2024 / DSCI-310_group-10_crime-prediction

Part of the group project assignment for DSCI 310: Reproducible and trustworthy workflows for data science.

Feedback addressed #101

Open jamesh14 opened 7 months ago

jamesh14 commented 7 months ago

Revise your data analysis project to address feedback from the DSCI 310 teaching team on past milestones, as well as feedback from the peer review. 50% of your final grade for this milestone will be based on whether you have addressed this feedback to improve your project.

To help us assess this easily and correctly, please create a GitHub issue in your analysis project’s GitHub repository titled “Feedback addressed”. In this issue, describe any improvements you made to the project based on feedback, and point to evidence of these improvements. You can provide this evidence as URLs that reference specific lines of code, commit messages, pull requests, etc. Be sure to add some narration when sharing these URLs so that it is easy for us to identify which changes to your work addressed which pieces of feedback.

You will be graded on a sliding scale for this: the more improvements you make, the higher your grade for this part of the milestone. You must make at least one improvement per team member, and the improvements should significantly improve the project. Meeting only this minimum can earn at most 37.5/50; to earn more, you need to exceed it.

jamesh14 commented 7 months ago

Milestone 1 & 2 Feedback:

| Issue | Feedback | Issue Resolution | Commit |
| --- | --- | --- | --- |
| 1 | The repository name is not relevant to the topic of the project. | Renamed the repository from DSCI310-group10-project to DSCI-310_group-10_crime-prediction. | |
| 2 | All code-related folders and files should be lowercase. | During Milestone 2, the `Visualizations` directory was renamed to `results`. However, not all files follow the lowercase convention: `Dockerfile` and `Makefile` keep their capitalized first letters, since these are the default file names that Docker and Make look for. | |
| 3 | Missing link to the milestone release; the instructions on how to run the analysis need better formatting. | Added a section to README.md giving the local file path to the report, which points to the location of quarto_reports.qmd. | 6d875289314f39687bad3e4774ff4620e46a0e94 |
| 4 | No Creative Commons license was specified for the project report. | Added the Creative Commons license to README.md and included it in LICENSE.md alongside the MIT license. | 4f1853482b717ce474b2994513066534fce0afec, dc95a374f4047cb546231ee999ccfd0e80f77c81 |
| 5 | The abstract/summary is too short and not informative. | Expanded the abstract/summary to give more detail on the analysis model (logistic regression) and to include the findings and results. Also added source links and credited the dataset's publisher for attribution. | 7271255636353891b845d94076f1bfd5e243a08b |
| 6 | Tables are missing labels, and the last chart was not rendered. | Assigned figure labels in the .qmd file. All visualizations now render in the .ipynb analysis file. Previously, some plots failed to render because of missing directory paths; importing the plots through `sys` now allows them to render on GitHub. | 2346b9b5cef2256f1ab95c1e838d9f48aa986aba |
| 7 | Missing reference to the dataset source. | Added two references at the bottom of the project documentation giving the source of the dataset and descriptions of the dataset variables. | https://github.com/DSCI-310-2024/DSCI-310_group-10_crime-prediction?tab=readme-ov-file#references |
| 8 | Inline documentation is present, which is good. However, script documentation is missing (e.g., what the script does and how to use it). | Added documentation on usage, options, and purpose to each of the five Python scripts. | 428c315add08855f3c14641309cf700fd2963da1 |
| 9 | Usage documentation could be clearer (i.e., it is not explicit how to use the project, some of the wording is confusing, and some guessing or trial and error was needed to run the project). | Stopped using environment.yml, which caused problems when running certain analysis components. Since the computational environment is now a Docker container, all analyses are run exclusively through Docker from now on. | 3e1a0d2cc328c02a58d25dfff8184342a5ca0d76 |
| 10 | The analysis could not be run reproducibly because several packages were missing from the provided computational environment. | After removing the environment.yml usage instructions, we revised the package dependencies in the README to list only the Docker dependencies. This resolves the confusion over missing packages: environment.yml listed only 6 packages, while the Dockerfile specifies 9. The discrepancy arose because environment.yml was intended only for the Milestone 1 instructions, not for Milestone 2 onwards. | 226b274b70ef7935409a0bc313cd94ef467d3aea |
| 11 | Usage instructions contain typos/errors. | Removed the angle brackets around the repository URL in `git clone <git@github.com:DSCI-310-2024/DSCI-310-group-10_crime-prediction.git>`; pasting the bracketed command into a terminal caused errors. | 2630cdd9130769fe5ea071f114e721ec8727e356 |

pragszz commented 7 months ago

Peer Review Feedback:

| Issue | Feedback | Issue Resolution | Commit |
| --- | --- | --- | --- |
| 1 | Error in Makefile: `ValueError` when running `python src/analysis.py`. | `perform_analysis` returns 4 values, but only 3 were being unpacked in the analysis function, which raised the `ValueError`. Added the `viz_df` result to the unpacked values. | d883e098af6e4a409d180a3581b9b3652632966d |
| 2 | The README does not suggest running `make clean` before `make all`. | Added the `make clean` command to README.md so that previously generated reports and files are removed and new files can be built without conflicts. | 71ff1d252c7265e80a012338a37897371e9484f8 |
| 3 | The README contains a typo in the git commands for cloning the repository: the `.git` repository URLs should not be enclosed in angle brackets (`<>`). | Removed the angle brackets from the cloning instructions. | 54af59a129f6fbab0b601e3abbb084c929b6b3be |
| 4 | No authors listed in the Quarto document. | Added the authors to the Quarto document. | bb05e0b8762656cc7f8984f959b2b506fd60defe |
| 5 | Check `get_time_period` for invalid hour or minute input, such as an hour of -1 or 61 minutes, and throw a type error, since both are integers but outside the valid range of time. | Added tests to test_time_period.py for out-of-bounds values and invalid input types for hour and minute. | pycrimeprediction |
| 6 | The report does not state the limitations or assumptions of the results. | Added the limitations and assumptions of the results. | 209a853e1bd53a6a42de422dc50d975a909073b9 |
| 7 | No justification for using logistic regression as the model. | Included a justification for using logistic regression in the analysis. | 209a853e1bd53a6a42de422dc50d975a909073b9 |
| 8 | No DOIs in the references. | References remain unchanged, as only one of our sources is an article (for which we included a DOI). The other sources are a news website and a magazine, and the citation for the book "Predictive Policing: The Role of Crime Forecasting in Law Enforcement Operations" was obtained from JSTOR. | |
| 9 | Improve the model analysis by adding evaluation metrics such as accuracy, precision, recall, and F1 score; addressing bias in the model with a confusion matrix; performing variable selection; and testing how different models perform. | While these suggestions would significantly improve our analysis, implementing them would require more complex code than was feasible given the time constraints. Incorporating them in future work would strengthen the model evaluation described in the review. | |
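
The `ValueError` in peer-review item 1 is the error Python raises when a call returns more values than there are names to unpack them into. The minimal sketch below reproduces the failure and the fix. `perform_analysis` here is a hypothetical stand-in with placeholder return values; only the fact that it returns four values, the last being `viz_df`, comes from the resolution above.

```python
def perform_analysis(rows):
    """Hypothetical stand-in: like the project's function, it returns four values."""
    model = "fitted-model"             # placeholder for a fitted classifier
    accuracy = 0.0                     # placeholder for an evaluation score
    summary = {"n_rows": len(rows)}    # placeholder for summary statistics
    viz_df = list(rows)                # placeholder for the data used in plots
    return model, accuracy, summary, viz_df


rows = [1, 2, 3]

# Buggy call: four values are returned but only three names receive them.
error_message = None
try:
    model, accuracy, summary = perform_analysis(rows)
except ValueError as err:
    error_message = str(err)  # message begins with "too many values to unpack"

# Fixed call: viz_df is unpacked as well, so the counts match.
model, accuracy, summary, viz_df = perform_analysis(rows)
print(error_message)
print(viz_df)
```

A starred target such as `model, accuracy, summary, *rest = perform_analysis(rows)` would also avoid the error, but naming `viz_df` explicitly, as the fix does, keeps the function's four-value return contract visible at the call site.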
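
Peer-review item 5 asks `get_time_period` to reject inputs such as an hour of -1 or 61 minutes. The real signature and return values of `get_time_period` in pycrimeprediction are not shown in this issue, so the sketch below assumes a hypothetical `get_time_period(hour, minute)` that returns a period label with made-up boundaries. Following Python's built-in exception conventions, it raises `TypeError` for a non-integer argument and `ValueError` for an integer outside the valid range, which is the usual way to report the reviewer's "integers outside the range of time" case.

```python
def get_time_period(hour, minute):
    """Hypothetical sketch: map a time of day to a coarse period label.

    Raises TypeError if hour or minute is not an int, and ValueError if
    hour is outside 0-23 or minute is outside 0-59.
    """
    for name, value in (("hour", hour), ("minute", minute)):
        # bool is a subclass of int, so True/False are rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    if not 0 <= minute <= 59:
        raise ValueError(f"minute must be between 0 and 59, got {minute}")

    # Assumed period boundaries, for illustration only; in this sketch the
    # minute is validated but does not affect the label.
    if hour < 6:
        return "Night"
    if hour < 12:
        return "Morning"
    if hour < 18:
        return "Afternoon"
    return "Evening"
```

In a pytest suite such as test_time_period.py, each invalid case (for example `get_time_period(-1, 0)` or `get_time_period(0, 61)`) could be wrapped in `with pytest.raises(ValueError):`, and each wrong-type case in `with pytest.raises(TypeError):`.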
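
Peer-review item 9 suggests reporting accuracy, precision, recall, and F1 score alongside a confusion matrix. The team deferred this, but as a rough sketch of what it involves, all four metrics can be derived from the confusion-matrix counts of a binary classifier. `binary_metrics` below is a hypothetical plain-Python helper and the labels are made up; scikit-learn's `sklearn.metrics` module provides `confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for the same purpose.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: confusion matrix and summary metrics for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when the positive class is never
    # predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        # Rows are true labels, columns are predictions: [[TN, FP], [FN, TP]].
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(binary_metrics(y_true, y_pred))
```

For these made-up labels the counts are TN = 2, FP = 2, FN = 1, TP = 3, giving accuracy 0.625, precision 0.6, recall 0.75, and an F1 score of about 0.667. Comparing the off-diagonal counts shows whether the model's errors fall mostly on one class, which is the kind of bias check the reviewer had in mind.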