DSCI-310-2024 / data-analysis-review-2024

Submission: Group 10: Predict and Classify the appearance of criminal incidents based on historical incident reports #10

Open ttimbers opened 3 months ago

ttimbers commented 3 months ago

Submitting authors: Cassandra Zhang, Ethan Kenny, James He, Pragya Singhal

Repository: https://github.com/DSCI-310-2024/DSCI310-group10-project/releases/tag/v3.0.0

Abstract/executive summary:

Law enforcement agencies worldwide prioritize crime prevention and public safety, traditionally relying on experience and intuition for resource allocation. However, advancements in data analysis now enable a more data-driven approach. This analysis aims to predict the appearance of criminal incidents from time period, day of the week, and police district based on data from San Francisco 2023. Understanding time-related crime patterns can inform proactive policing strategies. By associating time periods, police districts, and days of the week with the appearance of criminal incidents, this study aims to provide a forecasting tool for police patrol scheduling and resource allocation, ultimately enhancing law enforcement activities and public safety.

Editor: @ttimbers

Reviewer: Sri Chaitanya Bonthula, Kevin Yu, Shahrukh Islam Prithibi, Viet Ngo

KevinatorYu commented 2 months ago

Data analysis review checklist

Reviewer: KevinatorYu

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1

Review Comments:

The report is good. The research question is very intriguing, and is a perfect example of statistics being applied in policing, whether it is controversial or not. The data is also very current, so the analysis results are as up to date as possible. The source of the data is also highly reliable, coming directly from the government of San Francisco. For the analysis, the code coverage of the tests looks good, and the code is easy to follow and uses many good practices. (Perhaps the people who coded are currently taking or recently took CPSC 330? 😄)

However, there are a number of important issues that should be addressed.

There are several errors in the README file. I followed the steps to create a Docker container. In those steps, the <> surrounding the URL should not be included. Also, your README does not tell the user to run "make clean" before "make all", so that existing files are removed and new files can be produced through the Makefile.

The Makefile pipeline does not work. There appeared to be an error in src/analysis.py, a "too many values to unpack" error. This error prevents the analysis from being reproduced.
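For context, "too many values to unpack" is the ValueError Python raises when the right-hand side of an assignment yields more items than there are names on the left. The sketch below only illustrates that class of error and one way to resolve it. The helper split_data and its four return values are hypothetical and are not taken from src/analysis.py, so the actual cause in the project may differ.

```python
# Hypothetical helper, NOT the project's code. It returns four values,
# loosely mimicking a train/test split of features and labels.
def split_data(rows):
    half = len(rows) // 2
    features = [r[0] for r in rows]
    labels = [r[1] for r in rows]
    return features[:half], features[half:], labels[:half], labels[half:]


rows = [(1, "a"), (2, "b"), (3, "a"), (4, "b")]

# Unpacking four returned values into two names raises ValueError
# with a message that begins "too many values to unpack".
try:
    train, test = split_data(rows)
except ValueError as err:
    message = str(err)

# Matching the number of names to the number of returned values fixes it.
X_train, X_test, y_train, y_test = split_data(rows)
```

If this is the cause, comparing the number of names on the left of the failing assignment with what the called function actually returns should locate the mismatch.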

In the finalized report, there are no authors specified. The report also lacks some detail regarding the use of Logistic Regression (as in, why did your team decide to use Logistic Regression, versus something else like KNN classifiers?), and does not communicate any assumptions or limitations of the results. Some DOIs also appear to be missing (e.g. in the very first reference). The styling of the report starts to feel "funky" near the end, especially in the Future Questions section, which reads more like a list than a coherent paragraph. The impact section could be expanded upon too.

In summary:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

ShahrukhP15 commented 2 months ago

Data analysis review checklist

Reviewer: ShahrukhP15

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

The report is easy to follow and the data comes from a reliable source. However, it is good practice to include the license of the data used, to comply with its terms. The Quarto document is easy to follow and has valuable EDA that gives insight into the data. The authors also described the process of selecting coefficients with both a table and a figure, which gives valuable information about why the coefficients were selected. Possible improvements include mentioning the limitations, suggesting more advanced modelling techniques to investigate performance, and addressing the issue with false positives.

However:

In short, this is a really exciting project with a really intriguing research question. There are a few coding errors which might have been missed. I also hope you put some emphasis on discussing the bias and false positive issues, as they are serious potential problems that can waste resources and money. All in all, it was a really exciting report to read.
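As a rough illustration of how the false positive issue could be quantified in the report, the sketch below counts confusion-matrix cells by hand and derives precision and the false positive rate. The function name false_positive_summary and the toy labels are invented for this example and do not come from the project's data or code.

```python
def false_positive_summary(y_true, y_pred):
    """Count confusion-matrix cells for binary labels (1 = incident predicted)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1  # patrol sent where no incident occurred
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "false_positives": fp,
        "precision": precision,
        "false_positive_rate": false_positive_rate,
    }


# Toy labels, invented for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]
summary = false_positive_summary(y_true, y_pred)
```

Reporting these figures alongside accuracy would show how often predicted incidents fail to materialize.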

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

actually-arri commented 2 months ago

Data analysis review checklist

Reviewer: actually-arri

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1

Review Comments:

The project overall seems pretty well put together. It follows a lot of the principles and guidelines we were taught in class. I do find your research question quite interesting. I can see potential areas where you can further strengthen your project, such as accounting for data bias (potential historical law enforcement bias), seasonal changes in crime, and ethical issues regarding privacy and the possible perpetuation of discrimination. I do understand this is at a very early stage, and as such it is a strong base for the project. Another potential area for improvement would be better commit messages. Some of them were very vague and might cause confusion down the line.

Below are the minor issues I came across:

In summary, very well done project with a motivating research question. Apart from minor nitpicks I think you have applied what you have learned well and will go on to lead some amazing projects and teams. Good luck for the final :)

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.