Peer Review By Justin Klip

Opening statement summary:

I am reviewing Tommy Fu's Hate Crime Repository and his subsequent paper. He explores hate crime in Toronto across both time and place to provide insights on disparities across these dimensions.

Strong positive points

You have a very clear coding style that is well commented, and the scripts you have right now do well to get, analyze, and clean your data. I also think you lay out what is going to be discussed further in the paper well in your intro.

Critical improvements needed

Make sure that you update your LLM.txt file with your chat logs when you finish your project.
Right now your discussion section and data section are not complete. Since I'm doing the same data set as you, one thing I would suggest is you explain how the categorical nature of your data makes it different from other data, since you can't run typical summary statistics on this kind of data.
I see that you don't have any citations right now. Make sure you add them to your BibTeX file, since you cite sources in your intro that don't appear in your references.
Don't forget to cite R! The prof won't grade it if you don't cite R.
Don't forget to cite the other packages you used as well. citation('packagename') should give you the bibtex.
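
Here is a rough sketch of how you could pull the BibTeX for R and your packages in one go. The package names below are placeholders (I'm assuming, not checking, which ones your scripts load), so swap in your own. citation("base") returns the citation for R itself.

```r
# Placeholder package names: replace with the packages your scripts actually load.
packages <- c("base", "stats")

# toBibtex() converts each citation object into BibTeX lines;
# unlist() flattens everything into one character vector.
bib_entries <- unlist(lapply(packages, function(pkg) toBibtex(citation(pkg))))

# Print the entries so they can be pasted into the bibliography file.
cat(bib_entries, sep = "\n")
```

The entries may come out without a citation key (the part right after "@Manual{"), so you would need to add one by hand before citing them in the paper.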

Suggestions for improvement:

Please consider adding/changing/removing:

Perhaps also include a section for the 'measurement' category on the rubric, which just explains how a real-world phenomenon becomes an entry in this data set, so that it is clear for the grader.
Maybe rename the paper document so it is clear what the paper is about in the filename.
Try to remove any unnecessary files that you won't be using in your actual paper, such as the model folder, the datasheet folder, and the prof's literature review. The paper also still has some remnants from the starter paper that should be removed.
Perhaps you could say more in your repo name about what the project is. 'hate_crimes' gives a general idea, but something like 'toronto_hate_crime_paper_and_code' would make it clearer.
In the tests file, a suggestion I have is to explain in your comments why those specific tests were run rather than other tests, although the ones you have right now are good.
In your abstract you mention "The results highlight distinct trends and reveal neihborhoods with higher concentrations of bias". Maybe explain what exactly those results are, e.g. North York has a 50% higher religion bias rate, or something like that.
When you do your data section, a suggestion I have is to mention why specifically you focus on certain parts of the data set (race, religion, sexual orientation) over others like age and language. It should be pretty easy, since age and language have practically no observations.

Evaluation:

R is appropriately cited (0/1) -> you didn't cite R
LLM Usage is documented (0/1) -> right now you have a part in your readme, but didn't update your text file.
Title (2/2) -> Your title is super clear, explains what's going to happen, and what you will find.
Author (2/2) -> you have your name, have a link to the repo, and the date.
Abstract (3/4) -> You have an abstract that outlines what was done and what was found, but not necessarily why it matters. You also aren't too specific on what your results are.
Introduction (3/4) -> You do a good job explaining the broader context, what the paper is about, what was done, and why it's important. It is also of appropriate length. Talk about the literature gap you are filling, though.
Data (0/10) -> currently missing
Measurement (0/4) -> currently missing
Prose (5/6) -> I think your wording is quite clear and does not use filler, just a few grammatical issues in your introduction (paragraph 1).
Graphs/Tables/Etc (0/4) -> currently missing
Referencing (0/4) -> currently missing
Commits (2/2) -> you have a bunch of commits that specify what is done. Nice!
Sketches (2/2) -> Super clear graph and table of what you want to display. Can't wait to see the actual graphs.
Simulation (3/4) -> Your simulation is clearly commented and structured. Maybe explain why you use specific variables and not others?
Tests (4/4) -> You used a bunch of tests and made sure your data was clean, also well commented.
Reproducibility (4/4) -> You have a detailed readme, and your code is all documented with an R project. Your steps are also all there and you use seeds and relative paths.
Code Style (1/1) -> clear, and styled.
General Excellence (1/3) -> While right now you have a lot missing, I think you are exploring a super cool area and demonstrate a ton of potential in your code and introduction.

Estimated mark:

(32/64) 50 out of 100.

Reason:

Right now you lose most points from just not having stuff, but what is there is very good! I look forward to seeing your paper and I think you will do a great job once you add in those big rubric items.

