Open senjed opened 4 years ago
@senjed sorry, I missed this notification at the time. Part of the reason I didn't reply then is that there isn't a simple response.
I believe your issue is related to the changes to the food inspection data on 7/1/2018. The structure of the food inspection reports changed dramatically, and this model is no longer valid for the new format.
There are also format changes in the business license data that I believe would cause problems.
That is the simple part of the response. The complicated part is telling you what's next, and I'll set that aside for now.
I am trying to build the same kind of matrix as 21_food_inspection_violation_matrix_nums.csv for the more recent inspections by parsing the Violations column. My assumption is that if violation v_i is mentioned in the Violations column of an inspection, then it was violated (this might not be true, but how can I write heuristics for it?). I am observing two issues:
1- The older inspections have 45 distinct violations, but I find around 110 in the more recent data. Also, the violation codes are not unique. For example, code 30 can mean different violations in different inspections.
2- Using the same method of parsing the Violations field, I tried to replicate the file 21_food_inspection_violation_matrix_nums.csv. However, my final matrix was different for some violations.
I was wondering if you could share your script for building the violation matrix from the Violations column, and also let me know why the violation codes are inconsistent for the more recent inspections.
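
For anyone attempting the same thing, here is a minimal Python sketch of the heuristic described above: a violation counts as present whenever its code is mentioned in the Violations text. It is not the project's original script. It assumes that entries within one cell are separated by `|` and that each entry looks like `<code>. <DESCRIPTION> - Comments: <free text>`. The function names, the inspection IDs, and the placeholder descriptions are all made up for illustration. The second helper lists codes that appear with more than one description, which is one way to see the non-unique code problem from issue 1.

```python
import re
from collections import defaultdict

# Assumed entry shape: "<code>. <DESCRIPTION> - Comments: <free text>".
# The "- Comments:" part is treated as optional.
ENTRY_RE = re.compile(r"^\s*(\d+)\.\s*(.*?)(?:\s+-\s+Comments:.*)?$", re.DOTALL)


def parse_violations(text):
    """Return (code, description) pairs found in one Violations cell."""
    if not text:
        return []
    pairs = []
    # Assumption: "|" separates entries and does not occur inside comments.
    for entry in text.split("|"):
        match = ENTRY_RE.match(entry)
        if match:
            pairs.append((int(match.group(1)), match.group(2).strip()))
    return pairs


def build_violation_matrix(inspections):
    """Build a 0/1 matrix from a dict of inspection_id -> Violations text.

    Returns (codes, rows): codes is the sorted list of column codes, and
    rows maps each inspection_id to a list of 0/1 flags in that order.
    """
    parsed = {
        iid: {code for code, _ in parse_violations(text)}
        for iid, text in inspections.items()
    }
    codes = sorted(set().union(*parsed.values()))
    rows = {
        iid: [1 if code in present else 0 for code in codes]
        for iid, present in parsed.items()
    }
    return codes, rows


def ambiguous_codes(inspections):
    """Return {code: descriptions} for codes seen with several descriptions."""
    seen = defaultdict(set)
    for text in inspections.values():
        for code, description in parse_violations(text):
            seen[code].add(description)
    return {code: descs for code, descs in seen.items() if len(descs) > 1}


# Hypothetical data: IDs and descriptions are placeholders, not real records.
sample = {
    "A1": "30. DESCRIPTION A - Comments: note one | 33. DESCRIPTION B - Comments: note two",
    "B2": "30. DESCRIPTION C - Comments: note three",
    "C3": None,  # inspection with no recorded violations
}

codes, rows = build_violation_matrix(sample)
print(codes)
for inspection_id, flags in rows.items():
    print(inspection_id, flags)
print(ambiguous_codes(sample))
```

Because the same code number can carry different descriptions, keying the matrix columns on the code alone mixes distinct violations together. Keying on the (code, description) pair, or building separate matrices for inspections before and after the 7/1/2018 format change, would keep them apart, at the cost of more columns.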