Closed: trevorcampbell closed this issue 3 years ago
I completely understand the confusion. This is due to me being very sloppy with filenames for the later parts I have coded. I was planning to clean up this code and clarify all this; hopefully it will make sense when I am done. I think #4 is also closely related.
See https://github.com/chicago-police-violence/data/issues/4#issuecomment-898786989 . I think once we clarify the filenames, the `linked` folder can serve as the `final` folder, and I think it could even be renamed to `final`. It is not clear to me that we need to introduce another folder for this, since the linking is currently our final step.
After working on this repo a bit, I understand now why you have separate roster.csv and profiles.csv files (an officer shows up in different records with different field values, e.g. if their name or rank changes over time). I wonder if it makes sense to output just the profiles.csv file... we can chat about this in person and then record the outcome of the discussion here for future reference.
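To make the distinction concrete, here is a minimal sketch of how a one-row-per-officer roster could be derived from a multi-row profiles table. Everything in it is an assumption for illustration: the column names (`officer_id`, `name`, `rank`, `year`), the made-up rows, the helper `build_roster`, and the "keep the most recent record" rule are hypothetical and not taken from the repo.

```python
# Hypothetical profile rows: one row per appearance of an officer in a source record.
# Column names and values are invented for illustration only.
profiles = [
    {"officer_id": 1, "name": "J. Smith", "rank": "Police Officer", "year": 2005},
    {"officer_id": 1, "name": "J. Smith", "rank": "Sergeant", "year": 2012},
    {"officer_id": 2, "name": "A. Jones", "rank": "Detective", "year": 2010},
]


def build_roster(profile_rows):
    """Collapse profiles to one row per officer, keeping the most recent record.

    Choosing the latest row by `year` is an assumed rule, not the repo's method.
    """
    latest = {}
    for row in profile_rows:
        current = latest.get(row["officer_id"])
        if current is None or row["year"] > current["year"]:
            latest[row["officer_id"]] = row
    # Return rows ordered by officer_id so the output is deterministic.
    return [latest[officer_id] for officer_id in sorted(latest)]


roster = build_roster(profiles)
```

Under these assumptions, the profiles table keeps every observed variant of an officer's fields, while the roster keeps a single representative row per officer, which is why most users would only need the latter.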
I think we finally have a reasonable-looking "final" folder. I think we can output both `roster.csv` and `profiles.csv` as long as the documentation is very clear about what is contained in `profiles.csv` (and emphasizes that in most cases, `roster.csv` is all you need). What do you think? Feel free to close this issue, unless you believe there is more to be done about it.
Closing this for now. The `final` folder looks fine to me now, but we can reopen an issue for the more specific question of `roster.csv` vs `profiles.csv`.
It is a bit odd that we have both `roster.csv` and `profiles.csv` in the output. Although I'm not sure exactly what the difference is, it seems the two `P0-46957_*.csv` files are also overlapping.
I think we should be more "opinionated" about what we consider to be the "final results" of the processing. Maybe we need to introduce a third folder to store the final results?