Closed whanley closed 2 years ago
Could you make a new pull request that uploads the cleaned leon-events file under a different name? We should keep the raw file, but have a "clean" one to work from.
I could upload it under a different name. However, the existing leon-events file is already a derivative of the original files (which we received on CDROM). I found problems in that file--extra columns and the like, caused by CSV conversion errors--and I think this is a better baseline file.
Other than error correction, I
I did this very conservatively, i.e. I made no judgment calls beyond merging obvious clusters, using the OpenRefine clustering function and reviewing each decision individually.
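For anyone who wants to re-check the clusters outside OpenRefine, here is a rough sketch of key-collision clustering in Python. It only approximates OpenRefine's fingerprint method and is not what was run for this PR. The function names and the sample values are made up for illustration and do not come from the leon-events file.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Approximate fingerprint key (an assumption, not OpenRefine's exact code)."""
    # Trim, lowercase, and decompose accented characters.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop the combining marks left over from decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique whitespace-separated tokens and rejoin them.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups of 2+ variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample values, for illustration only.
candidates = cluster(["New York", "new york ", "York, New", "Boston"])
for group in candidates:
    print(group)
```

This only proposes candidate groups. Each one would still need the same individual review before any values are merged.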
Okay, fair! We can use this as the baseline file. I'll fix the merge conflicts and then squash and merge this PR.
I did a lot of work with OpenRefine to standardize entries in some string fields and to find other problems introduced by the CSV conversion. This is a lot cleaner than previous versions.
I think that the merge conflict is due to some extra error columns being removed?
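If it helps when resolving the conflict, a quick way to spot rows with stray extra columns is to compare each row's field count against the header. This is a minimal sketch using Python's standard `csv` module. The function name and the inline sample data are hypothetical, and the real file may need different dialect settings.

```python
import csv
import io


def find_ragged_rows(lines):
    """Return (line_number, field_count) for rows that differ from the header."""
    reader = csv.reader(lines)
    header = next(reader)
    expected = len(header)
    # reader.line_num is the number of source lines read so far.
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]


# Hypothetical sample data; the third line carries one extra field.
sample = io.StringIO("id,name\n1,a\n2,b,extra\n3,c\n")
print(find_ragged_rows(sample))
```

For a file on disk, pass an open file handle (opened with `newline=""`) instead of the `StringIO` object.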