CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License

Dup cleanup, return final dataset #58

Closed: hellonewman closed this issue 3 years ago

hellonewman commented 3 years ago

Develop a script that merges duplicate entries into one and returns a final deduped dataset.

maxachis commented 3 years ago

Partially addressed by pull request #61, which adds merge_duplicates.R and source_field_priorization_sample_data (to be replaced later with actual data), and by pull request #62, which adds an intermediate data folder with sample input and output files for merge_duplicates.R.

Because the actual input data has not been added yet, merge_duplicates.R currently reads "merge_duplicates_input_test.csv" and writes "merge_duplicates_output.csv". I don't consider this issue complete until the other dedup scripts are added and they're all strung together, but my component exists and is mostly complete.
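For illustration only, here is a rough Python sketch of what merging duplicates by source priority could look like. It is not the code in merge_duplicates.R. The column names (`group_id`, `source`, `name`, `phone`), the source names, and the rule "first non-empty value from the highest-priority source wins" are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical source ranking: earlier entries win when duplicates disagree.
SOURCE_PRIORITY = ["source_a", "source_b"]


def merge_duplicates(rows, group_key="group_id", source_key="source",
                     priority=SOURCE_PRIORITY):
    """Collapse rows that share a group_key into one row per group.

    Rows in a group are ordered by source priority (unlisted sources rank
    last). The top-ranked row is the base; empty fields are then filled
    from lower-ranked rows.
    """
    rank = {name: i for i, name in enumerate(priority)}

    # Bucket rows by their duplicate-group identifier.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    merged = []
    for group_rows in groups.values():
        ordered = sorted(group_rows,
                         key=lambda r: rank.get(r[source_key], len(rank)))
        result = dict(ordered[0])
        for row in ordered[1:]:
            for field, value in row.items():
                # Only fill gaps; never overwrite a higher-priority value.
                if result.get(field) in ("", None) and value not in ("", None):
                    result[field] = value
        merged.append(result)
    return merged


# Made-up sample rows; rows with the same group_id are duplicates.
sample_rows = [
    {"group_id": "1", "source": "source_b", "name": "Corner Market", "phone": "555-0100"},
    {"group_id": "1", "source": "source_a", "name": "Corner Mkt", "phone": ""},
    {"group_id": "2", "source": "source_b", "name": "Food Pantry", "phone": ""},
]
merged = merge_duplicates(sample_rows)
```

In this sketch, group 1 keeps the name from `source_a` and picks up the phone number from `source_b`, since `source_a` left it blank. Rows could be loaded from a CSV with `csv.DictReader` and written back with `csv.DictWriter`, assuming the file has matching columns.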

hellonewman commented 3 years ago

3/16 update: this is mostly working but needs more thorough testing. @maxachis is leading the testing process.

maxachis commented 3 years ago

Closing this because another issue already covers the testing process.