Closed by hellonewman 3 years ago
Partially addressed by pull request #61, which adds merge_duplicates.R and source_field_priorization_sample_data (to be modified later with actual data), and pull request #62, which adds an intermediate data folder with sample input and output files for merge_duplicates.R.
Because the actual input data has not been added yet, merge_duplicates.R currently reads "merge_duplicates_input_test.csv" and writes "merge_duplicates_output.csv". I personally don't consider this issue complete until the other dedup scripts are added and they're all strung together, but my component does exist in a mostly complete form.
3/16 update: this is mostly working but needs to be tested more thoroughly. @maxachis is leading the testing process.
Closing this because another issue covers the testing process.
Develop a script that merges duplicate entries into one and returns a final deduped dataset.
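For illustration, here is a minimal base R sketch of one way such a merge could work. It assumes each record already carries a duplicate-group ID and that a source priority order decides which value wins for each field. The column names (`duplicate_group`, `source`, `name`, `website`), the source names, and the sample rows are all hypothetical, and this is not the actual merge_duplicates.R from the pull requests.

```r
# Hypothetical sample records: rows sharing a duplicate_group describe the same entry.
records <- data.frame(
  duplicate_group = c(1, 1, 2, 3, 3),
  source = c("source_b", "source_a", "source_a", "source_b", "source_a"),
  name = c("Springfield PD", "Springfield Police Department",
           "Shelbyville PD", "Ogdenville PD", NA),
  website = c("https://example.org/spd", NA, NA, NA, "https://example.org/opd"),
  stringsAsFactors = FALSE
)

# Assumed priority order: earlier sources win when two rows disagree.
source_priority <- c("source_a", "source_b")

# Return the first value that is neither NA nor an empty string.
# Indexing an empty vector with [1] yields an NA of the same type.
first_non_missing <- function(x) {
  present <- x[!is.na(x) & x != ""]
  present[1]
}

merge_duplicates <- function(df, group_col, source_col, priority) {
  # Rank each row by its source; sources missing from `priority` sort last.
  source_rank <- match(df[[source_col]], priority)
  source_rank[is.na(source_rank)] <- length(priority) + 1
  df <- df[order(df[[group_col]], source_rank), , drop = FALSE]

  # Collapse each group to one row, filling every field from the
  # highest-priority row that has a value for it.
  merged <- lapply(split(df, df[[group_col]]), function(group) {
    as.data.frame(lapply(group, first_non_missing), stringsAsFactors = FALSE)
  })

  result <- do.call(rbind, merged)
  rownames(result) <- NULL
  result
}

merged <- merge_duplicates(records, "duplicate_group", "source", source_priority)
print(merged)
```

In this sketch the merged `source` column only reports the highest-priority source in each group, even when some fields were filled from a lower-priority row. A real script would presumably wrap the call in `read.csv()` and `write.csv()` for the input and output files, and would need the duplicate groups to be identified by an earlier dedup step.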