DSACMS / dedupliFHIR

Prototype for basic deduplication and aggregation of eCQM data
Creative Commons Zero v1.0 Universal

Add Command to Generate Diff Files and Print Their Paths #34

Closed IsaacMilarky closed 6 months ago

IsaacMilarky commented 7 months ago

Problem

As part of our Alpha process, we have planned to add a CLI command to generate diff comparisons between the detected duplicate patient data files.

Solution

Create the CLI command gen-diff to generate diff files for the previously computed duplicates.

Result

A new command, gen-diff, has been added. It generates diff files in the cache directory and prints to stdout the files that point to the same patient, with each group of duplicates on its own line. For example:

file1 file2
file5 file8 file9 file12

This shows that file1 and file2 point to the same person and need someone to review and merge them. Likewise, file5, file8, file9, and file12 are all duplicate files for the same person. This way, stdout can easily be used to highlight work for the user to do in the front end.
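To illustrate how a front end might consume this output, here is a minimal sketch that parses the text into groups of duplicate files. The function name `parse_duplicate_groups` is hypothetical and not part of this PR, and the sketch assumes file paths contain no whitespace.

```python
from typing import List


def parse_duplicate_groups(text: str) -> List[List[str]]:
    """Split gen-diff style output into one list of paths per line.

    Hypothetical helper: assumes each non-empty line lists
    whitespace-separated paths that refer to the same patient.
    """
    groups = []
    for line in text.splitlines():
        paths = line.split()
        # A group needs at least two files to count as a duplicate set.
        if len(paths) >= 2:
            groups.append(paths)
    return groups


# Sample text copied from the example output above.
sample_output = "file1 file2\nfile5 file8 file9 file12\n"
groups = parse_duplicate_groups(sample_output)
```

Each inner list is then one unit of review work for the user.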

Currently, diff files are generated only for duplicate groups that contain exactly two records.
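The two-record restriction can be sketched as follows. This is an illustration, not the code in this PR: the function `diff_for_group` and the sample records are hypothetical, and Python's standard `difflib.unified_diff` is assumed as the diff engine.

```python
import difflib
from typing import Dict, Optional


def diff_for_group(records: Dict[str, str]) -> Optional[str]:
    """Return a unified diff for a group of exactly two records.

    Hypothetical helper: `records` maps a file name to its text.
    Groups of any other size return None, mirroring the restriction
    that diffs are produced only for pairs.
    """
    if len(records) != 2:
        return None
    (name_a, text_a), (name_b, text_b) = records.items()
    diff_lines = difflib.unified_diff(
        text_a.splitlines(keepends=True),
        text_b.splitlines(keepends=True),
        fromfile=name_a,
        tofile=name_b,
    )
    return "".join(diff_lines)


# Made-up sample data for illustration only.
pair = {
    "file1": '{"name": "Jon Smith"}\n',
    "file2": '{"name": "John Smith"}\n',
}
pair_diff = diff_for_group(pair)
triple_diff = diff_for_group({"a": "x\n", "b": "y\n", "c": "z\n"})
```

Returning `None` for larger groups leaves the caller free to choose another comparison method for them.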

Important Notes For Reviewers

Note that a diff can only compare two files, so when more than two files are deemed duplicates, some other method, such as diffuse, is needed to compare them. We should discuss this the next time we talk about front-end strategy, since there are many routes we could take.