DSACMS / dedupliFHIR

Prototype for basic deduplication and aggregation of eCQM data
Creative Commons Zero v1.0 Universal

Insufficient Blocking Rules on Some Data #43

Closed. IsaacMilarky closed this issue 2 months ago

IsaacMilarky commented 2 months ago

Describe the bug Duplicates are not found when running on certain test data supplied by Octavian Chiorcea [coctavius@mdinteractive.com](mailto:coctavius@mdinteractive.com)

To Reproduce Steps to reproduce the behavior:

Expected behavior The script should output an Excel file with all of the duplicates identified.

Actual behavior In the results xlsx file, I see that it detects as duplicates only the records that have wrong names; the records with the correct names seem to get a different cluster id. Perhaps this happens because it doesn't use birth_date to detect duplicates. I tried changing deduplifhirLib/settings.py, but changing the config there didn't seem to have any effect.
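
To illustrate the suspected cause, here is a minimal, generic sketch of how blocking rules limit which record pairs are ever compared. It is not the project's implementation: the record fields, the `candidate_pairs` helper, and the two rules are all hypothetical and exist only to show the effect of adding birth_date as a blocking key.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; field names are illustrative, not the project's schema.
records = [
    {"id": 1, "given_name": "John", "family_name": "Smith", "birth_date": "1980-01-15"},
    {"id": 2, "given_name": "Jon", "family_name": "Smyth", "birth_date": "1980-01-15"},
    {"id": 3, "given_name": "Mary", "family_name": "Jones", "birth_date": "1975-06-30"},
]


def candidate_pairs(records, blocking_rules):
    """Return the id pairs that share a blocking key under at least one rule."""
    pairs = set()
    for rule in blocking_rules:
        # Group record ids by the key this rule produces.
        blocks = defaultdict(list)
        for rec in records:
            blocks[rule(rec)].append(rec["id"])
        # Only records inside the same block become candidate pairs.
        for ids in blocks.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


def by_name(rec):
    # Hypothetical rule: exact match on given and family name.
    return (rec["given_name"], rec["family_name"])


def by_birth_date(rec):
    # Hypothetical rule: exact match on birth date.
    return rec["birth_date"]


name_only = candidate_pairs(records, [by_name])
with_birth_date = candidate_pairs(records, [by_name, by_birth_date])

print(name_only)        # set()
print(with_birth_date)  # {(1, 2)}
```

With the name rule alone, records 1 and 2 never land in the same block, so no later comparison step could link them. Adding the birth_date rule makes them a candidate pair.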

IsaacMilarky commented 2 months ago

This seems to be addressed, but I am holding off on closing until I have corresponded with coctavius@mdinteractive.com.