decause-gov closed this 5 months ago
https://github.com/DSACMS/ecqm-dedupe/pull/6 is a PR that adds parsing of real FHIR-formatted JSON. It was verified to work with Synthea data. However, Synthea data doesn't include duplicates, so we will have to create those ourselves. We still need a good dataset that actually requires deduplication.
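As a rough illustration of "creating duplicates ourselves", here is a minimal sketch that pulls patient names out of a FHIR Bundle and makes a noisy copy of a Patient resource. It is not the code from the PR: the helper names (`patient_names`, `swap_adjacent`, `make_duplicate`) and the tiny inline bundle are hypothetical, and the only FHIR structure assumed is `entry[].resource` with `resourceType`, `name[].given` and `name[].family`.

```python
import copy
import json
import random

# Hypothetical, hand-written stand-in for a Synthea-style FHIR Bundle.
BUNDLE = json.loads("""
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"resourceType": "Patient",
                  "name": [{"given": ["Jane", "Q"], "family": "Smith"}]}},
    {"resource": {"resourceType": "Encounter"}}
  ]
}
""")


def patient_names(bundle):
    """Yield (given, family) for every name on every Patient in the bundle."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue
        for name in resource.get("name", []):
            yield " ".join(name.get("given", [])), name.get("family", "")


def swap_adjacent(text, rng):
    """Swap two neighbouring characters to imitate a simple typo."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def make_duplicate(patient, rng):
    """Deep-copy a Patient resource and put a typo in each family name."""
    dup = copy.deepcopy(patient)
    for name in dup.get("name", []):
        if "family" in name:
            name["family"] = swap_adjacent(name["family"], rng)
    return dup


rng = random.Random(0)  # seeded so the generated typos are repeatable
original = BUNDLE["entry"][0]["resource"]
duplicate = make_duplicate(original, rng)
```

The duplicate keeps the same structure as the original, so it can go through the same parsing path. A single character swap is only one kind of noise; swapping two identical neighbouring letters would leave the name unchanged, so a real generator would likely need several kinds of perturbation.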
We decided to use Synthea and Faker separately instead.
The text we're currently matching/deduplicating in the project is made up.
We want to use actual FHIR records, and there's a library for synthetically generating such records called Synthea: https://github.com/synthetichealth/synthea
In addition to FHIR records, we would like this same tool to be able to 'speak' other dialects, since it is a general-purpose text matching library. So QRDA files would also be a great datatype to support. QRDA is XML-based rather than JSON, but the same principles apply...
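To show how the "same principles" might carry over to XML, here is a small sketch that reads a patient name from a QRDA-like document using only the standard library. It assumes the document follows the HL7 CDA layout (namespace `urn:hl7-org:v3`, name under `recordTarget/patientRole/patient/name`); the sample XML is hand-written and heavily trimmed, and `qrda_patient_name` is a hypothetical helper, not something that exists in the project.

```python
import xml.etree.ElementTree as ET

# Namespace used by HL7 CDA documents; assumed here to apply to the QRDA input.
NS = {"cda": "urn:hl7-org:v3"}

# Hand-written, heavily trimmed sample; a real QRDA file has far more content.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget>
    <patientRole>
      <patient>
        <name><given>Jane</given><family>Smith</family></name>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""


def qrda_patient_name(xml_text):
    """Return (given, family) from a CDA-style document, or None if absent."""
    root = ET.fromstring(xml_text)
    name = root.find("cda:recordTarget/cda:patientRole/cda:patient/cda:name", NS)
    if name is None:
        return None
    given = " ".join(g.text or "" for g in name.findall("cda:given", NS))
    family = name.findtext("cda:family", default="", namespaces=NS)
    return given, family


parsed = qrda_patient_name(SAMPLE)
```

Returning the same `(given, family)` shape that a FHIR extractor would produce means the matching code downstream does not need to know which format a record came from.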