DSACMS / dedupliFHIR

Prototype for basic deduplication and aggregation of eCQM data
Creative Commons Zero v1.0 Universal

Swap actual FHIR and QRDA records for the example.json that is currently used #5

Closed: decause-gov closed this issue 5 months ago

decause-gov commented 11 months ago

The text we're matching/deduplicating in the project is currently made up.

We want to use actual FHIR records, and there's a library for synthetically generating such records called Synthea: https://github.com/synthetichealth/synthea
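As a rough illustration of what "using actual FHIR records" could mean for the matcher, here is a minimal sketch that flattens the Patient resources in a FHIR Bundle (the shape Synthea writes out as JSON) into simple rows. The function name `patients_from_bundle` and the chosen output fields are hypothetical, not taken from the project; only the standard FHIR R4 Patient fields (`name`, `birthDate`, `gender`, `address`) are assumed.

```python
from typing import Any


def patients_from_bundle(bundle: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten Patient resources in a FHIR Bundle into flat rows for matching.

    Hypothetical helper: the output field names are an assumption, chosen
    only to illustrate the idea.
    """
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        # Bundles mix many resource types; only Patients matter for dedup here.
        if resource.get("resourceType") != "Patient":
            continue
        # FHIR allows several names/addresses; this sketch keeps the first of each.
        name = (resource.get("name") or [{}])[0]
        address = (resource.get("address") or [{}])[0]
        rows.append({
            "id": resource.get("id", ""),
            "given": " ".join(name.get("given", [])),
            "family": name.get("family", ""),
            "birth_date": resource.get("birthDate", ""),
            "gender": resource.get("gender", ""),
            "city": address.get("city", ""),
            "postal_code": address.get("postalCode", ""),
        })
    return rows
```

A Synthea output file could be loaded with `json.load` and passed straight in; flattening first keeps the matching logic independent of the FHIR nesting.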

In addition to FHIR records, we would like this same tool to be able to 'speak' other dialects, since it is a general-purpose text-matching library. QRDA files would therefore also be a great data type to support. QRDA is XML-based (it is built on HL7 CDA), but the same principles apply.

decause-gov commented 11 months ago

ecqm-dedupe-hacksesh-062823.txt

IsaacMilarky commented 10 months ago

https://github.com/DSACMS/ecqm-dedupe/pull/6 is a PR that adds parsing of real FHIR-formatted JSON. It was verified to work with Synthea data. However, Synthea data doesn't include duplicates, so we will have to create those ourselves. We need a good dataset that is actually in need of deduplication.

IsaacMilarky commented 5 months ago

We decided to use Synthea and Faker separately instead.