bmeg / iceberg-schema-tools

Create and maintain central iceberg schema.

Synthea FHIR bundle split to 'regular' fhir objects #23

Open matthewpeterkort opened 1 year ago

matthewpeterkort commented 1 year ago

Given Synthea FHIR bundles such as data_model/studies/Alzheimers/0a42d767-95d3-4293-a228-fe971e5bf4e5.bundle.json:

  "resourceType": "Bundle",
  "id": "c4fc036a-866c-47dc-b105-24fa509cf1ac",
  "meta": {
    "lastUpdated": "2022-12-08T14:48:00.718+00:00"
  },
  "type": "searchset",
  "total": 1,
  "link": [ {
    "relation": "self",
    "url": "http://localhost:8090/fhir/Patient?_id=0a42d767-95d3-4293-a228-fe971e5bf4e5&_revinclude=Specimen%3Asubject&_revinclude=DocumentReference%3Asubject&_revinclude=Encounter%3Asubject&_revinclude=Observation%3Asubject&_revinclude=Condition%3Asubject&_revinclude=Task%3Apatient&_revinclude=MedicationRequest%3Asubject"
  } ],
  "entry": [ {
    "fullUrl": "http://localhost:8090/fhir/Patient/0a42d767-95d3-4293-a228-fe971e5bf4e5",
    "resource": {
      "resourceType": "Patient",
      "id": "0a42d767-95d3-4293-a228-fe971e5bf4e5",
      "meta": {
        "versionId": "1",
        "lastUpdated": "2022-12-08T09:08:19.077+00:00",
        "source": "#WfMcBg2NosKANH8F",
        "profile": [ "http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient" ]
      },
      "text": {
        "status": "generated",
        "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">Generated by <a href=\"https://github.com/synthetichealth/synthea\">Synthea</a>.Version identifier: v2.6.1-174-g66c40fa7\n .   Person seed: -9099037138778007894  Population seed: 1626964256551</div>"
      }... 

Split the bundles into one file per resource type (e.g. Observation, Patient, Task, Specimen), written in newline-delimited JSON ('ndjson') format with one resource per line. For example, these are two lines from an observation.ndjson file, which contains all of the Observations from the FHIR bundles:

{"category":[{"coding":[{"code":"laboratory","display":"laboratory","system":"http://terminology.hl7.org/CodeSystem/observation-category"}]}],"code":{"coding":[{"code":"4548-4","display":"Hemoglobin A1c/Hemoglobin.total in Blood","system":"http://loinc.org"}],"text":"Hemoglobin A1c/Hemoglobin.total in Blood"},"effectiveDateTime":"2002-05-07 00:09:58-04:00","encounter":{"reference":"Encounter/9af233cb-ef8f-40c7-bac7-d62e0bb2e61d"},"id":"f8c44e8a-de9f-4b56-9b06-ebed7c32b747","issued":"2002-05-07 00:09:58.536000-04:00","meta":{"lastUpdated":"2022-12-08 09:10:45.678000+00:00","source":"#jz3bNmaAftj29jtc","versionId":"1"},"resourceType":"Observation","status":"final","subject":{"reference":"Patient/fec5a367-a3cc-4eb1-ac72-84b909035e18"},"valueQuantity":{"code":"%","system":"http://unitsofmeasure.org","unit":"%","value":"6.32"}}
{"category":[{"coding":[{"code":"survey","display":"Survey","system":"http://terminology.hl7.org/CodeSystem/observation-category"}]}],"code":{"coding":[{"code":"72166-2","display":"Tobacco smoking status NHIS","system":"http://loinc.org"}],"text":"Tobacco smoking status NHIS"},"effectiveDateTime":"2002-05-07 00:09:58-04:00","encounter":{"reference":"Encounter/9af233cb-ef8f-40c7-bac7-d62e0bb2e61d"},"id":"53460040-5e00-4334-a751-d1b4c69310aa","issued":"2002-05-07 00:09:58.536000-04:00","meta":{"lastUpdated":"2022-12-08 09:10:45.678000+00:00","profile":["http://hl7.org/fhir/us/core/StructureDefinition/us-core-smokingstatus"],"source":"#jz3bNmaAftj29jtc","versionId":"1"},"resourceType":"Observation","status":"final","subject":{"reference":"Patient/fec5a367-a3cc-4eb1-ac72-84b909035e18"},"valueCodeableConcept":{"coding":[{"code":"8517006","display":"Former smoker","system":"http://snomed.info/sct"}],"text":"Former smoker"}}