databrickslabs / dbignite

FHIR to OMOP Notebook Doesn't Work #30

Closed Zachary-Higgins closed 8 months ago

Zachary-Higgins commented 9 months ago

Hi, I'm working on a POC to convert our FHIR bundles to OMOP as outlined in https://github.com/databrickslabs/dbignite/blob/main/notebooks/dbignite-demo.py

Simplifying the problem, I'm unable to import data_model due to how the project is structured...

%pip install git+https://github.com/databrickslabs/dbignite.git
from dbignite.data_model import *

Anyway, there is more to the notebook I would like to test, but none of the proper files come down when I install it.
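One way to confirm which submodules actually came down with the install is to ask Python directly. This is a minimal sketch using only the standard library; the helper name `is_importable` is made up for illustration, and the two module names are simply the import paths that appear in this thread.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `dbignite`) is missing
        # or fails to import.
        return False


# Module names taken from the import statements in this thread.
for name in ("dbignite.data_model", "dbignite.omop.data_model"):
    print(name, "->", is_importable(name))
```

If the first name reports False and the second reports True, the package layout changed rather than the install failing outright.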

Long story short, I can fork it and fix it (and submit a PR if this repo is still being maintained), but I'm curious as to why it's in this state. Is this intended to be just a demo with support purposely removed, or is it an oversight? This also breaks the solution accelerator on the Databricks website.

Zachary-Higgins commented 9 months ago

This is the solutions accelerator notebook I'm referring to.

https://notebooks.databricks.com/notebooks/HLS/interop/index.html?_ga=2.258627789.699775192.1677488425-455734848.1677481835&_gl=1*1trrc06*_gcl_aw*R0NMLjE2OTIxOTU5OTguQ2p3S0NBanc1X0dtQmhCSUVpd0E1UVNNeEVUM1BjMlBTSU1WMDRmSDZvV3VUOWhWdi1SVm84ZUxwek8zRU9feEhiZTZjWTJvdWs2ZlVCb0NRNW9RQXZEX0J3RQ..*_gcl_au*MTE5ODI5NjE5OS4xNjg3NTI1OTU0*rs_ga*ZDIwOTUzMGEtY2I3NC00NjQ1LTg1YTUtMjM2YjQ4MjBkMzFh*rs_ga_PQSEQ3RZQC*MTY5NDU0MjczMDA2OS43OS4wLjE2OTQ1NDI3MzQuNTguMC4w#interop_2.html

zavoraad commented 8 months ago

Hi, @Zachary-Higgins this is updated in pull request https://github.com/databrickslabs/dbignite/pull/34

Until this merge request is completed, you can install from the forked branch.

%pip install git+https://github.com/databricks-industry-solutions/dbignite-forked.git

Zachary-Higgins commented 8 months ago

Sorry, just getting around to following up on this. I'll bring this up with our team on Monday and we'll take a look. Thanks a lot.

Zachary-Higgins commented 8 months ago

Thanks @zavoraad. I was able to kind of get it working.

I had to add fhir_model.df = fhir_model.asWholeTextfile(BUNDLE_PATH), because constructing the model with just fhir_model=FhirBundles(BUNDLE_PATH) did not seem to use the defaultResource to read from BUNDLE_PATH correctly. I may have used that class incorrectly, but that's how I understood it in my limited time.

from dbignite.omop.data_model import *

# Recreate the target schema for the OMOP CDM tables.
cdm_database = 'dbignite_demo'
sql(f'DROP SCHEMA IF EXISTS {cdm_database} CASCADE;')
sql(f'CREATE SCHEMA {cdm_database};')

BUNDLE_PATH = "s3://hls-eng-data-public/data/synthea/fhir/fhir/"
fhir_model = FhirBundles()
# Workaround: read the bundles explicitly instead of passing BUNDLE_PATH to the constructor.
fhir_model.df = fhir_model.asWholeTextfile(BUNDLE_PATH)
cdm_model = OmopCdm(cdm_database)
fhir2omop_transformer = FhirBundlesToCdm(spark)
fhir2omop_transformer.transform(fhir_model, cdm_model)

I'm sure there is a cleaner way to fix it. Anyway, it appears that the FHIR bundles did load to OMOP for the 4 domains that are currently supported. Hopefully more domains in the future? Unless that's up to us to figure out?

zavoraad commented 8 months ago

Thanks @Zachary-Higgins, I see what is happening here. We split up some of the class responsibilities and now allow lazy evaluation of bundles (this isn't documented either, my bad; I will work on adding this info back in).

Something like this should work:

fhir_model=FhirBundles(BUNDLE_PATH)
fhir_model.loadEntries()
# The line above was previously the responsibility of the transformer class. We wanted to be flexible in supporting streaming use cases, so the code was abstracted to a degree and that line was removed.

#... etc and then you can do

fhir2omop_transformer.transform(fhir_model,cdm_model)
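To illustrate the split of responsibilities described above, here is a small self-contained sketch of the lazy-loading pattern. It is not dbignite's implementation: `LazyBundles`, `load_entries`, and `transform` are hypothetical names, and the entries are placeholders rather than real bundle contents.

```python
from typing import List, Optional


class LazyBundles:
    """Hypothetical stand-in for a lazily loaded bundle collection."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Nothing is read at construction time.
        self.entries: Optional[List[str]] = None

    def load_entries(self) -> "LazyBundles":
        # A real implementation would read files under self.path here;
        # this sketch uses placeholder names so it stays self-contained.
        self.entries = [f"{self.path}/bundle-{i}.json" for i in range(3)]
        return self


def transform(bundles: LazyBundles) -> int:
    """Hypothetical transformer that no longer loads the data itself."""
    if bundles.entries is None:
        raise RuntimeError("call load_entries() before transform()")
    return len(bundles.entries)


bundles = LazyBundles("/tmp/fhir")
bundles.load_entries()  # the caller now triggers loading explicitly
print(transform(bundles))
```

The point of the pattern is that the caller decides when (and how) data is read, which leaves room for a streaming source to supply entries instead.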

We'd welcome contributions for more domains in the future. The OMOP transformer in this repo is primarily designed to map C-CDA FHIR messages to OMOP, and this transformer design may not be relevant to the many other types of transformations in healthcare, such as claims, prior authorizations, patient ADTs, eligibility enrollments, patient scheduling, electronic attachments, etc. The number of permutations to solve for is also compounded by different organizations interpreting (correctly or incorrectly) FHIR fields and their meanings.

Instead of solving for all these permutations, we are focused on enabling teams to easily run ETL through SQL or the Dataframe API across all 157 FHIR resources. The demo here aims to demonstrate these capabilities.
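As a rough illustration of what working per resource type means, the sketch below groups the entries of one FHIR bundle by their `resourceType` field using plain Python. It does not use dbignite or Spark; the sample bundle and the function name `group_by_resource_type` are invented for the example.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Invented sample data: a bundle with one Patient and two Conditions.
SAMPLE_BUNDLE = json.dumps({
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Condition", "id": "c1"}},
        {"resource": {"resourceType": "Condition", "id": "c2"}},
    ],
})


def group_by_resource_type(bundle_json: str) -> Dict[str, List[dict]]:
    """Group the resources in a bundle by their resourceType."""
    bundle = json.loads(bundle_json)
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        grouped[resource.get("resourceType", "Unknown")].append(resource)
    return dict(grouped)


grouped = group_by_resource_type(SAMPLE_BUNDLE)
for resource_type, resources in grouped.items():
    print(resource_type, len(resources))
```

Once resources are separated this way, each group can be treated like a table and queried or mapped independently, which is the general idea behind running ETL per resource.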

Zachary-Higgins commented 8 months ago

Hey @zavoraad, thanks for that quick reply and the suggestions. Unfortunately, I didn't have any success with the suggested approach, but based on your comments I understand that we really shouldn't concern ourselves too much with the actual FHIR to OMOP transformer since we'll likely be writing our own. I did test the FHIR mapping using a streaming DF, and it worked quite well.

Moving forward, I think we have enough to accomplish what we needed to with this solution accelerator. As far as the "solution accelerator notebook not working" is concerned, my issue is closed. Not sure if you want to leave this open to track the merge, but I was able to do what I set out to do.

Thanks!

zavoraad commented 8 months ago

Great to hear! I will go ahead and close this, with a follow-up to update more of our OMOP documentation here. Appreciate the feedback and collaboration.