Medical-Event-Data-Standard / meds_etl

A collection of ETLs from common data formats to Medical Event Data Standard
Apache License 2.0

Patient timelines using nonstandard codes in OMOP to MEDS conversion #32

Open akwasigroch opened 1 week ago

akwasigroch commented 1 week ago

I am trying to convert an OMOP dataset to the MEDS standard. The scripts run without errors, but after conversion the patients' timelines are recorded using only nonstandard codes. I noticed that the function write_event_data() reads the source codes first; according to a comment in the code, these should then be converted to normalized concepts. However, the line code = concept_id.replace(concept_id_map) maps concept_id directly to concept_code, without referring to the standard codes. How can I fix this problem?
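
To illustrate the behavior described above, here is a minimal sketch in plain Python. It is not code from meds_etl: the concept IDs are made up, and the table layout, the "vocabulary/code" string format, and the helper name to_code are assumptions for illustration only.

```python
# Hypothetical excerpt of an OMOP concept table (IDs are made up):
# concept_id -> (vocabulary_id, concept_code, standard_concept flag)
concept_table = {
    1001: ("ICD9CM", "250.00", None),    # non-standard source concept
    2001: ("SNOMED", "44054006", "S"),   # standard concept
}

# Assumed shape of concept_id_map: concept_id -> "vocabulary/code" string.
concept_id_map = {
    concept_id: f"{vocab}/{code}"
    for concept_id, (vocab, code, _flag) in concept_table.items()
}


def to_code(concept_id):
    """Look up the code string for a concept ID, as a replace() with the map would.

    This is a pure ID -> code lookup; it never follows a mapping from a
    non-standard concept to its standard counterpart.
    """
    return concept_id_map.get(concept_id)


# A source concept ID goes in, and its own (non-standard) code comes out.
print(to_code(1001))
```

Because the lookup is keyed by whichever concept ID is read, reading the source concept IDs yields source (nonstandard) codes in the output.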

EthanSteinberg commented 1 week ago

Hi @akwasigroch,

Sorry for the unclear documentation, but the design of writing non-standard codes is intentional here. The goal is to make the OMOP datasets as "similar as possible" to the non-OMOP datasets to make it easier to write generic labeling / featurization code.

You should still have access to the standard concepts through Athena, and we try to save everything that is not in Athena to the metadata.json file.

See https://github.com/som-shahlab/femr/blob/main/tutorials/1_Ontology.ipynb for an example of what that might look like for a MEDS client library.
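
As a rough sketch of what recovering standard concepts from an Athena download could involve, the snippet below collects "Maps to" rows from a CONCEPT_RELATIONSHIP-style table using only the standard library. The inline sample data and the function name load_maps_to are hypothetical, the concept IDs are made up, and the tab-delimited layout is an assumption about the export; this is not how femr or meds_etl implement it.

```python
import csv
import io

# Made-up sample rows in the shape of an OMOP CONCEPT_RELATIONSHIP table
# (assumed to be tab-delimited in the vocabulary export).
SAMPLE_TSV = (
    "concept_id_1\tconcept_id_2\trelationship_id\n"
    "1001\t2001\tMaps to\n"
    "2001\t1001\tMapped from\n"
    "2001\t2001\tMaps to\n"
)


def load_maps_to(tsv_text):
    """Return {source_concept_id: [standard_concept_id, ...]} from 'Maps to' rows."""
    mapping = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Only 'Maps to' links a concept to its standard equivalent(s).
        if row["relationship_id"] == "Maps to":
            source = int(row["concept_id_1"])
            target = int(row["concept_id_2"])
            mapping.setdefault(source, []).append(target)
    return mapping


maps_to = load_maps_to(SAMPLE_TSV)
print(maps_to.get(1001))
```

A source concept can map to more than one standard concept, which is why the values are lists rather than single IDs.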

I think it would be OK to store the standard codes in a property, maybe "standard_code", if this seems to be causing issues. Let me know if you think that would solve your problems. (I mainly just want to keep storing the non-standard properties in "code".)
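
A minimal sketch of what such a property might look like, assuming events are flat dictionaries and a code-to-standard-code mapping has already been built. The event layout, the mapping contents, and the helper name add_standard_code are hypothetical and are not part of the MEDS schema or meds_etl.

```python
def add_standard_code(event, code_to_standard):
    """Return a copy of the event with a 'standard_code' property when one is known.

    The original value in 'code' is left unchanged, so generic labeling or
    featurization code that reads 'code' keeps working.
    """
    enriched = dict(event)  # copy so the input event is not mutated
    standard = code_to_standard.get(event["code"])
    if standard is not None:
        enriched["standard_code"] = standard
    return enriched


# Hypothetical mapping from a source code string to a standard code string.
code_to_standard = {"ICD9CM/250.00": "SNOMED/44054006"}

event = {"time": "2020-01-01T00:00:00", "code": "ICD9CM/250.00"}
print(add_standard_code(event, code_to_standard))
```

Events whose code has no known standard equivalent are returned without the extra property rather than with a placeholder value.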