Medical-Event-Data-Standard / meds_etl

A collection of ETLs from common data formats to Medical Event Data Standard
Apache License 2.0

Option to set concept v. source as default in OMOP ETL #20

Closed (Miking98 closed 2 weeks ago)

EthanSteinberg commented 2 months ago

I'm really not a fan of this sort of change.

This change will make it harder to write generic code while also not enabling any new features.

@Miking98 Can you articulate why this needs to be done in the ETL and cannot be done in user libraries (like femr)?

Miking98 commented 2 months ago

I think giving users the option to do this as far upstream as possible is helpful because then they can be confident that any downstream analyses won't have to consider whether they're dealing with source v. standard concepts, and can just rely on the fact that this conversion was done earlier.

This is helpful when someone is creating an extract that they know they want to use standard codes, and only standard codes.

At the very least, I don't think adding an optional flag is harmful, and it's simpler to do it here.

EthanSteinberg commented 2 months ago

> At the very least, I don't think adding an optional flag is harmful, and it's simpler to do it here.

From a practical perspective, this optional flag would double the number of MEDS datasets that need to be maintained if any code exists that expects it to be set. So we would burn twice as much disk, twice as much bandwidth, etc.

I don't want people to be in a situation where they need MEDS extract A to run model X and MEDS extract B to run model Y, where A and B have mutually incompatible flags.

And, as far as I can tell, there is still zero benefit to doing this here. If you need the OMOP concepts, MEDS gives you all the tools to get those. You don't need to remove the source codes to get access to the OMOP concepts.

EthanSteinberg commented 2 months ago

> then they can be confident that any downstream analyses won't have to consider whether they're dealing with source v. standard concept

This is false.

There are two types of analysis: analysis that is ontology-sensitive and analysis that is not.

In the current setup, ontology-sensitive code can easily transform the input into whatever ontology it needs for analysis.
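As a rough sketch of what that downstream transform could look like (this is not code from meds_etl or femr): the `vocabulary/code` string format, the `SOURCE_TO_STANDARD` dictionary, and the `to_standard` helper are all assumptions made for illustration. In practice the mapping would come from an OMOP vocabulary table rather than a hard-coded dict.

```python
from typing import Dict, List

# Hypothetical source -> standard concept mapping, for illustration only.
# A real mapping would be loaded from an OMOP vocabulary table.
SOURCE_TO_STANDARD: Dict[str, str] = {
    "ICD10CM/E11.9": "SNOMED/44054006",
    "ICD10CM/I10": "SNOMED/38341003",
}


def to_standard(codes: List[str], mapping: Dict[str, str]) -> List[str]:
    """Map each code to its standard concept; unmapped codes pass through unchanged."""
    return [mapping.get(code, code) for code in codes]


# The extract keeps its source codes; the analysis derives the view it needs.
events = ["ICD10CM/E11.9", "ICD10CM/I10", "LOINC/4548-4"]
standard_events = to_standard(events, SOURCE_TO_STANDARD)
```

The original `events` list is left untouched, so code that wants source codes and code that wants standard concepts can both run against the same extract.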

In your proposed setup, ontology-sensitive code will require the whole dataset artifact to have a particular ontology structure, which will cause confusing failures and issues if you are mixing code that has different ontology assumptions.
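To make that failure mode concrete, here is a minimal sketch. The `count_diabetes_events` consumer, the code strings, and both example extracts are hypothetical. The point is only that code written against one ontology does not raise an error on an extract built with the other flag setting; it just returns a wrong answer.

```python
from typing import List, Set


def count_diabetes_events(codes: List[str]) -> int:
    """Hypothetical consumer that assumes source (ICD10CM) codes are present."""
    diabetes_source_codes: Set[str] = {"ICD10CM/E11.9"}
    return sum(code in diabetes_source_codes for code in codes)


# Two hypothetical extracts of the same patient, built with different flag settings.
source_extract = ["ICD10CM/E11.9", "ICD10CM/I10"]
standard_only_extract = ["SNOMED/44054006", "SNOMED/38341003"]

count_on_source = count_diabetes_events(source_extract)           # matches one event
count_on_standard = count_diabetes_events(standard_only_extract)  # matches nothing, no error
```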

Ontology-sensitive code in both setups will require special care.