Closed Miking98 closed 2 weeks ago
I think giving users the option to do this as far upstream as possible is helpful because then they can be confident that any downstream analyses won't have to consider whether they're dealing with source v. standard concepts, and can just rely on the fact that this conversion was done earlier.
This is helpful when someone is creating an extract that they know they want to use standard codes, and only standard codes.
At the very least, I don't think adding an optional flag is harmful, and it's simpler to do it here.
From a practical perspective, this option flag would double the number of MEDS datasets that need to be maintained if there exists any code that expects it to be set. So we would burn twice as much disk, twice as much bandwidth, etc, etc.
I don't want people to be in a situation where they need MEDS extract A to run model X and MEDS extract B to run model Y, where A and B have mutually incompatible flags.
And, as far as I can tell, there is still zero benefit to doing this here. If you need the OMOP concepts, MEDS gives you all the tools to get those. You don't need to remove the source codes to get access to the OMOP concepts.
"then they can be confident that any downstream analyses won't have to consider whether they're dealing with source v. standard concept"
This is false
There are two types of analysis. Analysis that is ontology sensitive and analysis that is not ontology sensitive.
In the current setup ontology sensitive code can easily transform the input into whatever ontology it needs for analysis.
In your proposed setup, ontology sensitive code will require the whole dataset artifact to have a particular ontology structure, which will cause confusing failures and issues if you are mixing code that has different ontology assumptions.
Ontology sensitive code in both setups will require special care.
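To make the "transform in user code" point concrete, here is a minimal sketch of what that could look like. Everything in it is an assumption for illustration: events are plain dicts with a `code` field, and `SOURCE_TO_STANDARD` is a hypothetical lookup table with placeholder codes, not a real OMOP mapping or any MEDS or femr API.

```python
from typing import Dict, List

# Hypothetical source-to-standard lookup; the codes are placeholders,
# not real vocabulary entries.
SOURCE_TO_STANDARD: Dict[str, str] = {
    "SOURCE/A": "STANDARD/1",
    "SOURCE/B": "STANDARD/2",
}


def to_standard(events: List[dict], mapping: Dict[str, str]) -> List[dict]:
    """Return copies of the events with source codes mapped to standard codes.

    Codes without an entry in the mapping are passed through unchanged,
    and the input events are not modified.
    """
    return [{**event, "code": mapping.get(event["code"], event["code"])} for event in events]


# Assumed event shape: one dict per event, with the code under "code".
events = [
    {"subject_id": 1, "code": "SOURCE/A"},
    {"subject_id": 1, "code": "STANDARD/9"},
]
standard_events = to_standard(events, SOURCE_TO_STANDARD)
```

Because the mapping happens at analysis time, the same extract can serve code that wants source codes and code that wants standard codes.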
I'm really not a fan of this sort of change.
It's both:
1. Better handled in user code
2. Adds additional complexity and artifact diversity
This change will make it harder to write generic code while also not enabling any new features.
@Miking98 Can you articulate why this needs to be done in the ETL and cannot be done in user libraries (like femr)?