OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as an independent release.
The Unlicense

BuildRxE.sql unexpected missing mappings due to precedence #1048

Open: lore-edencehealth opened this issue 1 week ago

lore-edencehealth commented 1 week ago

Describe the problem in content or desired feature
Precedence of dose forms does not work as expected. Adding rows to or deleting rows from drug_concept_stage can change the mapping of an existing row. I expected precedence to be applied row by row: if no mapping is found with precedence 1, look for a mapping with precedence 2, and so on. Instead, it only seems to look for mappings at a single dose form precedence for the whole input: the precedence that produced the most full mappings overall, or precedence 1 if there were no full mappings. So if drug_concept_1 has a mapping with dose form precedence 1 and drug_concept_2 has a mapping with dose form precedence 2, only one of the two is found. All output mappings share the same dose form concept_id, even when some source drugs have existing mappings through another concept_id for the same dose form at a different precedence.
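To make the reported behaviour concrete, here is a minimal Python sketch. It is a hypothetical model of what the issue describes, not the actual BuildRxE.sql code: each source drug lists the target concept found at each dose form precedence (None means no full mapping), one precedence is chosen for the whole input, and every drug is forced onto it. The names `Candidates`, `pick_global_precedence`, `map_globally`, and the `target_*` values are illustrative, and breaking ties toward the lower precedence number is an assumption.

```python
from collections import Counter
from typing import Optional

# Hypothetical model of the behaviour reported in this issue, NOT BuildRxE.sql:
# for each source drug, the target concept found at each dose form precedence
# (None means no full mapping at that precedence).
Candidates = dict[int, Optional[str]]


def pick_global_precedence(drugs: dict[str, Candidates]) -> int:
    """One precedence for the whole input: the one with the most full
    mappings, or 1 if there are no full mappings at all."""
    counts = Counter(
        p for c in drugs.values() for p, t in c.items() if t is not None
    )
    if not counts:
        return 1
    # Assumption: ties are broken in favour of the lower precedence number.
    return min(counts, key=lambda p: (-counts[p], p))


def map_globally(drugs: dict[str, Candidates]) -> dict[str, Optional[str]]:
    chosen = pick_global_precedence(drugs)
    # Every drug is forced onto the same precedence, so a drug whose only
    # full mapping is at another precedence ends up unmapped.
    return {drug: c.get(chosen) for drug, c in drugs.items()}


drugs = {
    "drug_concept_1": {1: "target_a", 2: None},
    "drug_concept_2": {1: None, 2: "target_b"},
}
print(map_globally(drugs))
# {'drug_concept_1': 'target_a', 'drug_concept_2': None}

# Adding one more row flips the global choice to precedence 2,
# and drug_concept_1 loses its mapping.
drugs["drug_concept_3"] = {1: None, 2: "target_c"}
print(map_globally(drugs))
# {'drug_concept_1': None, 'drug_concept_2': 'target_b', 'drug_concept_3': 'target_c'}
```

In this model the outcome for drug_concept_1 depends on how many other rows happen to map at each precedence, which matches the input-size dependence described in this issue.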

How to find it
The attached files can be used to reproduce the difference in output. The '_noconv' files have fewer rows, and their output misses many mappings even though those mappings exist. The difference is visible in x_df, which is derived from x_pattern.

Expected adjustments
The precedence used to find mappings should not depend on the full input, only on one source drug at a time.
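The expected row-by-row fallback can be sketched the same way. This is a hypothetical Python illustration of the requested behaviour, not a patch for BuildRxE.sql: for each source drug independently, precedence 1 is tried first, then 2, and so on, stopping at the first precedence that yields a mapping. `Candidates`, `resolve`, `map_row_by_row`, and the `target_*` values are illustrative names.

```python
from typing import Optional

# Hypothetical model, not BuildRxE.sql: for each source drug, the target
# concept found at each dose form precedence (None = no mapping found).
Candidates = dict[int, Optional[str]]


def resolve(candidates: Candidates) -> Optional[tuple[int, str]]:
    """Try precedence 1, then 2, ... and stop at the first one with a mapping."""
    for precedence in sorted(candidates):
        target = candidates[precedence]
        if target is not None:
            return precedence, target
    return None


def map_row_by_row(
    drugs: dict[str, Candidates],
) -> dict[str, Optional[tuple[int, str]]]:
    # Each source drug is resolved on its own, so its result cannot change
    # when other rows are added to or removed from the input.
    return {drug: resolve(c) for drug, c in drugs.items()}


drugs = {
    "drug_concept_1": {1: "target_a", 2: None},
    "drug_concept_2": {1: None, 2: "target_b"},
}
print(map_row_by_row(drugs))
# {'drug_concept_1': (1, 'target_a'), 'drug_concept_2': (2, 'target_b')}
```

Both drugs get a mapping here, and adding a third row leaves the first two results unchanged. In SQL, the same idea would typically be expressed by ranking candidates per source drug rather than choosing one precedence for the whole table.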


Additional context
Adding rows to or deleting rows from drug_concept_stage can produce a different mapping or cause an existing mapping to be missed. x_df contains different information depending on how many rows are in the input. With the longer input file, an existing Clinical Drug Form is found in almost all cases; with the shorter input (_noconv files), no mapping is found.

x_df for the file with more rows:
IH,19095898,36217207,Inhalation Solution

x_df for the file with fewer rows (_noconv):
IH,19018195,36217207,Inhalant

R03BA02_IH_400_ug_ug and R03BB04_IH_2.5_ug_ug seem to be the main reason for the change in mapping: a Clinical Drug mapping is found for these two source drugs, which gives x_pattern more rows with df_id filled in.

drug_concept_stage_noconv.csv
drug_concept_stage.csv
ds_stage_noconv.csv
ds_stage.csv
internal_relationship_stage_noconv.csv
internal_relationship_stage.csv
relationship_to_concept_noconv.csv
relationship_to_concept.csv

TinyRickC137 commented 6 days ago

Great job describing this bug!

At the moment, fixing BuildRxE is not a priority; our priorities are defined in our Roadmap, and we delivered the last four releases according to it.

The next phase (Roadmap for the February 2025 Release) will be published soon.