cladteam / CCDA_OMOP_by_Python


Non-compliant documents #146

Closed by chrisroederucdenver 2 weeks ago

chrisroederucdenver commented 2 weeks ago

A question came up about how to map race codes that use vocabularies not specified in CCDA; documents containing them are, strictly speaking, non-compliant. The question was whether we should include mappings from vocabularies that are not specified in the CCDA standard.

Specifically, we had data containing the concept code "2106-3" from the vocabulary with OID 2.16.840.1.113883.6.238, the PHIN VADS CDC Race and Ethnicity vocabulary. The CCDA spec expects a vocabulary with OID 2.16.840.1.113883.5.104, a FHIR terminology.

My position is that we are not enforcing the CCDA standard and that, to the reasonable best of our abilities, we will import data that falls outside it. I consider this case to be one of those. The point of this ticket is to raise the issue and address any objections.
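The lenient policy described above can be sketched as a lookup keyed on both the code and its vocabulary OID, with the out-of-spec OID admitted deliberately. This is a minimal illustration, not the project's actual code: the table contents, function name, and the OMOP concept_id 8527 ("White") for code 2106-3 are assumptions; only the code 2106-3 and the two OIDs come from this issue.

```python
# Hypothetical sketch of the lenient mapping policy; names and the
# concept_id below are illustrative assumptions, not project code.

# OIDs mentioned in this issue:
CCDA_RACE_OID = "2.16.840.1.113883.5.104"  # vocabulary the CCDA spec expects
CDC_RACE_OID = "2.16.840.1.113883.6.238"   # PHIN VADS CDC Race and Ethnicity

# Illustrative mapping table keyed on (code, vocabulary OID).
RACE_MAP = {
    ("2106-3", CCDA_RACE_OID): 8527,  # in-spec vocabulary
    ("2106-3", CDC_RACE_OID): 8527,   # out-of-spec, accepted under this policy
}

def map_race(code: str, oid: str):
    """Return an OMOP concept_id, accepting known out-of-spec vocabularies."""
    return RACE_MAP.get((code, oid))
```

Keying on the (code, OID) pair keeps the decision explicit: each non-standard vocabulary has to be added on purpose, so deviations from the spec stay visible rather than being silently absorbed.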

Copy of my e-mail to Tanner: Hi Tanner, Thanks for these vocab tables. It's great putting them to use. It's too late on a Friday afternoon, but I'm looking at what might be a vocab issue when mapping the Person race from CCD-Sample.xml (there might be other instances, but this is one). Can you double-check me when you get a chance (after the weekend)?

Thanks -Chris

The CCDA file has:

The snooper file, vocab_discovered_codes, has the following rows for 2106-3. https://foundry.cladplatform.org/workspace/data-integration/dataset/preview/ri.foundry.main.dataset.e5fc74fe-6b25-4de0-b722-088575f62ed9/master

[Screenshot, 2024-11-13: vocab_discovered_codes rows for code 2106-3]

I'm not getting a mapping. I think I should be using the value-set table, ccda_value_set_mapping_table_dataset, in the mapping-reference-files folder. https://foundry.cladplatform.org/workspace/data-integration/dataset/preview/ri.foundry.main.dataset.5ba1594c-ef36-4eeb-b6c4-d3d005dbd127/master In it, I see an entry for 2106-3, but not for ….113883.6.238 (PHIN VADS CDC)

[Screenshot, 2024-11-13: ccda_value_set_mapping_table_dataset entries for code 2106-3]
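The miss described above can be reproduced in miniature: if the value-set table is keyed on the (code, vocabulary OID) pair and 2106-3 appears only under the expected OID, a lookup under the PHIN VADS CDC OID finds nothing. The table row below is hypothetical; only the code and the two OIDs come from this issue.

```python
# Why the lookup misses: the table keys on (code, vocabulary OID),
# and 2106-3 is present only under the OID the CCDA spec expects.
# The row content ("White") is an illustrative assumption.
value_set = {
    ("2106-3", "2.16.840.1.113883.5.104"): "White",  # entry seen in the table
}

# Lookup under the PHIN VADS CDC OID actually present in the document:
hit = value_set.get(("2106-3", "2.16.840.1.113883.6.238"))
print(hit)  # no entry under the CDC OID, so the lookup returns None
```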
chrisroederucdenver commented 2 weeks ago

I ran this by Rob and he agreed that, within reason, it's OK to stretch a bit to process data that may not be 100% in spec. From the Nov. 14 e-mail: "Kind of feels to me like we should process what we can in the time we will have, but it will be most important in the short time we have to process and map a broad array of data and at least characterize deviations, and compute some metrics for evaluation purposes of what is 'left over'. Does this make sense?"
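The "characterize deviations" idea could be as simple as tallying the (code, OID) pairs that failed to map, so the leftovers can be reviewed and measured. This is a sketch under assumed names and sample data, not the project's implementation; only the codes and OID shown come from this issue.

```python
# Sketch: tally codes that failed to map so "leftovers" can be characterized.
# The observed pairs and the accepted set are illustrative sample data.
from collections import Counter

# (code, vocabulary OID) pairs seen while processing documents (sample values).
observed = [
    ("2106-3", "2.16.840.1.113883.6.238"),
    ("2106-3", "2.16.840.1.113883.6.238"),
    ("2054-5", "2.16.840.1.113883.6.238"),
]
mapped = {("2106-3", "2.16.840.1.113883.6.238")}  # pairs we chose to accept

# Count each unmapped pair; this is the "left over" metric to report.
leftovers = Counter(pair for pair in observed if pair not in mapped)
for (code, oid), n in leftovers.most_common():
    print(f"unmapped {code} ({oid}): {n} occurrence(s)")
```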