pre-compute the concept - concept_relationship join

chrisroederucdenver commented 3 months ago

2024-09-25: See the update comment at the bottom for current needs.

We need a table that maps each (OID, concept_code) pair to the concept_id of the standard concept related to the input concept via 'Maps to'. To keep the table usable from Pandas, it would help to restrict it to concepts found in the input data. An alternative scoping strategy would be to reduce the set of vocabularies, since the CONCEPT.csv over in CCDA_OMOP_Private may contain more than we need.
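A minimal sketch of the join described above, assuming the standard OMOP column names (concept_id, concept_code, vocabulary_id, domain_id, concept_id_1, concept_id_2, relationship_id). The function name `build_map_to_standard`, the column names `oid`, `target_concept_id` and `target_domain_id`, and every row of toy data are made up for illustration; this is not the code in create_map_to_standard.py.

```python
import pandas as pd


def build_map_to_standard(input_codes, oid_map, concept, concept_relationship):
    """Map (oid, concept_code) pairs to standard concepts via 'Maps to'."""
    # Keep only the 'Maps to' rows and the two id columns we need.
    maps_to = concept_relationship.loc[
        concept_relationship["relationship_id"] == "Maps to",
        ["concept_id_1", "concept_id_2"],
    ]
    # Resolve OID -> vocabulary_id, then (vocabulary_id, concept_code) -> concept_id.
    source = (
        input_codes.drop_duplicates()
        .merge(oid_map, on="oid", how="inner")
        .merge(
            concept[["concept_id", "concept_code", "vocabulary_id"]],
            on=["concept_code", "vocabulary_id"],
            how="inner",
        )
        .rename(columns={"concept_id": "source_concept_id"})
    )
    # Target side: the standard concept and its domain.
    target = concept[["concept_id", "domain_id"]].rename(
        columns={"concept_id": "target_concept_id", "domain_id": "target_domain_id"}
    )
    result = source.merge(
        maps_to, left_on="source_concept_id", right_on="concept_id_1"
    ).merge(target, left_on="concept_id_2", right_on="target_concept_id")
    columns = ["oid", "concept_code", "target_concept_id", "target_domain_id"]
    return result[columns].reset_index(drop=True)


# Toy data only: the codes and concept ids below are invented.
LOINC_OID = "2.16.840.1.113883.6.1"
SNOMED_OID = "2.16.840.1.113883.6.96"
oid_map = pd.DataFrame(
    {"oid": [LOINC_OID, SNOMED_OID], "vocabulary_id": ["LOINC", "SNOMED"]}
)
concept = pd.DataFrame(
    {
        "concept_id": [1, 2, 3],
        "concept_code": ["X1", "Y1", "Y2"],
        "vocabulary_id": ["LOINC", "SNOMED", "SNOMED"],
        "domain_id": ["Measurement", "Observation", "Condition"],
    }
)
concept_relationship = pd.DataFrame(
    {
        "concept_id_1": [1, 2, 3, 3],
        "concept_id_2": [1, 3, 3, 2],
        "relationship_id": ["Maps to", "Maps to", "Maps to", "Is a"],
    }
)
# Pairs as the snooper tools might emit them, including a duplicate and an unknown code.
input_codes = pd.DataFrame(
    {
        "oid": [LOINC_OID, SNOMED_OID, SNOMED_OID, SNOMED_OID],
        "concept_code": ["X1", "Y1", "Y1", "Z9"],
    }
)

mapping = build_map_to_standard(input_codes, oid_map, concept, concept_relationship)
print(mapping)
```

Because every merge is an inner join, input codes that have no concept row or no 'Maps to' relationship silently drop out of the result. Switching the merges to `how="left"` would keep them with missing targets, which may be more useful for spotting unmapped codes.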

Resources:

The CCDA_OMOP_Private repo has the OMOP concept table in it. We will need to fetch the corresponding concept_relationship table from the same download so that it covers the same vocabularies.
The CCDA-tools repo has a pair of tools to extract concepts from a batch of documents. section_code_snooper.py and header_code_snooper.py collect concepts as (OID, concept_code) pairs from sections within documents and from the header, respectively.
Somewhere on Foundry is an OID to vocabulary_id table, surely also available elsewhere on the 'net.

Depending on data size and machine capacity, this might be a challenge for Pandas.
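If memory does turn out to be a problem, one option is to read the gzipped file in chunks, load only the columns needed, and keep only the 'Maps to' rows from each chunk. The sketch below assumes a tab-delimited file with the standard OMOP concept_relationship header; the function name `load_maps_to`, the file name, and the toy rows are hypothetical.

```python
import gzip
import os
import tempfile

import pandas as pd


def load_maps_to(path, chunksize=1_000_000):
    """Read a (possibly gzipped) concept_relationship file, keeping 'Maps to' rows."""
    kept = []
    reader = pd.read_csv(
        path,
        sep="\t",  # assumption: tab-delimited vocabulary download
        usecols=["concept_id_1", "concept_id_2", "relationship_id"],
        dtype={"concept_id_1": "int64", "concept_id_2": "int64", "relationship_id": str},
        chunksize=chunksize,  # compression is inferred from the .gz suffix
    )
    for chunk in reader:
        is_maps_to = chunk["relationship_id"] == "Maps to"
        kept.append(chunk.loc[is_maps_to, ["concept_id_1", "concept_id_2"]])
    if not kept:
        return pd.DataFrame(columns=["concept_id_1", "concept_id_2"])
    return pd.concat(kept, ignore_index=True)


# Toy file only: the rows are invented to exercise the function.
header = "concept_id_1\tconcept_id_2\trelationship_id\tvalid_start_date\tvalid_end_date\tinvalid_reason\n"
rows = [
    "1\t1\tMaps to\t19700101\t20991231\t\n",
    "2\t3\tMaps to\t19700101\t20991231\t\n",
    "3\t2\tMapped from\t19700101\t20991231\t\n",
    "3\t3\tMaps to\t19700101\t20991231\t\n",
    "3\t4\tIs a\t19700101\t20991231\t\n",
]
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_concept_relationship.csv.gz")
    with gzip.open(toy_path, "wt") as handle:
        handle.write(header)
        handle.writelines(rows)
    maps_to = load_maps_to(toy_path, chunksize=2)

print(maps_to)
```

Only one chunk is held in memory at a time, plus the filtered rows, so peak usage depends mostly on how many 'Maps to' rows survive rather than on the full file size.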

chrisroederucdenver commented 3 months ago

CONCEPT_RELATIONSHIP is in the private repo as a gzipped file.

git@github.com:chrisroederucdenver/CCDA_OMOP_Private.git This repo uses git-lfs. I don't think it's an issue when cloning for read purposes, but it is for committing.

chrisroederucdenver commented 3 months ago

I wanted to see if Pandas could handle the seemingly large concept and concept_relationship tables, so I wrote some experimental code: https://github.com/chrisroederucdenver/CCDA-tools/blob/main/create_map_to_standard.py

I was really interested in what mappings we get from CCDA section to OMOP domain, and I was able to produce that: section --> domain_id

Encounters --> Measurement
Encounters --> Observation
Hospital_Discharge --> Note
Medications --> Drug
Medications --> Observation
Procedures --> Measurement
Procedures --> Procedure
Results --> total [Moles/volume] in Serum or Plasma"
Results --> Measurement
Results --> Observation
Vital_Signs --> Measurement
Vital_Signs --> Observation

chrisroederucdenver commented 3 months ago

FYI, and TODO

The OMOP vocabulary tables are in the CCDA_OMOP_Private repository.
The OID map is in the CCDA-data repository.

The output has a zillion columns. We need concept_id and domain_id for sure.
The section column is useful for considering the domain_id routing issue: which CCDA sections
produce data for which OMOP domains.

TODO: this code so far doesn't deal with the validity dates or the invalid_reason column.
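One possible way to handle this TODO is to filter rows before the join, keeping only those with an empty invalid_reason and a validity window that covers a chosen date. The sketch assumes the standard OMOP columns valid_start_date, valid_end_date and invalid_reason, with dates stored as YYYYMMDD values; the function name `keep_valid` and the toy rows are hypothetical.

```python
import pandas as pd


def keep_valid(df, as_of):
    """Keep rows with no invalid_reason whose validity window covers as_of."""
    as_of = pd.Timestamp(as_of)
    # Assumption: dates are stored as YYYYMMDD integers or strings.
    start = pd.to_datetime(df["valid_start_date"].astype(str), format="%Y%m%d")
    end = pd.to_datetime(df["valid_end_date"].astype(str), format="%Y%m%d")
    # A missing or empty invalid_reason marks a currently valid row.
    not_invalid = df["invalid_reason"].isna() | (df["invalid_reason"] == "")
    return df[not_invalid & (start <= as_of) & (as_of <= end)]


# Toy rows only: the ids and dates are invented.
toy_concepts = pd.DataFrame(
    {
        "concept_id": [1, 2, 3],
        "valid_start_date": [19700101, 19700101, 20300101],
        "valid_end_date": [20991231, 20200101, 20991231],
        "invalid_reason": [None, "D", None],
    }
)

valid = keep_valid(toy_concepts, "2024-09-25")
print(valid)
```

The same filter could be applied to concept_relationship, which carries the same three columns. Whether to check validity as of today or as of the date in the clinical document is a separate decision that this sketch leaves open.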

chrisroederucdenver commented 2 months ago

Update: Input tables are in Foundry:

The expanded one looks more like what Tanner and Steph have been working with, and it's possible that their created table will look like that.

One question worth considering carefully is whether the result from Tanner and Steph will be a separate table or additions to this one. Since separate processes update the input and the output, separate tables seem simpler. If the vocab process by Steph and Tanner updates this table, it would have to be carefully coordinated, with either rewrites or updates to add more input concepts. @tannerzhang @stephanieshong

chrisroederucdenver commented 3 weeks ago

This is part of @tannerzhang's work in #120

cladteam / CCDA_OMOP_by_Python

pre-compute the concept - concept_relationship join #39
