biothings / semmeddb


Name, SemanticType lookup for retired CUIs #5

Closed erikyao closed 1 year ago

erikyao commented 1 year ago

Problem

When a retired UMLS ID is replaced, its name and its semantic type abbreviation/name should be replaced at the same time. However, the retired CUI table (the MRCUI.RRF file of the UMLS Metathesaurus) contains only UMLS IDs.

E.g. in the source file of SemMedDB predications, the following record

UMLSID Name SemanticTypeAbv SemanticTypeName
C0021311 Infection dsyn Disease or Syndrome

according to MRCUI.RRF, should be replaced by

UMLSID Name SemanticTypeAbv SemanticTypeName
C0009450 ??? ??? ???

But MRCUI.RRF only gives the replacement C0021311 => C0009450. The new "Name", "SemanticTypeAbv", and "SemanticTypeName" have to be filled in from other data sources.

P.S. the fully replaced record should be like:

UMLSID Name SemanticTypeAbv SemanticTypeName
C0009450 Communicable Diseases dsyn Disease or Syndrome
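
For reference, the ID-only replacement lookup could be sketched as below. This is a minimal sketch, not the parser's actual code: it assumes the pipe-delimited MRCUI.RRF column order `CUI1|VER|REL|RELA|MAPREASON|CUI2|MAPIN|`, it keeps only synonym (`SY`) rows (an assumption of this sketch), and the helper name `load_retired_cui_map` and the sample rows are made up for illustration.

```python
import io


def load_retired_cui_map(lines):
    """Map retired CUI -> replacement CUI from MRCUI.RRF-style lines.

    Assumed column order: CUI1|VER|REL|RELA|MAPREASON|CUI2|MAPIN|
    Only rows with REL == "SY" and a non-empty CUI2 are kept (sketch assumption).
    """
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        cui1, rel, cui2 = fields[0], fields[2], fields[5]
        if rel == "SY" and cui2:
            mapping[cui1] = cui2
    return mapping


# Illustrative rows only; VER/REL values here are made up for the example.
sample = io.StringIO(
    "C0021311|2022AB|SY|||C0009450|Y|\n"
    "C9999999|2022AB|DEL|||||\n"
)
retired = load_retired_cui_map(sample)
print(retired)
```

Note that the result carries IDs only, which is exactly the gap this issue is about.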

Solution

Step 1: UMLS ID => Subject/Object Name

The name can be looked up in MRCONSO.RRF, the Concept Names and Sources file.

However, each UMLS ID might have multiple records there. Inspired by Example 7 of UMLS Database Query Diagrams,

# 7. Find all relationships for a concept and the preferred (English) name of the CUI2.

SELECT a.cui1, a.cui2, b.str FROM mrrel a, mrconso b
WHERE a.cui1 = 'C0032344'
     AND a.stype1 = 'CUI'
     AND a.cui2 = b.cui
     AND b.ts = 'P'
     AND b.stt = 'PF'
     AND b.ispref = 'Y'
     AND b.lat = 'ENG';

the filtering condition is

    TS == 'P'          # Term Status: "Preferred LUI of the CUI"
    and STT == 'PF'    # String Type: "Preferred form of term"
    and ISPREF == 'Y'  # Atom status: "preferred" (Y) for this string within this concept
    and LAT == 'ENG'   # Language of Terms: "English"

Other TS, STT, and LAT values are explained in Abbreviations Used in Data Elements - 2022AB Release. The meaning of ISPREF is explained in Table 1 of the UMLS® Reference Manual.

CUI names are recorded in the STR column.
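
A minimal sketch of applying that filter directly to MRCONSO.RRF lines, without loading them into a database. It assumes the standard pipe-delimited column order `CUI|LAT|TS|LUI|STT|SUI|ISPREF|AUI|SAUI|SCUI|SDUI|SAB|TTY|CODE|STR|SRL|SUPPRESS|CVF|`; the function name `preferred_english_names` is hypothetical, and the identifiers in the sample rows (LUI, SUI, AUI, SAB, CODE) are placeholders, not real UMLS values.

```python
import io

# Column positions in MRCONSO.RRF (assumed standard layout).
CUI, LAT, TS, STT, ISPREF, STR = 0, 1, 2, 4, 6, 14


def preferred_english_names(lines):
    """Return {CUI: preferred English name} using the TS/STT/ISPREF/LAT filter."""
    names = {}
    for line in lines:
        f = line.rstrip("\n").split("|")
        if len(f) <= STR:
            continue  # skip malformed or empty lines
        if f[LAT] == "ENG" and f[TS] == "P" and f[STT] == "PF" and f[ISPREF] == "Y":
            # Keep the first match if more than one row passes the filter.
            names.setdefault(f[CUI], f[STR])
    return names


# Placeholder rows: only the first one satisfies all four conditions.
sample = io.StringIO(
    "C0009450|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SRC|PT|X1|Communicable Diseases|0|N||\n"
    "C0009450|ENG|S|L0000002|PF|S0000002|Y|A0000002||||SRC|SY|X2|Infectious disease|0|N||\n"
    "C0009450|SPA|P|L0000003|PF|S0000003|Y|A0000003||||SRC|PT|X3|Enfermedades Transmisibles|0|N||\n"
)
names = preferred_english_names(sample)
print(names)
```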

Step 2: UMLS ID => Semantic Type Name

Query MRSTY.RRF, the Semantic Types file.
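
A rough sketch of that lookup, assuming the pipe-delimited MRSTY.RRF column order `CUI|TUI|STN|STY|ATUI|CVF|`, with the semantic type name in the `STY` column. A CUI can carry more than one semantic type, so the sketch collects a list per CUI. The function name `semantic_type_names` and the ATUI value in the sample row are made up.

```python
import io
from collections import defaultdict


def semantic_type_names(lines):
    """Return {CUI: [semantic type names]} from MRSTY.RRF-style lines.

    Assumed column order: CUI|TUI|STN|STY|ATUI|CVF|
    """
    result = defaultdict(list)
    for line in lines:
        f = line.rstrip("\n").split("|")
        if len(f) < 4:
            continue  # skip malformed or empty lines
        result[f[0]].append(f[3])
    return dict(result)


# Illustrative row; the ATUI is a placeholder.
sample = io.StringIO("C0009450|T047|B2.2.1.2.1|Disease or Syndrome|AT00000000||\n")
types = semantic_type_names(sample)
print(types)
```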

Step 3: Semantic Type Name => Semantic Type Abbreviation

Query the Semantic Type Mappings file.
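
A small sketch of the name-to-abbreviation step. It assumes the mappings file has one `abbreviation|TUI|full name` entry per line; that layout is an assumption here and should be checked against the actual file. The function name `load_abbreviations` is hypothetical.

```python
import io


def load_abbreviations(lines):
    """Return {semantic type full name: abbreviation}.

    Assumed line layout: abbreviation|TUI|full name
    """
    by_name = {}
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 3:
            continue  # skip malformed or empty lines
        abbreviation, _tui, full_name = parts[0], parts[1], parts[2]
        by_name[full_name] = abbreviation
    return by_name


sample = io.StringIO("dsyn|T047|Disease or Syndrome\n")
abbreviations = load_abbreviations(sample)
print(abbreviations["Disease or Syndrome"])
```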

erikyao commented 1 year ago

Dec 13 decision with Chunlei:

Make an intermediate UMLS file containing (UMLSID, EntityName, SemanticTypeAbv, SemanticTypeFullName). It may serve as the source file for a future UMLS endpoint.
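
One possible shape for that intermediate file, sketched as a tab-separated table with a header row. The TSV format, the `UmlsRecord` type, and the `write_umls_table` helper are assumptions for illustration; the decision above only fixes the four columns.

```python
import csv
import io
from typing import NamedTuple


class UmlsRecord(NamedTuple):
    umls_id: str
    entity_name: str
    semantic_type_abv: str
    semantic_type_full_name: str


def write_umls_table(records, handle):
    """Write the four agreed columns as tab-separated rows with a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["UMLSID", "EntityName", "SemanticTypeAbv", "SemanticTypeFullName"])
    writer.writerows(records)


# Written to an in-memory buffer here; a real run would pass an open file.
buffer = io.StringIO()
write_umls_table(
    [UmlsRecord("C0009450", "Communicable Diseases", "dsyn", "Disease or Syndrome")],
    buffer,
)
table_text = buffer.getvalue()
print(table_text)
```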

erikyao commented 1 year ago

Dec 14 decision with Colleen:

If a mapped new UMLS ID already appears in the SemMedDB predication CSV file, use its EntityName, SemanticTypeAbv, and SemanticTypeFullName from there before falling back to the RRF files.
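
The lookup order could be sketched like this. All names here (`resolve_replacement`, `seen_in_predications`, `rrf_lookup`) are hypothetical; the RRF fallback is passed in as a plain function so the sketch stays independent of how the RRF files are actually parsed.

```python
def resolve_replacement(new_cui, seen_in_predications, rrf_lookup):
    """Return (name, type abbreviation, type full name) for a replacement CUI.

    Prefer values already present in the predication CSV; only call the
    RRF-based lookup when the CUI was never seen there.
    """
    record = seen_in_predications.get(new_cui)
    if record is not None:
        return record
    return rrf_lookup(new_cui)


# Values collected while scanning the predication CSV (illustrative).
seen = {"C0009450": ("Communicable Diseases", "dsyn", "Disease or Syndrome")}

rrf_calls = []


def fake_rrf_lookup(cui):
    """Stand-in for the MRCONSO/MRSTY/mappings lookup; records each call."""
    rrf_calls.append(cui)
    return ("name from RRF", "abbr", "type from RRF")


from_csv = resolve_replacement("C0009450", seen, fake_rrf_lookup)
from_rrf = resolve_replacement("C0000000", seen, fake_rrf_lookup)
print(from_csv, from_rrf, rrf_calls)
```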