OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as independent release.
The Unlicense
219 stars 75 forks source link

ICD10-CM not mapped to SNOMED #125

Closed gowthamrao closed 6 years ago

gowthamrao commented 7 years ago

An impressive 91,594 ICD-10-CM codes are mapped to SNOMED out of 106,666 ICD-10-CM codes in OMOP vocabulary (in current version). Of the unmapped 15,000 - I thought of recommending that we map the following 1,000 (sorted by priority).

Most of these would probably be observations.

icd concept.xlsx

or maybe we just can't map them to SNOMED. If yes, then is there a way to know that there is no way to map them?

dimshitc commented 7 years ago

Well, I just took your 1000 concept list. and only 24 of them I found as not-mapped, look: select count(distinct c.concept_id) from concept c join rao_list o --your 1000 concepts on o.ICD_CM_CODE =replace (c.concept_code, '.') join concept_relationship r on c.concept_id = r.concept_id_1 and r.relationship_id in ('Maps to', 'Maps to value') and r.invalid_reason is null join concept d on d.concept_id = r.concept_id_2 and d.vocabulary_id = 'SNOMED' where c.vocabulary_id ='ICD10CM'
; How did you get this not-mapped list actually?

On the other hand we really have 17355 concepts without a mappnig, but most of them are outdated: select c.invalid_reason as out_dated_status, count(1) from concept c left join concept_relationship r on c.concept_id = r.concept_id_1 and r.relationship_id in ('Maps to', 'Maps to value') and r.invalid_reason is null left join concept d on d.concept_id = r.concept_id_2 and d.vocabulary_id = 'SNOMED' where c.vocabulary_id ='ICD10CM' and d.concept_id is null group by c.invalid_reason ; can you check if any of these Deprecated (i.e. outdated) concepts are used in patient data?

ericaVoss commented 7 years ago

I tried to replicate this issue in Vocab 20170503 however I'm only finding 1,341 distinct CONCEPT_CODES of 92,957 in ICD10CM that are unmapped. Not too bad.

I do agree the majority of them are deprecated ICD10CMs however that shouldn't drive the reason why they don't have a map.

I agree with your approach of seeing the usage in some data, but I don't have the new Vocab next to some patient data. I will follow-up when I get it there.

Some Examples:

35205480    Gastrointestinal tularemia
35205481    Generalized tularemia
1567452 Sequelae of other and unspecified infectious and parasitic diseases
35206022    Sequelae of other specified infectious and parasitic diseases
35206023    Sequelae of unspecified infectious and parasitic disease
35206254    Malignant neoplasm of right ovary
35206271    Malignant neoplasm of right kidney, except renal pelvis
35206272    Malignant neoplasm of right renal pelvis
35206273    Malignant neoplasm of right ureter
35206317    Malignant neoplasm of other ill-defined sites
35206335    Other types of follicular non-Hodgkin's lymphoma, unspecified site
35206336    Other types of follicular non-Hodgkin's lymphoma, lymph nodes of head, face, and neck
35206337    Other types of follicular non-Hodgkin's lymphoma, intrathoracic lymph nodes

My Query:

WITH CTE_VOCAB_MAP AS (
       SELECT c.concept_code AS SOURCE_CODE, c.concept_id AS SOURCE_CONCEPT_ID, c.concept_name AS SOURCE_CODE_DESCRIPTION, c.vocabulary_id AS SOURCE_VOCABULARY_ID, 
                           c.domain_id AS SOURCE_DOMAIN_ID, c.CONCEPT_CLASS_ID AS SOURCE_CONCEPT_CLASS_ID, 
                                                   c.VALID_START_DATE AS SOURCE_VALID_START_DATE, c.VALID_END_DATE AS SOURCE_VALID_END_DATE, c.INVALID_REASON AS SOURCE_INVALID_REASON, 
                           c1.concept_id AS TARGET_CONCEPT_ID, c1.concept_name AS TARGET_CONCEPT_NAME, c1.VOCABULARY_ID AS TARGET_VOCABUALRY_ID, c1.domain_id AS TARGET_DOMAIN_ID, c1.concept_class_id AS TARGET_CONCEPT_CLASS_ID, 
                           c1.INVALID_REASON AS TARGET_INVALID_REASON, c1.standard_concept AS TARGET_STANDARD_CONCEPT
       FROM CONCEPT C
             JOIN CONCEPT_RELATIONSHIP CR
                        ON C.CONCEPT_ID = CR.CONCEPT_ID_1
                        AND CR.invalid_reason IS NULL
                        AND cr.relationship_id = 'Maps To'
              JOIN CONCEPT C1
                        ON CR.CONCEPT_ID_2 = C1.CONCEPT_ID
                        AND C1.INVALID_REASON IS NULL
       UNION
       SELECT source_code, SOURCE_CONCEPT_ID, SOURCE_CODE_DESCRIPTION, source_vocabulary_id, c1.domain_id AS SOURCE_DOMAIN_ID, c2.CONCEPT_CLASS_ID AS SOURCE_CONCEPT_CLASS_ID,
                                        c1.VALID_START_DATE AS SOURCE_VALID_START_DATE, c1.VALID_END_DATE AS SOURCE_VALID_END_DATE, 
                     stcm.INVALID_REASON AS SOURCE_INVALID_REASON,target_concept_id, c2.CONCEPT_NAME AS TARGET_CONCEPT_NAME, target_vocabulary_id, c2.domain_id AS TARGET_DOMAIN_ID, c2.concept_class_id AS TARGET_CONCEPT_CLASS_ID, 
                     c2.INVALID_REASON AS TARGET_INVALID_REASON, c2.standard_concept AS TARGET_STANDARD_CONCEPT
       FROM source_to_concept_map stcm
              LEFT OUTER JOIN CONCEPT c1
                     ON c1.concept_id = stcm.source_concept_id
              LEFT OUTER JOIN CONCEPT c2
                     ON c2.CONCEPT_ID = stcm.target_concept_id
       WHERE stcm.INVALID_REASON IS NULL
)
SELECT *
FROM CONCEPT c
    LEFT OUTER JOIN (
        SELECT *
        FROM CTE_VOCAB_MAP
        WHERE SOURCE_VOCABULARY_ID IN ('ICD10CM')
        AND TARGET_STANDARD_CONCEPT IS NOT NULL
        AND TARGET_INVALID_REASON IS NULL
    ) z
        ON z.SOURCE_CODE = c.CONCEPT_CODE
WHERE c.VOCABULARY_ID = 'ICD10CM'
AND z.SOURCE_CODE IS NULL
ORDER BY c.CONCEPT_CODE
ericaVoss commented 7 years ago

Only 80 of the source codes without maps show up in my largest claim's DB:

CONDITION_SOURCE_VALUE  PERSON_COUNT
R202    266957
H40013  172379
G609    42505
H40023  30391
G608    10845
H40019  9748
H40011  7431
H40012  5978
L852    3459
P700    3011
T6591XA 2801
H3500   2740
K0521   1926
H40021  1740
H40022  1726
K045    1368
T65891A 1279
H40029  1063
K0520   945
B948    749
K044    688
T6594XA 642
B949    496
T884XXA 452
E11641  442
T6592XA 418
T65811A 297
T6591XD 166
T65892A 160
T65894A 115
T884XXD 80
K0522   69
T65811D 61
P618    56
T504X1A 49
T65891D 45
T65814A 40
S548X1A 38
H59011  35
S08811A 34
A213    32
R847    30
S0889XA 30
T6592XD 29
S548X2A 28
S5491XA 28
S5492XA 27
T6594XD 26
H59012  24
T65831A 20
T6593XA 20
P619    16
S548X1D 16
S5490XA 14
N46024  13
T65814D 13
A217    12
H59013  10
S5491XD 10
T65893A 10
S08812A 9
S548X2D 8
T65894D 8
S548X9A 7
T504X2A 6
T65892D 5
O2960   4
T65813A 4
T65834A 4
S5492XD 3
T504X1D 3
T504X2D 2
T504X4A 2
T6593XD 2
H59019  1
S0889XD 1
S548X9D 1
S5490XD 1
T65812A 1
T65833A 1
pbr6cornell commented 7 years ago

These should be fixed, but I'm encouraged that 1) the prevalence of these codes is low, 2) the number of unmapped codes is low, and 3) the most prevalent among these codes are not likely impactful for most clinical research.

On Sat, Jun 17, 2017 at 9:52 AM, ericaVoss notifications@github.com wrote:

Only 80 of the source codes without maps show up in my largest claim's DB:

CONDITION_SOURCE_VALUE PERSON_COUNT R202 266957 H40013 172379 G609 42505 H40023 30391 G608 10845 H40019 9748 H40011 7431 H40012 5978 L852 3459 P700 3011 T6591XA 2801 H3500 2740 K0521 1926 H40021 1740 H40022 1726 K045 1368 T65891A 1279 H40029 1063 K0520 945 B948 749 K044 688 T6594XA 642 B949 496 T884XXA 452 E11641 442 T6592XA 418 T65811A 297 T6591XD 166 T65892A 160 T65894A 115 T884XXD 80 K0522 69 T65811D 61 P618 56 T504X1A 49 T65891D 45 T65814A 40 S548X1A 38 H59011 35 S08811A 34 A213 32 R847 30 S0889XA 30 T6592XD 29 S548X2A 28 S5491XA 28 S5492XA 27 T6594XD 26 H59012 24 T65831A 20 T6593XA 20 P619 16 S548X1D 16 S5490XA 14 N46024 13 T65814D 13 A217 12 H59013 10 S5491XD 10 T65893A 10 S08812A 9 S548X2D 8 T65894D 8 S548X9A 7 T504X2A 6 T65892D 5 O2960 4 T65813A 4 T65834A 4 S5492XD 3 T504X1D 3 T504X2D 2 T504X4A 2 T6593XD 2 H59019 1 S0889XD 1 S548X9D 1 S5490XD 1 T65812A 1 T65833A 1

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/OHDSI/Vocabulary-v5.0/issues/125#issuecomment-309216423, or mute the thread https://github.com/notifications/unsubscribe-auth/AAsrGsLumWeGdcdv-d8L32PtolcseEW4ks5sE9oZgaJpZM4NlLOu .

gowthamrao commented 7 years ago

I lost this thread! sorry

@dimshitc I put the 1,000 prioritzed codes as a concept-set on Atlas here http://www.ohdsi.org/web/atlas/#/conceptset/904152/details

Randomly chose about 20 - and found them to not have a mapping. i.e. none of them had a related or hierarchical/descendant concept-id's mapped

@ericaVoss absolutely agree with your suggestion to prioritize based on usage. The 1,000 I proposed in the thread was based on a sort of record-counts in ascending order

ericaVoss commented 7 years ago

There seems to be more of these in 20170920 vs 20170503.

I tested the prevalence of the codes in our largest claims DB. I only found 107 of the about 3K missing.

condition_source_value | count
-----------------------+------
R9089                  | 17779
X500XXA                |  7620
X501XXA                |  5471
X509XXA                |  5133
T6591XA                |  4379
T65891A                |  2205
T6594XA                |   972
X503XXA                |   954
Z191                   |   902
T6592XA                |   780
P0519                  |   609
T65811A                |   510
X500XXD                |   438
R29700                 |   402
T65892A                |   352
T6591XD                |   250
R29701                 |   228
X501XXD                |   191
S5491XD                |   188
T65894A                |   174
X509XXD                |   172
R29702                 |   156
P618                   |   146
Z192                   |   139
R29703                 |   120
X503XXD                |   112
P0509                  |    94
T65811D                |    93
R29704                 |    87
T6594XD                |    86
T65891D                |    77
T504X1A                |    76
S548X1A                |    64
T65814A                |    61
T6592XD                |    61
S0889XA                |    58
S08811A                |    54
R29705                 |    53
X500XXS                |    53
H59011                 |    50
S5492XA                |    49
S548X1D                |    47
H59012                 |    46
R29707                 |    46
S548X2A                |    42
S5491XA                |    42
R29706                 |    41
R847                   |    40
S548X2D                |    40
T6593XA                |    34
T65831A                |    29
P619                   |    28
R29709                 |    25
R29708                 |    24
T65893A                |    22
T65814D                |    21
T65894D                |    20
X503XXS                |    20
R29714                 |    19
S5490XA                |    18
X501XXS                |    18
R29711                 |    17
R29710                 |    16
S08812A                |    15
H59013                 |    13
R29712                 |    13
X509XXS                |    13
T504X2A                |    11
S548X9A                |    10
R29713                 |     8
R29715                 |     8
R29717                 |     8
R29718                 |     8
R29716                 |     7
T65892D                |     7
R29719                 |     6
R29720                 |     5
R29721                 |     5
R29726                 |     5
S5492XD                |     5
T65813A                |     5
R29722                 |     4
S5490XD                |     4
T504X1D                |     4
T65834A                |     4
R29724                 |     3
R29735                 |     3
T504X4A                |     3
T65812A                |     3
H59019                 |     2
R29723                 |     2
R29729                 |     2
R29732                 |     2
S548X9D                |     2
T504X2D                |     2
R29725                 |     1
R29727                 |     1
R29728                 |     1
R29730                 |     1
R29733                 |     1
R29734                 |     1
S0889XD                |     1
T65833A                |     1
T6593XD                |     1

That top one R9089 (R90.89) Diagnosis Description: Oth abnormal findings on diagnostic imaging of cnsl (Other abnormal findings on diagnostic imaging of central nervous system). Looks like we don't have a good SNOMED for this any more.

aostropolets commented 6 years ago

@ericaVoss you may check now, most of the codes are mapped. @gowthamrao I've noticed that you use codes with small 'x' in your concept set. These are invalid, so you need to use the same ones but with the capital 'X'. For instance, W22.01xA Walked into wall, initial encounter is deprecated, whereas W22.01XA (concept_id 45556610) is mapped to 432788 Accidental bumping into stationary object.

dimshitc commented 6 years ago

the vocabulary was updated on 2017-12-11 and it has all the codes, so I'm closing this issue. Note, that those codes with small 'x' like W22.01xA are deprecated, so you need to use standard way of code representation - all capital