OHDSI / OncologyWG

Oncology Working Group Repository
https://ohdsi.github.io/OncologyWG
Apache License 2.0

How should we handle the different flavors of ‘Unknown’ within NAACCR: ingest or not ingest? #30

Closed mgurley closed 5 years ago

mgurley commented 5 years ago
cgreich commented 5 years ago

We could. And we don't have to map to 0. We just don't map, which means the ETL will put in 0.

mgurley commented 5 years ago

@dimshitc No ingesting of any flavor of 'Unknown'.

cgreich commented 5 years ago

Wait. We would ingest, but not map to anything, and not make it standard. Right?

dimshitc commented 5 years ago

Right, the ETL should be able to distinguish between a real number and a code for unknown. Gleason's score = 5 means "5" goes to value_as_number. Gleason's score = 998 (No prostatectomy/autopsy performed) means value_as_concept_id = 0. So we need a list of the source concepts indicating Unknown. They stay non-standard, without a mapping.
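
A minimal sketch of the routing described in this comment, assuming the list of unknown codes is available per NAACCR item. The names `UNKNOWN_CODES` and `route_value` are hypothetical, and leaving `value_as_concept_id` empty for real numbers is an assumption that the thread does not settle.

```python
from typing import Optional, Set, TypedDict


class MeasurementValue(TypedDict):
    value_as_number: Optional[float]
    value_as_concept_id: Optional[int]


# Hypothetical list of source codes meaning "unknown" for one NAACCR item
# (998 = "No prostatectomy/autopsy performed" in the Gleason example).
UNKNOWN_CODES: Set[str] = {"998"}


def route_value(source_code: str,
                unknown_codes: Set[str] = UNKNOWN_CODES) -> MeasurementValue:
    """Send real numbers to value_as_number and unknown codes to concept 0."""
    if source_code in unknown_codes:
        # Unmapped source concept: the ETL fills in 0, as proposed above.
        return {"value_as_number": None, "value_as_concept_id": 0}
    # Assumption: a real numeric result carries no value concept.
    return {"value_as_number": float(source_code), "value_as_concept_id": None}


gleason_real = route_value("5")
gleason_unknown = route_value("998")
```

Keeping the unknown codes in an explicit set is what makes the "real number vs. code" check possible; without that list, 998 would be stored as a numeric score.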

cgreich commented 5 years ago

I think the current convention for a flavor of null (no value) is NULL, not 0.

mgurley commented 5 years ago

There seem to be quite a few flavors of unknown that are still standard concepts. I think we should revisit the decision to make flavors of unknown non-standard. Trying to determine everything that should be made non-standard has enormous curation overhead. Here is just a sampling of numeric concepts. Many should indeed be standard concepts, but many others should not.

value_as_concept_id concept_name concept_code standard_concept
35934209 No involved regional nodes melanoma_nasal_cavity@2880@000 S
35933329 No surgical specimen from primary site sinus_maxillary@2865@998 S
35939229 "Ascites present, determined to be non-malignant" ovary@2920@991 S
35934159 Test not done (test not ordered and not performed) breast@2866@998 S
35936976 "Described as ""greater than 5 mm""" colon@2930@996 S
35933006 No para-aortic lymph nodes examined corpus_carcinoma@2920@098 S
35919073 "Tumor Deposits identified, number unknown" 3934@X2 S
35936528 "Described as ""less than 45 cm"" or ""greater than 40 cm"" or ""between 40 and 45 cm""" esophagus_gejunction@2910@995 S
35941045 No sentinel nodes were biopsied 835@98 S
35940004 "Regional lymph node(s) involved, size not stated;Unknown if regional lymph nodes involved;Not documented in patient record" oropharynx@2880@999 S
35929615 "Pelvic lymph nodes surgically removed, but number of nodes unknown/not stated and not documented as a sampling or dissection" corpus_carcinoma@2910@098 S
35923183 "Biopsy cores examined, number unknown" prostate@2867@991 S
35926018 No regional lymph nodes involved bladder@2890@000 S
35933485 "No resection of primary site ;Surgical procedure did not remove enough tissue to measure the CRM;(Examples include: polypectomy only, excision of tumor only or excisional biopsy only)" rectum@2930@998 S
35919025 Stated as 91-100% 3826@R99 S
35927687 "Poorly differentiated tumor present, percent not stated" penis@2865@990 S
35923562 No histologic specimen from primary site pancreas_other@2900@998 S
35936175 No pelvic nodes examined corpus_sarcoma@2900@098 S
35935647 No lung metastasis resected bone@2910@000 S
35935629 "Positive nodes, number unspecified" breast@2900@097 S
35925275 Test not done (test not ordered and not performed) melanoma_skin@2930@998 S
35937834 Ascites not assessed ovary@2920@998 S
35940097 9.80 millimeters or larger ;(Includes cases converted from codes 981-989 during conversion to V0200) melanoma_skin@2880@980 S
35921422 Test not done (test not ordered and not performed) breast@2877@998 S
35934941 Test not done (test not ordered and not performed) pancreas_body_tail@2890@998 S
35925130 Microinvasion; microscopic focus or foci only and no depth given;Not documented in patient record;Unknown; depth not stated melanoma_skin@2880@999 S
35924599 No involved regional nodes palate_soft@2880@000 S
35929661 "Margins clear, distance from tumor not stated;CRM negative, NOS" rectum@2930@991 S
35919794 "Margins clear, distance from tumor not stated;Circumferential or radial resection margin negative, NOS;No residual tumor identified on specimen" 3823@XX.1 S
35927141 Test not done (test not ordered and not performed) pancreas_head@2890@998 S
35926762 No ascites present ovary@2920@995 S
35919717 "Positive nodes, number unspecified" 3882@X5 S
35924035 No para-aortic nodes examined. corpus_sarcoma@2930@000 S
35921592 All pelvic nodes examined negative. corpus_sarcoma@2900@000 S
35937584 Test not done (test not ordered and not performed) small_intestine@2900@998 S
35929763 All nodes examined negative for cancer involvement;All nodes examined negative for extracapsular tumor esophagus_gejunction@2900@000 S
35937192 "Regional lymph node(s) involved, size not stated;Unknown if regional lymph nodes involved;Not documented in patient record" parotid_gland@2880@999 S
35926587 No pelvic nodes examined corpus_carcinoma@2900@098 S
35935968 All para-aortic lymph nodes examined negative corpus_carcinoma@2920@000 S
35920317 Test not done (test not ordered and not performed) rectum@2900@998 S
35928156 "Stated as ""less than 1 mitosis/square mm"";Stated as ""nonmitogenic""" melanoma_skin@2861@990 S
35938930 Ratio of less than 1.00 breast@2864@991 S
35932760 No needle core biopsy performed prostate@2867@998 S
35923385 No needle core biopsy performed prostate@2866@998 S
35936197 "No resection of primary site ;Surgical procedure did not remove enough tissue to measure the CRM;(Examples include: polypectomy only, excision of tumor only or excisional biopsy only)" colon@2930@998 S
35921947 98.0 or greater U/ml bile_ducts_intrahepat@2866@980 S
35926869 98.0 or greater ng/ml rectum@2900@980 S
35930032 No involved regional nodes salivary_gland_other@2880@000 S
35931178 No histopathologic examination of lymph nodes esophagus_gejunction@2900@998 S
35921452 No histologic specimen from primary site pancreas_head@2900@998 S
35937902 98.0 or greater U/ml pancreas_head@2880@980 S
35937853 No axillary nodes examined breast@2900@098 S
35926779 No histologic examination of primary site;AND/OR;No neoadjuvant chemotherapy bone@2900@998 S
35926474 No histologic examination of primary site;Test not done (test not ordered and not performed) cns_other@2890@998 S
35935843 No regional lymph node(s) involved penis@2870@000 S
35933389 Test not done (test not ordered and not performed) breast@2864@998 S
35924616 No regional lymph node(s) involved kidney_parenchyma@2861@000 S
35930350 No histologic examination of primary site. melanoma_skin@2861@998 S
35919126 "Not applicable, invasive case" 3903@XX6 S
35927137 No histologic examination of primary site;Test not done (test not ordered and not performed) brain@2890@998 S
35919737 Stated as 91-100% 3914@R99 S
35942004 "Sentinel lymph nodes were biopsied, but the number is unknown" 834@98 S
35931557 98.0 ng/ml or greater prostate@2880@980 S
35930309 No involved regional nodes sinus_maxillary@2880@000 S
35922673 "Positive nodes, not stated if extracapsular tumor present" esophagus_gejunction@2900@990 S
35919590 "PR negative, or stated as less than 1%" 3914@000 S
35919105 No tumor deposits 3934@00 S
35935407 No prostatectomy/autopsy performed prostate@2864@998 S
35923109 No surgical resection of primary site colon@2910@998 S
35931111 "Regional lymph node(s) involved, size not stated;Unknown if regional lymph nodes involved;Not documented in patient record" salivary_gland_other@2880@999 S
35938317 "Described as ""less than 45 cm"" or ""greater than 40 cm"" or ""between 40 and 45 cm""" esophagus_gejunction@2920@995 S
35919049 Stated as 1-10% 3914@R10 S
35919519 "ER negative, or stated as less than 1%" 3826@000 S
35929079 No involved regional nodes nasal_cavity@2880@000 S
35920861 "TD identified, number unknown" rectum@2910@990 S
35934701 0.0 mitoses per 10 high-power fields (HPF) (40x field);0.0 mitoses per 2 square millimeters (mm);Mitoses absent;No mitoses present net_small_intestine@2930@000 S
35921147 No pelvic lymph nodes examined corpus_carcinoma@2910@000 S
35919087 "Not applicable, in situ case" 3904@XX6 S
35925936 No pelvic lymph nodes examined corpus_sarcoma@2910@000 S
35941132 It is unknown whether sentinel nodes are positive; not applicable; not stated in patient record 835@99 S
35922956 No surgical resection of primary site rectum@2910@998 S
35919455 No mass/tumor found 754@000 S
35919382 All ipsilateral axillary nodes examined negative 3882@00 S
35919582 Positive aspiration or needle core biopsy of lymph node(s) 3882@X6 S
35922662 No involved regional nodes parotid_gland@2880@000 S
35929093 11 or more mitoses per square mm melanoma_skin@2861@011 S
35931549 "TD identified, number unknown" colon@2910@990 S
35932575 Test not done (test not ordered and not performed) pancreas_other@2890@998 S
35935200 Test not done (test not ordered and not performed) stomach@2869@998 S
35936908 98.0 or greater ng/ml colon@2900@980 S
35930658 No histopathologic examination of lymph nodes esophagus@2900@998 S
35921171 0 mitoses per square millimeter (mm);Mitoses absent;No mitoses present melanoma_skin@2861@000 S
35933938 No histologic specimen from primary site pancreas_body_tail@2900@998 S
35922253 "Described as ""less than 1 centimeter (cm)"";;Stated as T1b with no other information on tumor size" breast@2800@991 S
35937985 "Lung metastasis resected, number unknown" bone@2910@099 S
35936489 Test not done (test not ordered and not performed) stomach@2868@998 S
35941398 "Positive sentinel nodes are documented, but the number is unspecified; For breast ONLY: SLN and RLND occurred during the same procedure" 835@97 S
35926531 No para-aortic nodes examined. corpus_carcinoma@2930@000 S
35922647 No mass/tumor found melanoma_skin@2880@000 S
35928512 No needle core biopsy/TURP performed prostate@2862@998 S
35936543 Renal parenchymal invasion not present/not identified kidney_renal_pelvis@2890@000 S
35938496 No involved regional nodes tongue_base@2880@000 S
35919937 "Described as ""less than 2 cm,"" or ""greater than 1 cm,"" or ""between 1 cm and 2 cm"";;Stated as T1 [NOS] or T1c [NOS] with no other information on tumor size" breast@2800@992 S
35941854 It is unknown whether sentinel nodes were examined; not applicable; not stated in patient record 834@99 S
35936415 No residual tumor identified on specimen rectum@2930@990 S
35935470 "Biopsy cores positive, number unknown" prostate@2866@991 S
35939622 No para-aortic lymph nodes examined corpus_sarcoma@2920@098 S
35926715 Test not done (test not ordered and not performed) melanoma_skin@2920@998 S
35919854 "No resection of primary site ;Surgical procedure did not remove enough tissue to measure the circumferential or radial resection margin;(Examples include: polypectomy only, endoscopic mucosal resection (EMR), excisional biopsy only, transanal disk excisi" 3823@XX.7 S
35926455 98.0 or greater U/ml pancreas_body_tail@2880@980 S
35928846 "Margin IS involved with tumor;Circumferential resection margin (CRM) positive;Described as ""less than 1 millimeter (mm)""" rectum@2930@000 S
35923874 "Described as ""less than 3 cm"" or ""greater than 2 cm"" or ""between 2 cm and 3 cm""" kidney_parenchyma@2861@993 S
35934036 No surgical resection of primary site kidney_renal_pelvis@2890@998 S
35939791 "Margins clear, distance from tumor not stated;CRM negative, NOS" colon@2930@991 S
35934919 "CONVERTED AND CODE REUSED V0203 Prior to V0203 code defined as ""Test ordered, results not in chart"". Cases converted to code 997 with V0203 and code 998 redefined as ""Test not done (test not ordered and not performed)"". Test not done (test not ordered an" net_small_intestine@2865@998 S
35938506 All pelvic nodes examined negative. corpus_carcinoma@2900@000 S
35921457 All ipsilateral axillary nodes examined negative breast@2900@000 S
35929268 No mass/tumor found breast@2800@000 S
35922738 None rectum@2910@000 S
35938569 "CONVERTED AND CODE REUSED V0203;Prior to V0203 code defined as ""Test ordered, results not in chart"". Cases converted to code 997 with V0203 and code 998 redefined as ""Test not done (test not ordered and not performed)"".;;Test not done (test not ordered a" net_small_intestine@2866@998 S
35934664 No residual tumor identified on specimen colon@2930@990 S
35925693 "Regional lymph node(s) involved, size not stated;Unknown if regional lymph node(s) involved;Not documented in patient record" kidney_parenchyma@2861@999 S
35937635 "Described as ""greater than 5 cm"";;Stated as T3 with no other information on tumor size" anus@2800@996 S
35925788 None colon@2910@000 S
35922350 Test not done (test not ordered and not performed) colon@2900@998 S
35934881 98.0 or greater U/ml pancreas_other@2880@980 S
35919704 Stated as 71-80% 3914@R80 S

cgreich commented 5 years ago

Well, there are flavors of null, where the information is not available, like 35933329 "No surgical specimen from primary site" or 35934159 "Test not done (test not ordered and not performed)". We don't need them. They add no information. Every day of my life that particular "Test" was not ordered or performed, thank God. Then we have the results of the TNM assessment, like 35940004 "Regional lymph node(s) involved, size not stated" or 35926018 "No regional lymph nodes involved". 35929615 "Pelvic lymph nodes surgically removed, but number of nodes unknown/not stated and not documented as a sampling or dissection" probably should be mapped to a surgery procedure, rather than to a value. And then there are those which are not flavors of null at all, like 35919025 "Stated as 91-100%" or 35937902 "98.0 or greater U/ml". We said we will take care of them later, and right now having them as Standard won't hurt.

No?

mgurley commented 5 years ago

Right, and 35933329 "No surgical specimen from primary site" and 35934159 "Test not done (test not ordered and not performed)" are currently standard concepts. So based on our existing intentions, they should not be standard. My point is that correcting these, and possibly many more, could be a great curation burden. I am fine with sticking to our plan, but if so, we likely have a lot of cleanup to do.

cgreich commented 5 years ago

How much?

mgurley commented 5 years ago

@cgreich If I could automate it, it would not be a burden. I just noticed some that seemed wrong. Likely the full set will need a second-pass curation.
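
As a rough illustration of what a first pass might look like, here is a sketch of a keyword filter over concept names. The patterns and the function name `is_unknown_candidate` are hypothetical, not an agreed rule set, and the output is only a list of candidates for manual review.

```python
import re

# Hypothetical keyword patterns; a first-pass filter only, not a curated rule set.
UNKNOWN_PATTERNS = [
    r"\bunknown\b",
    r"\bnot stated\b",
    r"\bnot documented\b",
    r"\btest not done\b",
    r"\bnot assessed\b",
]
_UNKNOWN_RE = re.compile("|".join(UNKNOWN_PATTERNS), re.IGNORECASE)


def is_unknown_candidate(concept_name: str) -> bool:
    """Return True if the name contains one of the unknown-style phrases."""
    return _UNKNOWN_RE.search(concept_name) is not None


# A few rows taken from the sample posted earlier in this thread.
sample = [
    (35934159, "Test not done (test not ordered and not performed)"),
    (35919025, "Stated as 91-100%"),
    (35926018, "No regional lymph nodes involved"),
    (35940004, "Regional lymph node(s) involved, size not stated;"
               "Unknown if regional lymph nodes involved;"
               "Not documented in patient record"),
]
candidates = [cid for cid, name in sample if is_unknown_candidate(name)]
```

On these four rows the filter flags 35934159 and 35940004. The second is a mixed TNM-style concept rather than a pure flavor of null, which is why a filter like this can narrow the list but cannot replace the second-pass curation.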

mgurley commented 5 years ago

We decided that all flavors of unknown beyond Treatment concepts should be made non-standard. Treatment unknowns should be standard in the Observation domain.
Specific problems will need to be handled in separate issues.
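
The decision above could be sketched as follows. The `Concept` class and the `is_unknown` and `is_treatment` flags are hypothetical curation inputs, not fields of the OMOP concept table; the sketch only assumes the usual convention that `standard_concept` is 'S' for standard concepts and empty for non-standard ones.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: int
    domain_id: str
    standard_concept: Optional[str]  # 'S' = standard, None = non-standard
    is_unknown: bool                 # hypothetical curation flag
    is_treatment: bool               # hypothetical curation flag


def apply_decision(c: Concept) -> Concept:
    """Apply the rule: non-Treatment unknowns become non-standard,
    Treatment unknowns stay standard in the Observation domain."""
    if not c.is_unknown:
        return c  # the decision says nothing about other concepts
    if c.is_treatment:
        return replace(c, standard_concept="S", domain_id="Observation")
    return replace(c, standard_concept=None)


# domain_id values and the concept_id 0 below are placeholders for illustration.
test_not_done = Concept(35934159, "Measurement", "S",
                        is_unknown=True, is_treatment=False)
treatment_unknown = Concept(0, "Treatment", None,
                            is_unknown=True, is_treatment=True)

demoted = apply_decision(test_not_done)
kept = apply_decision(treatment_unknown)
```

Deciding which concepts get `is_unknown` set is the curation work discussed above; this function only applies the rule once that call has been made.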