OHDSI / OncologyWG

Oncology Working Group Repository
https://ohdsi.github.io/OncologyWG
Apache License 2.0

How should we handle 'Numeric' NAACCR items that define numeric ranges? #29

Closed mgurley closed 5 years ago

mgurley commented 5 years ago

For example NAACCR #752 'Tumor Size Clinical', has the following list of NAACCR item codes

  000     No mass/tumor found
  001     1 mm or described as less than 1 mm
  002-988 Exact size in millimeters (2 mm to 988 mm)
  989     989 millimeters or larger
  990     Microscopic focus or foci only and no size of focus is given
  999     Unknown
          Size not stated
          Not documented in patient record
          Size of tumor cannot be assessed
          Not applicable

Rimma's proposed solution: The NAACCR item should contain no list of possible values in the 'Meas Value' domain and should be recorded in Measurement.value_as_number as it appears in the source.

rtmill commented 5 years ago

Using the above example, two different approaches come to mind:

Option 1) Specify via relationship_id in concept_relationship:

[CONCEPT]

concept_id concept_code concept_name concept_class_id
1 x-752 Tumor Size Clinical NAACCR item
2 x-752-001 1mm or described as less than 1mm NAACCR answer
3 x-752-002-988 Exact size in millimeters (2mm to 988mm) NAACCR answer

[CONCEPT_RELATIONSHIP]

concept_id_1 concept_id_2 relationship_id
1 2 Has answer
1 3 Has numeric range answer

Option 2) Specify via concept_class_id of 'Answer' concepts

[CONCEPT]

concept_id concept_code concept_name concept_class_id
1 x-752 Tumor Size Clinical NAACCR item
2 x-752-001 1mm or described as less than 1mm NAACCR answer
3 x-752-002-988 Exact size in millimeters (2mm to 988mm) NAACCR numeric range answer

[CONCEPT_RELATIONSHIP]

concept_id_1 concept_id_2 relationship_id
1 2 Has answer
1 3 Has answer
cgreich commented 5 years ago

I think Rimma's solution is the right one. And we need to map to one single concept_id='Tumor size'

  000     No mass/tumor found
  001     1 mm or described as less than 1 mm - operator='<', value=1, unit_concept_id=mm
  002-988 Exact size in millimeters (2 mm to 988 mm) - operator=null, value=size, unit_concept_id=mm
  989     989 millimeters or larger - operator='>', value=989, unit_concept_id=mm (what tumor is almost a meter?)
  990     Microscopic focus or foci only and no size of focus is given =0
  999     Unknown =0
          Size not stated =0
          Not documented in patient record =0
          Size of tumor cannot be assessed =0
          Not applicable =0
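A minimal Python sketch of the interpretation listed in this comment for NAACCR #752. The names `SizeReading` and `interpret_tumor_size_752` are hypothetical, and reading "=0" as "maps to zero" (rather than a size of 0 mm) is an assumption based on the later comments about concept_id = 0. Code 000 and codes 991-998 get no operator or value in the comment, so the sketch returns `None` for them rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SizeReading:
    operator: Optional[str]      # '<', '>', or None for an exact size
    value: Optional[int]         # size, None when no size applies
    unit: Optional[str]          # 'mm' when a size is present
    maps_to_zero: bool = False   # True for the codes the comment marks "=0"


def interpret_tumor_size_752(code: str) -> Optional[SizeReading]:
    """Interpret a 3-digit NAACCR #752 code per the list in the comment."""
    n = int(code)
    if n == 1:
        return SizeReading("<", 1, "mm")
    if 2 <= n <= 988:
        return SizeReading(None, n, "mm")       # exact size in millimeters
    if n == 989:
        return SizeReading(">", 989, "mm")
    if n in (990, 999):
        # Assumption: "=0" means the code maps to zero, not a 0 mm tumor.
        return SizeReading(None, None, None, maps_to_zero=True)
    return None  # 000 and any other code: not specified in the comment


if __name__ == "__main__":
    for code in ("001", "057", "989", "999", "000"):
        print(code, interpret_tumor_size_752(code))
```

Note that a branch like this per item is exactly the "switch/case for every item-value combination" concern raised later in the thread; the sketch only illustrates the target values, not a recommended ETL design.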

mgurley commented 5 years ago

@dimshitc Since @cgreich agrees with @rimusia's proposal, we will go with it. Maybe we can incorporate @rtmill's metadata suggestions in a future version.

rtmill commented 5 years ago

@mgurley

I must have misunderstood what you meant by 'handle' in the issue description. There seem to be two separate numeric range issues here:

1) How we map from source (NAACCR) concepts to standard concepts: what @cgreich refers to and @rimusia's proposal solves.
2) From an ETL perspective, how we can consistently map from NAACCR numeric values to source (NAACCR) concepts: what I was referring to.

Using the above example, say we have a tumor size measurement with a value of '57'. The concept_code 'x-752-057' won't exist in the vocabulary. That record would somehow need to map to the concept for 'x-752-002-988'.

mgurley commented 5 years ago

@rtmill Our assumption is that 'x-752-002-988' would not exist in the CONCEPT table either. The only problem I see with Rimma's suggestion is that we don't want to put '990' or '999' in the MEASUREMENT.value_as_number field, because those do not represent 'Tumor Size Clinical' but a flavor of unknown. So I am reopening this issue.

mgurley commented 5 years ago

@rtmill Or maybe 'x-752-990' and 'x-752-999' could exist but be non-standard and mapped to concept_id=0

rtmill commented 5 years ago

I'm having trouble picturing this in an ETL. In this scenario, with the NULL flavors excluded as you propose above, would the remainder of the records map to the root item concept code? (i.e. x-752). If we aren't mapping to the root item code, how would we join on the concept code for discrete values? If we are mapping to the root item code, how can we qualify the operator for the records that aren't discrete (i.e. 001 "<= 1" and 989 ">=989") without needing switch/case statements for every item-value combination? Concept relationships of type 'has operator'?

This scenario gets complicated in a hurry, e.g. (image omitted)

mgurley commented 5 years ago

@rtmill Look at @rimusia proposal in the 'Appendix'. Her SQL should work for the 'Age at Diagnosis' "Numeric" NAACCR #230 field. Except for the case of 999 Unknown age, it should work. Let us know if it does not. But we don't want to accidentally let in 999, so we will need to address it.

rimusia commented 5 years ago
  1. Handling "unknown" flavors: I suggested mapping all the "unknown" versions to concept_id = 0. We must include all the predefined permissible values for each variable we include. Otherwise, we will conflate a permissible value with a numeric value (see #2 below). I will add this to the proposal along with generic SQL code that will map ALL NAACCR concepts to standard.
  2. Handling of numeric and other non-concept values: For NAACCR variables that may contain either pre-defined permissible codes or actual values, the absence of a permissible value in the vocabulary will indicate that the value should be stored as is in the measurement.value_as_number field. Therefore, all the true permissible codes must be included in the vocabulary.
rtmill commented 5 years ago

@mgurley @rimusia thanks for the clarification, it's making a lot more sense. As I understand it:

measurement_concept_id <- (item_concept)
value_as_concept_id <-(value_concept)
value_as_number <- if(value_concept IS NULL) THEN numeric_value ELSE NULL
...
WHERE value_concept <> 0
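A minimal Python sketch of the assignment logic above, assuming the lookup of `value_concept` has already happened and yields `None` when the source value has no matching concept. The function name `build_measurement` and the concept ids used in the demo are hypothetical. One caveat if this is translated to SQL literally: `NULL <> 0` does not evaluate to true, so a plain `WHERE value_concept <> 0` would also drop the numeric rows unless NULLs are kept explicitly.

```python
from typing import Optional


def build_measurement(
    item_concept: int,
    value_concept: Optional[int],
    numeric_value: Optional[float],
) -> Optional[dict]:
    """Return a MEASUREMENT-like row, or None when the row is filtered out."""
    if value_concept == 0:
        # Mirrors "WHERE value_concept <> 0": unknown flavors are excluded.
        return None
    return {
        "measurement_concept_id": item_concept,
        "value_as_concept_id": value_concept,
        # Keep the number only when no permissible-value concept matched.
        "value_as_number": numeric_value if value_concept is None else None,
    }


if __name__ == "__main__":
    print(build_measurement(1, None, 57.0))  # plain numeric value
    print(build_measurement(1, 2, 1.0))      # matched a permissible code
    print(build_measurement(1, 0, 999.0))    # unknown flavor, filtered out
```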

Last two outstanding questions:

1 ) "For NAACCR variables that may contain either pre-defined permissible codes or actual values"

Are we planning to specify/flag items for this or are we going to assume that all variables fall into this category and ETL them similarly?

2 ) For values that contain operators e.g.

001 1 mm or described as less than 1 mm - operator='<' , value=1, unit_concept_id=mm

...how are we associating the operator_concept_id and unit_concept_id? Concept relationships?

rimusia commented 5 years ago

@rtmill 1) I don't think we should flag these items 2) I have a proposal for representing those concepts that may have numeric/range values and also units. (LOINC includes concepts that have numeric values and units). The idea is to add attributes like "has value" or "has units" to a concept. I am already doing it at MSK. In OMOP, it can be accomplished via relationships. However, we don't have (yet) space for relationships with numeric data. Another CDM extension, @cgreich ?

cgreich commented 5 years ago

@dimshitc actually brought it up in a different circumstance. Yes, we should.

mgurley commented 5 years ago

At the 3/12/2019 Oncology Workgroup meeting we discussed various options. @rimusia volunteered to draft a final recommendation.

mgurley commented 5 years ago

Submitted on behalf of @dimshitc

There are a number of results with a very large number of possible numeric values:

0.1 - 99,999.9 U/L
0.1 - 99,999.9 ng/mL
0.1 - 99,999.9 mIU/mL

So, encoding only these 3, we end up with about 3*10^6 new concepts. As we have about 6*10^6 valid concepts, this addition may affect overall performance.
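The estimate can be checked with a short calculation, assuming each range is recorded to one decimal place (as the endpoints 0.1 and 99,999.9 suggest) and that every distinct value would become its own concept. The variable names are illustrative only.

```python
# Work in tenths to avoid floating-point rounding: 0.1 -> 1, 99,999.9 -> 999_999.
low_tenths = 1
high_tenths = 999_999

# Number of distinct one-decimal values in a single range.
values_per_range = high_tenths - low_tenths + 1      # 999_999

# Three such ranges (U/L, ng/mL, mIU/mL), one concept per value.
new_concepts = 3 * values_per_range                  # 2_999_997, about 3*10^6

# Compared with the roughly 6*10^6 valid concepts cited in the comment.
existing_concepts = 6_000_000
growth = new_concepts / existing_concepts

if __name__ == "__main__":
    print(values_per_range, new_concepts, round(growth, 2))
```

Under these assumptions the three ranges alone would add about half as many concepts as the vocabulary currently holds.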

So, I have a proposal how to avoid adding millions of new concepts. Let's use the example from Rimma's proposal NAACCR item #2800, CS Tumor Size

Code Description
001-988 001 - 988 millimeters (mm) (Exact size to nearest mm)
989 989 mm or larger
991 Described as "less than 1 centimeter (cm)"
992 Described as "less than 2 cm," or "greater than 1 cm," or "between 1 cm and 2 cm"

These

Code Description
989 989 mm or larger
991 Described as "less than 1 centimeter (cm)"
992 Described as "less than 2 cm," or "greater than 1 cm," or "between 1 cm and 2 cm"

go to concept and concept_numeric exactly as Rimma proposed.

This

Code Description
001-988 001 - 988 millimeters (mm) (Exact size to nearest mm)

doesn't go to vocabulary at all.
This way, if the value of NAACCR item #2800 doesn't match any of #2800's coded values and it is numeric (satisfies float datatype criteria), then it should be treated as value_as_number.

And also we have

concept_id_1 relationship_id concept_id_2
XXXXXX(‘CS Tumor Size’) has type YYYYYYY(‘Numeric’)
XXXXXX(‘CS Tumor Size’) has units ZZZZZZZ(‘mm’)
3846@XXXXX.8 Not applicable: Information not collected for this case (If this information is required by your standard setter, use of code XXXXX.8 may result in an edit error.)
3846@XXXXX.9 Not documented in medical record|No orchiectomy performed|hCG (Human Chorionic Gonadotropin) Post-Orchiectomy Lab Value not assessed or unknown if assessed
3846@XXXXX.7 Test ordered, results not in chart
3846@0.1-99999.9 0.1 - 99,999.9 mIU/mL
3846@XXXXX.1 100,000 mIU/mL or greater
3846@0.0 0.0 milli-International Units/milliliter (mIU/mL)
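The rule described in this comment can be sketched in Python as follows. Everything here is a stand-in: `CODED_VALUES` plays the role of the concepts kept in the vocabulary for item #2800, `ITEM_UNITS` plays the role of the 'has units' relationship, and `classify` is a hypothetical helper, not part of any OMOP or NAACCR tooling.

```python
import math
from typing import Optional

# Coded values that stay in the vocabulary for item #2800 (from the comment).
CODED_VALUES = {
    "2800": {
        "989": "989 mm or larger",
        "991": 'Described as "less than 1 centimeter (cm)"',
        "992": 'Described as "less than 2 cm," or "greater than 1 cm," '
               'or "between 1 cm and 2 cm"',
    }
}

# Stand-in for the "has units" relationship on the item concept.
ITEM_UNITS = {"2800": "mm"}


def classify(item: str, raw: str) -> dict:
    """Decide whether a source value is a coded concept or a plain number."""
    if raw in CODED_VALUES.get(item, {}):
        return {"kind": "concept", "code": raw,
                "value_as_number": None, "unit": None}
    try:
        number: Optional[float] = float(raw)
    except ValueError:
        number = None
    if number is None or not math.isfinite(number):
        # Neither a known code nor a usable number.
        return {"kind": "unmapped", "code": None,
                "value_as_number": None, "unit": None}
    return {"kind": "number", "code": None,
            "value_as_number": number, "unit": ITEM_UNITS.get(item)}


if __name__ == "__main__":
    for raw in ("057", "989", "abc"):
        print(raw, classify("2800", raw))
```

The coded-value check runs first on purpose: '989' also parses as a number, so the order of the two tests is what keeps a permissible code from being stored as a size.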
mgurley commented 5 years ago

@rimusia Can you please update the Documentation with the final decision. See #40 for issue to get the decision approved.