Closed mgurley closed 5 years ago
Using the above example, two different approaches come to mind:
Option 1) Specify via relationship_id in concept_relationship:
[CONCEPT]
concept_id | concept_code | concept_name | concept_class_id |
---|---|---|---|
1 | x-752 | Tumor Size Clinical | NAACCR item |
2 | x-752-001 | 1mm or described as less than 1mm | NAACCR answer |
3 | x-752-002-988 | Exact size in millimeters (2mm to 988mm) | NAACCR answer |
[CONCEPT_RELATIONSHIP]
concept_id_1 | concept_id_2 | relationship_id |
---|---|---|
1 | 2 | Has answer |
1 | 3 | Has numeric range answer |
Option 2) Specify via concept_class_id of 'Answer' concepts
[CONCEPT]
concept_id | concept_code | concept_name | concept_class_id |
---|---|---|---|
1 | x-752 | Tumor Size Clinical | NAACCR item |
2 | x-752-001 | 1mm or described as less than 1mm | NAACCR answer |
3 | x-752-002-988 | Exact size in millimeters (2mm to 988mm) | NAACCR numeric range answer |
[CONCEPT_RELATIONSHIP]
concept_id_1 | concept_id_2 | relationship_id |
---|---|---|
1 | 2 | Has answer |
1 | 3 | Has answer |
I think Rimma's solution is the right one. And we need to map to one single concept_id='Tumor size'
Code | Description | Proposed mapping |
---|---|---|
000 | No mass/tumor found | |
001 | 1 mm or described as less than 1 mm | operator='<', value=1, unit_concept_id=mm |
002-988 | Exact size in millimeters (2 mm to 988 mm) | operator=null, value=size, unit_concept_id=mm |
989 | 989 millimeters or larger | operator='>', value=989, unit_concept_id=mm (what tumor is almost a meter?) |
990 | Microscopic focus or foci only and no size of focus is given | =0 |
999 | Unknown | =0 |
 | Size not stated | =0 |
 | Not documented in patient record | =0 |
 | Size of tumor cannot be assessed | =0 |
 | Not applicable | =0 |
@dimshitc Since @cgreich agrees with @rimusia's proposed recommendation, we will go with it. Maybe we can incorporate @rtmill's metadata suggestions in a future version.
@mgurley
I must have misunderstood what you meant by 'handle' in the issue description. There seem to be two separate numeric range issues here: 1) how we map from source (NAACCR) concepts to standard concepts, which is what @cgreich refers to and what @rimusia's proposal solves; 2) from an ETL perspective, how we can consistently map from NAACCR numeric values to source (NAACCR) concepts, which is what I was referring to.
Using the above example, say we have a tumor size measurement with a value of '57'. The concept_code 'x-752-057' won't exist in the vocabulary. That record would somehow need to map to the concept for 'x-752-002-988'.
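
To make the ETL-side lookup concrete, here is a rough sketch in Python. It assumes (hypothetically) that discrete answers use the code pattern `item-value` and range answers use `item-low-high`, as in the example codes above. The function name, the three-digit zero padding, and the parsing rules are illustrative only and not part of any agreed vocabulary design.

```python
import re
from typing import Optional

# Hypothetical source concept codes, copied from the example above.
CONCEPT_CODES = ["x-752-001", "x-752-002-988"]

# Assumed pattern for range answers: "<item>-<low>-<high>".
RANGE_RE = re.compile(r"^(?P<item>[a-z]-\d+)-(?P<low>\d+)-(?P<high>\d+)$")


def find_concept_code(item_code: str, raw_value: str,
                      codes=CONCEPT_CODES) -> Optional[str]:
    """Return the concept_code that covers raw_value for the item, if any."""
    # Discrete answers are assumed to be zero-padded to three digits.
    exact = f"{item_code}-{raw_value.zfill(3)}"
    if exact in codes:
        return exact
    if not raw_value.isdigit():
        return None
    number = int(raw_value)
    # Fall back to any range code whose bounds include the value.
    for code in codes:
        match = RANGE_RE.match(code)
        if (match and match["item"] == item_code
                and int(match["low"]) <= number <= int(match["high"])):
            return code
    return None


print(find_concept_code("x-752", "57"))  # the range concept
print(find_concept_code("x-752", "1"))   # the discrete concept
```

A lookup like this only works if the range bounds can be recovered from the vocabulary, either from the code itself (as assumed here) or from some other attribute.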
@rtmill Our assumption is that 'x-752-002-988' would not exist in the CONCEPT table either. The only problem I see with Rimma's suggestion is that we don't want to put '990' or '999' in the MEASUREMENT.value_as_number field, because those do not represent 'Tumor Size Clinical' but a flavor of unknown. So I am reopening this issue.
@rtmill Or maybe 'x-752-990' and 'x-752-999' could exist but be non-standard and mapped to concept_id=0
I'm having trouble picturing this in an ETL. In this scenario, with the NULL flavors excluded as you propose above, would the remainder of the records map to the root item concept code? (i.e. x-752). If we aren't mapping to the root item code, how would we join on the concept code for discrete values? If we are mapping to the root item code, how can we qualify the operator for the records that aren't discrete (i.e. 001 "<= 1" and 989 ">=989") without needing switch/case statements for every item-value combination? Concept relationships of type 'has operator'?
This scenario gets complicated in a hurry e.g.
@rtmill Look at @rimusia proposal in the 'Appendix'. Her SQL should work for the 'Age at Diagnosis' "Numeric" NAACCR #230 field. Except for the case of 999 Unknown age, it should work. Let us know if it does not. But we don't want to accidentally let in 999, so we will need to address it.
@mgurley @rimusia thanks for the clarification, it's making a lot more sense. As I understand it:
measurement_concept_id <- (item_concept)
value_as_concept_id <- (value_concept)
value_as_number <- IF value_concept IS NULL THEN numeric_value ELSE NULL
...
WHERE value_concept <> 0
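
A minimal sketch of the mapping summarised above, written in Python purely for illustration. `build_measurement` and its argument names are hypothetical. It assumes the item and value concepts have already been resolved, that an unresolved value concept is `None`, and that NULL flavors resolve to concept 0.

```python
from typing import Optional


def build_measurement(item_concept: int,
                      value_concept: Optional[int],
                      numeric_value: Optional[float]) -> Optional[dict]:
    """Skip NULL flavors; otherwise fill either the concept or the number."""
    # Equivalent of "WHERE value_concept <> 0".
    # Assumption: rows with no value concept (None) are kept, which differs
    # from strict SQL semantics, where NULL <> 0 would filter them out.
    if value_concept == 0:
        return None
    return {
        "measurement_concept_id": item_concept,
        "value_as_concept_id": value_concept,
        # The number is used only when no value concept was found.
        "value_as_number": numeric_value if value_concept is None else None,
    }


print(build_measurement(1, None, 57.0))
print(build_measurement(1, 0, 999.0))
```

Note that a literal SQL translation of the `WHERE` clause would need an explicit `OR value_concept IS NULL` to keep the purely numeric rows.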
Last two outstanding questions:
1) "For NAACCR variables that may contain either pre-defined permissible codes or actual values": are we planning to specify/flag items for this, or are we going to assume that all variables fall into this category and ETL them similarly?
2) For values that contain operators, e.g.
001 1 mm or described as less than 1 mm - operator='<', value=1, unit_concept_id=mm
...how are we associating the operator_concept_id and unit_concept_id? Concept relationships?
@rtmill 1) I don't think we should flag these items. 2) I have a proposal for representing those concepts that may have numeric/range values and also units. (LOINC includes concepts that have numeric values and units.) The idea is to add attributes like "has value" or "has units" to a concept. I am already doing it at MSK. In OMOP, it can be accomplished via relationships. However, we don't yet have a place for relationships with numeric data. Another CDM extension, @cgreich?
@dimshitc actually brought it up in a different circumstance. Yes, we should.
At the 3/12/2019 Oncology Workgroup meeting we discussed various options. @rimusia volunteered to draft a final recommendation.
Submitted on behalf of @dimshitc
There are a number of results with a very large range of possible numeric values: 0.1 - 99,999.9 U/L, 0.1 - 99,999.9 ng/mL, and 0.1 - 99,999.9 mIU/mL. Encoding only these 3, we end up with about 3×10^6 new concepts. As we have about 6×10^6 valid concepts, this addition may affect overall performance.
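
The figure of roughly three million can be checked with a quick count, assuming each result is recorded to one decimal place (an assumption read off the 0.1 step in the ranges above).

```python
# Count distinct values from 0.1 to 99,999.9 in steps of 0.1,
# using integer tenths to avoid floating-point drift.
values_per_item = len(range(1, 999_999 + 1))  # 1 tenth .. 999,999 tenths
total_new_concepts = 3 * values_per_item      # three result items

print(values_per_item)     # 999999
print(total_new_concepts)  # 2999997
```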
So, I have a proposal for how to avoid adding millions of new concepts. Let's use the example from Rimma's proposal: NAACCR item #2800, CS Tumor Size.
Code | Description |
---|---|
001-988 | 001 - 988 millimeters (mm) (Exact size to nearest mm) |
989 | 989 mm or larger |
991 | Described as "less than 1 centimeter (cm)" |
992 | Described as "less than 2 cm," or "greater than 1 cm," or "between 1 cm and 2 cm" |
These
Code | Description |
---|---|
989 | 989 mm or larger |
991 | Described as "less than 1 centimeter (cm)" |
992 | Described as "less than 2 cm," or "greater than 1 cm," or "between 1 cm and 2 cm" |
go to concept and concept_numeric exactly as Rimma proposed.
This
Code | Description |
---|---|
001-988 | 001 - 988 millimeters (mm) (Exact size to nearest mm) |
doesn't go to vocabulary at all.
This way, if the value of NAACCR item #2800 doesn't match any of #2800's coded values and it is numeric (satisfies the float datatype criteria), then it should be treated as value_as_number.
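
The rule above could look something like the following sketch. The set of categorical codes is taken from the #2800 example rows listed in this comment and is not a complete list; `classify_value` is a hypothetical helper name.

```python
import math
from typing import Optional, Tuple

# Categorical codes for item #2800 that would exist in the vocabulary
# (only the example rows above; the real list is longer).
CATEGORICAL_CODES = {"989", "991", "992"}


def classify_value(raw: str,
                   known_codes=CATEGORICAL_CODES
                   ) -> Tuple[Optional[str], Optional[float]]:
    """Return (matched_code, value_as_number); at most one is set."""
    value = raw.strip()
    if value in known_codes:
        return value, None  # coded answer: look up its concept
    try:
        number = float(value)
    except ValueError:
        return None, None  # neither a known code nor numeric
    if not math.isfinite(number):
        return None, None  # reject "nan" and "inf"
    return None, number  # treated as value_as_number


print(classify_value("057"))
print(classify_value("989"))
```

One consequence of this approach: any "unknown"-type code (such as 999) has to be present in the known-code set, otherwise it would be accepted as a number.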
And also we have
concept_id_1 | relationship_id | concept_id_2 |
---|---|---|
XXXXXX(‘CS Tumor Size’) | has type | YYYYYYY(‘Numeric’) |
XXXXXX(‘CS Tumor Size’) | has units | ZZZZZZZ(‘mm’) |
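
As an illustration of how an ETL might consume these two relationships, here is a small sketch. The concept names stand in for the placeholder concept IDs in the table, and `related` and `unit_for` are hypothetical helpers, not existing functions.

```python
from typing import Optional

# Relationship rows from the table above; names stand in for the
# placeholder concept IDs (XXXXXX, YYYYYYY, ZZZZZZZ).
RELATIONSHIPS = [
    ("CS Tumor Size", "has type", "Numeric"),
    ("CS Tumor Size", "has units", "mm"),
]


def related(concept: str, relationship_id: str,
            rows=RELATIONSHIPS) -> Optional[str]:
    """Return the first concept_id_2 linked to concept by relationship_id."""
    for concept_1, rel, concept_2 in rows:
        if concept_1 == concept and rel == relationship_id:
            return concept_2
    return None


def unit_for(concept: str) -> Optional[str]:
    """Return the unit only for items flagged as numeric."""
    if related(concept, "has type") != "Numeric":
        return None
    return related(concept, "has units")


print(unit_for("CS Tumor Size"))
```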
Code | Description |
---|---|
3846@XXXXX.8 | Not applicable: Information not collected for this case (If this information is required by your standard setter, use of code XXXXX.8 may result in an edit error.) |
3846@XXXXX.9 | Not documented in medical record; No orchiectomy performed; hCG (Human Chorionic Gonadotropin) Post-Orchiectomy Lab Value not assessed or unknown if assessed |
3846@XXXXX.7 | Test ordered, results not in chart |
3846@0.1-99999.9 | 0.1 - 99,999.9 mIU/mL |
3846@XXXXX.1 | 100,000 mIU/mL or greater |
3846@0.0 | 0.0 milli-International Units/milliliter (mIU/mL) |
@rimusia Can you please update the documentation with the final decision? See #40 for the issue to get the decision approved.
For example, NAACCR #752 'Tumor Size Clinical' has the following list of NAACCR item codes
Rimma's proposed solution: The NAACCR item should contain no list of possible values in the 'Meas Value' domain and should be recorded in Measurement.value_as_number as it appears in the source.