OHDSI / DataQualityDashboard

A tool to help improve data quality standards in observational data science.
https://ohdsi.github.io/DataQualityDashboard
Apache License 2.0
134 stars 93 forks source link

Make consistent unit concept exception in Concept Record Completeness #558

Open MaximMoinat opened 1 month ago

MaximMoinat commented 1 month ago

This issue came up while reviewing the documentation of sourceConceptRecordCompleteness check.

In the query for both standard and source concept id, we have this exception for unit concepts: https://github.com/OHDSI/DataQualityDashboard/blob/6ef7ee2dd1116741e3fe9907ef4d9cc98eccb96c/inst/sql/sql_server/field_concept_record_completeness.sql#L37

The reason we have this rule is that many databases will enter a 0 as the unit_concept_id by default, even if there is no measurement/observation value. These should be ignored to get a meaningful violating percentage. (fyi: the correct way to do this is to leave it empty, NULL, as the unit concept is not a required field).

However, the rule is inconsistent; we don't apply the exception to the unit_source_concept_id and not to the device and specimen table which also have a unit_concept_id and unit_source_concept_id. For these tables, we can assume it refers to the quantity field (instead of value_as_number).

We need to update the query to be consistent. It becomes a bit messy, but this would be the new where-clause.

WHERE cdmTable.@cdmFieldName = 0 
{@cdmTableName IN ('MEASUREMENT', 'OBSERVATION') & @cdmFieldName IN ('UNIT_CONCEPT_ID', 'UNIT_SOURCE_CONCEPT_ID')}?{AND cdmTable.value_as_number IS NOT NULL}
{@cdmTableName IN ('DEVICE_EXPOSURE', 'SPECIMEN') & @cdmFieldName IN ('UNIT_CONCEPT_ID', 'UNIT_SOURCE_CONCEPT_ID')}?{AND cdmTable.quantity IS NOT NULL}
MaximMoinat commented 1 month ago

That said, after thinking this through for a bit, the issue is actually that the unit concept is just not properly populated. The following table covers all options:

Value βœ… Value ❌
Unit βœ… A πŸ‘ B 🚫
Unit 0️⃣ C ‼️ D 🚫
Unit ❌ E 🚫 F πŸ‘

βœ… - non-null ❌ - null πŸ‘ - expected 🚫 - unexpected, should not happen

B and D should not happen (unit while there is no value). What we currently correct for, is D; default unit_concept_id to 0 even if the measurement/observation has no value. We might want to implement a separate check for these cases.

What we are currently doing: $\frac{A}{A+C+E}$

What we should do: $\frac{A}{A+C}$

And then we can make this consistent for all non-required _concept_id fields. i.e. if not required, do not count empty _concept_id fields in numerator.

katy-sadowski commented 1 month ago

Thanks so much for the thorough summary @MaximMoinat ! I agree with your assessment. I think we might also want to consider adding a separate check for cases B, D, and E. I observe inconsistencies in these fields often and it's confusing. We should be enforcing that non-required concept fields are left NULL when there is no data with which to populate a concept. This way users can distinguish no data from unmappable data.