Open MaximMoinat opened 1 month ago
That said, after thinking this through for a bit, the issue is actually that the unit concept is just not properly populated. The following table covers all options:
| | Value ✅ | Value ❌ |
|---|---|---|
| Unit ✅ | A 👍 | B 🚫 |
| Unit 0️⃣ | C ‼️ | D 🚫 |
| Unit ❌ | E 🚫 | F 👍 |

✅ - non-null, ❌ - null, 👍 - expected, 🚫 - unexpected, should not happen
B and D should not happen (a unit while there is no value). What we currently correct for is D: unit_concept_id defaulting to 0 even if the measurement/observation has no value. We might want to implement a separate check for these cases.
What we are currently doing: $\frac{A}{A+C+E}$
What we should do: $\frac{A}{A+C}$
And then we can make this consistent for all non-required `_concept_id` fields, i.e. if not required, do not count empty `_concept_id` fields in the numerator.
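To make the two ratios concrete, here is a minimal sketch in Python (not the actual DQD SQL). It assumes each record is reduced to a `(unit_concept_id, value)` pair; the helper names `classify` and `completeness` and the sample ids are hypothetical and only serve the illustration.

```python
from collections import Counter


def classify(unit_concept_id, value):
    """Return the table cell (A-F) for one record.

    Rows are the unit state (non-null / 0 / null),
    columns are the value state (non-null / null).
    """
    has_value = value is not None
    if unit_concept_id is None:
        return "E" if has_value else "F"
    if unit_concept_id == 0:
        return "C" if has_value else "D"
    return "A" if has_value else "B"


def completeness(records):
    """Count the cells and compute both ratios from the comment above."""
    cells = Counter(classify(unit, value) for unit, value in records)
    a, c, e = cells["A"], cells["C"], cells["E"]
    # Current behaviour: records with a NULL unit (E) stay in the denominator.
    current = a / (a + c + e) if (a + c + e) else None
    # Proposed behaviour: only non-null units (A and C) are considered.
    proposed = a / (a + c) if (a + c) else None
    return cells, current, proposed


# Hypothetical sample data; 1234 stands for any non-zero unit concept id.
records = [
    (1234, 1.5),   # A: unit and value populated
    (1234, 2.0),   # A
    (0, 3.0),      # C: unit defaulted to 0, value present
    (None, 4.0),   # E: value without a unit
    (0, None),     # D: unit defaulted to 0, no value
    (None, None),  # F: nothing recorded
]

cells, current, proposed = completeness(records)
print(dict(cells), current, proposed)
```

With this sample, E only affects the current ratio, and B and D affect neither, which is why they would need their own check.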
Thanks so much for the thorough summary @MaximMoinat ! I agree with your assessment. I think we might also want to consider adding a separate check for cases B, D, and E. I observe inconsistencies in these fields often and it's confusing. We should be enforcing that non-required concept fields are left NULL when there is no data with which to populate a concept. This way users can distinguish no data from unmappable data.
This issue came up while reviewing the documentation of sourceConceptRecordCompleteness check.
In the query for both standard and source concept id, we have this exception for unit concepts: https://github.com/OHDSI/DataQualityDashboard/blob/6ef7ee2dd1116741e3fe9907ef4d9cc98eccb96c/inst/sql/sql_server/field_concept_record_completeness.sql#L37
The reason we have this rule is that many databases will enter a 0 as the `unit_concept_id` by default, even if there is no measurement/observation value. These should be ignored to get a meaningful violating percentage. (FYI: the correct way to do this is to leave it empty, NULL, as the unit concept is not a required field.)

However, the rule is inconsistent; we don't apply the exception to the `unit_source_concept_id`, nor to the device and specimen tables, which also have a `unit_concept_id` and `unit_source_concept_id`. For these tables, we can assume it refers to the `quantity` field (instead of `value_as_number`).

We need to update the query to be consistent. It becomes a bit messy, but this would be the new where-clause.