OHDSI / DataQualityDashboard

A tool to help improve data quality standards in observational data science.
https://ohdsi.github.io/DataQualityDashboard
Apache License 2.0
135 stars, 93 forks

Measurement unit and value pairs to look into #112

Closed: clairblacketer closed this issue 1 year ago

clairblacketer commented 4 years ago

There are some measurement unit and value pairs that don't seem to be supported by the data. This is similar to the work @vojtechhuser is doing but I want to make a list here so we know which ones to address.

  1. For the combination of CONCEPT_ID 3027597 (Bilirubin.direct [Mass/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8840 (milligram per deciliter), the number and percent of records that have a value less than 1.000. (Threshold=5%).
  2. For the combination of CONCEPT_ID 3019800 (Troponin T.cardiac [Mass/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8842 (nanogram per milliliter), the number and percent of records that have a value less than 1.000. (Threshold=5%).
  3. For the combination of CONCEPT_ID 3005105 (Blasts [#/volume] in Blood) and UNIT_CONCEPT_ID 8848 (thousand per microliter), the number and percent of records that have a value less than 1.000. (Threshold=5%).
  4. For the combination of CONCEPT_ID 3024929 (Platelets [#/volume] in Blood by Automated count) and UNIT_CONCEPT_ID 9444 (billion per liter), the number and percent of records that have a value higher than 100.0. (Threshold=5%).
  5. For the combination of CONCEPT_ID 3016723 (Creatinine [Mass/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8713 (gram per deciliter), the number and percent of records that have a value less than 2.500. (Threshold=5%).
  6. For the combination of CONCEPT_ID 3026925 (Triiodothyronine (T3) Free [Mass/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8820 (picogram per deciliter), the number and percent of records that have a value less than 600.000. (Threshold=5%).
  7. For the combination of CONCEPT_ID 3037672 (Protein.monoclonal/Protein.total in Urine by Electrophoresis) and UNIT_CONCEPT_ID 8554 (percent), the number and percent of records that have a value less than 1.000. (Threshold=5%).
  8. For the combination of CONCEPT_ID 3024128 (Bilirubin.total [Mass/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8840 (milligram per deciliter), the number and percent of records that have a value less than 1.000. (Threshold=5%).
  9. For the combination of CONCEPT_ID 3007359 (Bilirubin.indirect [Mass/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8840 (milligram per deciliter), the number and percent of records that have a value less than 1.000. (Threshold=5%).
  10. For the combination of CONCEPT_ID 3006140 (Bilirubin.total [Moles/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8749 (micromole per liter), the number and percent of records that have a value less than 1.000. (Threshold=5%).
  11. For the combination of CONCEPT_ID 3020399 (Glucose [Mass/volume] in Urine) and UNIT_CONCEPT_ID 8840 (milligram per deciliter), the number and percent of records that have a value less than 0.010. (Threshold=5%).
  12. For the combination of CONCEPT_ID 3049187 (Glomerular filtration rate/1.73 sq M predicted among non-blacks [Volume Rate/Area] in Serum, Plasma or Blood by Creatinine-based formula (MDRD)) and UNIT_CONCEPT_ID 8795 (milliliter per minute), the number and percent of records that have a value less than 10.000. (Threshold=5%).
  13. For the combination of CONCEPT_ID 3013429 (Basophils [#/volume] in Blood by Automated count) and UNIT_CONCEPT_ID 8961 (thousand per cubic millimeter), the number and percent of records that have a value less than 0.010. (Threshold=5%).
  14. For the combination of CONCEPT_ID 40771922 (Glomerular filtration rate/1.73 sq M.predicted [Volume Rate/Area] in Serum, Plasma or Blood) and UNIT_CONCEPT_ID 8795 (milliliter per minute), the number and percent of records that have a value less than 10.000. (Threshold=5%).

This is only a small number of checks, but they are the heaviest hitters in one of my US claims databases.
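
For anyone who wants to reproduce the counts above on their own data, here is a minimal sketch of how one of these checks could be evaluated. It is not the DataQualityDashboard implementation: the names `PlausibilityCheck` and `run_check` are hypothetical, the records are assumed to be `(measurement_concept_id, unit_concept_id, value_as_number)` tuples, the denominator is assumed to be the non-null records for that concept and unit pair, and a check is assumed to fail when the percent of violating records is strictly above the threshold.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical record shape: (measurement_concept_id, unit_concept_id, value_as_number)
Record = Tuple[int, int, Optional[float]]


@dataclass
class PlausibilityCheck:
    concept_id: int
    unit_concept_id: int
    low: Optional[float] = None    # values below this bound count as violations
    high: Optional[float] = None   # values above this bound count as violations
    threshold_pct: float = 5.0     # assumed: check fails when pct violated > threshold


def run_check(records: Iterable[Record], check: PlausibilityCheck):
    """Return (number violated, percent violated, failed) for one concept/unit pair."""
    # Assumed denominator: non-null values for this concept and unit combination.
    values = [
        value
        for concept_id, unit_id, value in records
        if concept_id == check.concept_id
        and unit_id == check.unit_concept_id
        and value is not None
    ]
    violated = sum(
        1
        for value in values
        if (check.low is not None and value < check.low)
        or (check.high is not None and value > check.high)
    )
    pct = 100.0 * violated / len(values) if values else 0.0
    return violated, pct, pct > check.threshold_pct


# Toy data for item 1: direct bilirubin (3027597) in mg/dL (8840), low bound 1.000.
toy_records = [
    (3027597, 8840, 0.2),
    (3027597, 8840, 0.3),
    (3027597, 8840, 1.5),
    (3027597, 8840, 2.0),
    (3024929, 9444, 250.0),  # different concept/unit pair, ignored by this check
    (3027597, 8840, None),   # null value, excluded from the denominator
]
bilirubin_check = PlausibilityCheck(concept_id=3027597, unit_concept_id=8840, low=1.0)
num_violated, pct_violated, failed = run_check(toy_records, bilirubin_check)
```

On the toy data, two of the four non-null direct bilirubin values fall below 1.000, so the check would fail at the 5% threshold. A check that fails on most of a database in this way suggests the bound, not the data, is what needs review.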

scossin commented 3 years ago

Hi, I would like to propose adding:

15. For the combination of CONCEPT_ID 3017766 (COMPLEMENT C4 [MASS/VOLUME] IN SERUM OR PLASMA) and UNIT_CONCEPT_ID 8636 (GRAM PER LITER), the number and percent of records that have a value less than 1.000. (Threshold=5%). Our normal range is between 0.15 and 0.35 g/L. According to sources on the Internet, the normal range for C4 is 15 to 45 milligrams per deciliter (mg/dL), i.e. 0.15 to 0.45 g/L.
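
The unit conversion behind those figures can be checked with a short sketch. The helper name `mg_per_dl_to_g_per_l` is made up for illustration; the only facts used are that 1 dL = 0.1 L and 1 g = 1000 mg, so 1 mg/dL = 0.01 g/L.

```python
def mg_per_dl_to_g_per_l(value_mg_per_dl: float) -> float:
    """Convert a mass concentration from mg/dL to g/L (1 mg/dL = 0.01 g/L)."""
    return value_mg_per_dl / 100.0


# Normal C4 range quoted above, in mg/dL.
normal_low_g_per_l = mg_per_dl_to_g_per_l(15.0)
normal_high_g_per_l = mg_per_dl_to_g_per_l(45.0)

# The whole normal range in g/L sits below the 1.000 bound used by the check.
normal_range_below_bound = normal_high_g_per_l < 1.0
```

Because the entire normal range in g/L lies below 1.000, a check that counts values under 1.000 would be expected to flag most normal C4 results.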

I also opened a specific issue, https://github.com/OHDSI/DataQualityDashboard/issues/238, for the combination of CONCEPT_ID 3015377 (CALCIUM [MOLES/VOLUME] IN SERUM OR PLASMA) and UNIT_CONCEPT_ID 8753 (MILLIMOLE PER LITER); that one could have been added to this list.

Thanks

clairblacketer commented 2 years ago

This is a good issue for a newcomer. It will require working with someone who has data to get an initial understanding of the measurement values, but the rest can be done asynchronously.

clairblacketer commented 1 year ago

I am closing this as we implemented new measurement checks looking for plausible units and we removed the plausible value checks.