OHDSI / WhiteRabbit

WhiteRabbit is a small application that can be used to analyse the structure and contents of a database as preparation for designing an ETL. It comes with RabbitInAHat, an application for interactive design of an ETL to the OMOP Common Data Model with the help of the the scan report generated by White Rabbit.
http://ohdsi.github.io/WhiteRabbit
Apache License 2.0
178 stars 88 forks source link

White Rabbit: Add thresholds to compare to Field empty in Scan Report #357

Open megan-lien opened 1 year ago

megan-lien commented 1 year ago

The Scan Report Field Overview tab includes a Fraction empty field to show the percent populated for each field in each table scanned. It would be helpful if we could also specify thresholds for each field, or particular fields, in order to quickly verify the data is populated as expected. For example: we expect a person_id field to be populated 100% of the time in the person table. A Fraction empty value >0.0% should signal a failure.

Proposed Solution: Adding a threshold table to the tables to scan to compare, specifying acceptable thresholds for Fraction empty Ex: table,field,min_threshold,max_threshold person,person_id,0,0 visit_occurrence,visit_end_datetime,0,5 drug_exposure,stop_reason,95,100

Add a Pass/Fail column in the Field Overview tab based on the threshold file: Table, Field, Description, Type, ... , Fraction empty, Threshold check, person, person_id,,INT, ..., 0.0%, PASS visit_occurrence,visit_end_datetime, , DATE, ..., 7.0%, FAIL

Currently we can check this manually by reviewing the scan report, but the analyst must be familiar with the data model and the requirements. For data that is ingested regularly, those requirements won't change and it would be easier to call out data issues using thresholds.