OHDSI / Themis

Repository for OMOP CDM conventions as defined by THEMIS. These can be reference lists of concepts, pieces of standardized code for data generation or quality certification, and debates.
Apache License 2.0

Count of deleted persons in METADATA #46

Open vojtechhuser opened 5 years ago

vojtechhuser commented 5 years ago
TYPE NOTES
ITEM PERSON Convention Number 15 states that it is okay to drop persons when their data are not of high quality. How does one track the number of persons that were dropped between the raw data and the CDM?
FORUM POST http://forums.ohdsi.org/t/metadata-and-annotations-wg/4242/32 (not specific to this topic - general thread)
SOLUTION We should track person loss in the METADATA table and check via ACHILLES HEEL that this was done.
NEXT STEPS Add convention to METADATA:

When a CDM ETL deletes persons from the data for any reason, it is encouraged to track that information in the METADATA table. For example, if an ETL deletes persons who are missing data, the METADATA table should capture the count of persons deleted under that specific rule. If no persons are deleted between the raw data and the CDM, this should also be captured in the METADATA table.
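As a rough illustration of this convention, the sketch below writes one row per deletion rule plus a total row, so that a count of zero is recorded as well. It uses an in-memory SQLite table as a simplified stand-in for METADATA. The row naming scheme (`persons_deleted:<rule>`), the rule names, the counts, and the helper `record_person_loss` are all hypothetical and not part of any agreed THEMIS convention.

```python
import sqlite3
from datetime import date


def record_person_loss(conn, deleted_by_rule, etl_date):
    """Write one row per ETL deletion rule, plus a total row.

    The 'persons_deleted:<rule>' naming scheme is an assumption made
    for this sketch, not an established convention.
    """
    rows = [
        (f"persons_deleted:{rule}", str(count), etl_date.isoformat())
        for rule, count in sorted(deleted_by_rule.items())
    ]
    # The total row is always written, so "0 persons deleted" is explicit.
    total = sum(deleted_by_rule.values())
    rows.append(("persons_deleted:total", str(total), etl_date.isoformat()))
    conn.executemany(
        "INSERT INTO metadata (name, value_as_string, metadata_date) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return total


# Simplified stand-in for the METADATA table (only three columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (name TEXT, value_as_string TEXT, metadata_date TEXT)"
)

# Hypothetical rule names and counts, for illustration only.
total = record_person_loss(
    conn,
    {"missing_birth_year": 12, "death_before_birth": 3},
    date(2019, 1, 15),
)
```

Passing an empty dictionary still writes the total row with a value of 0, which covers the case where no persons are deleted.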

Add ACHILLES HEEL rule that checks patient loss is documented in the METADATA table.
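A check of this kind could look roughly like the sketch below. It is not the actual ACHILLES HEEL implementation (which is written in R and SQL); it only shows the logic of warning when no person-loss record is found. The row name `persons_deleted:total` and the function `check_person_loss_documented` are hypothetical, and the table is a simplified stand-in for METADATA.

```python
import sqlite3


def check_person_loss_documented(conn):
    """Return a warning if the (assumed) total person-loss row is missing."""
    row = conn.execute(
        "SELECT value_as_string FROM metadata WHERE name = ?",
        ("persons_deleted:total",),  # hypothetical row name
    ).fetchone()
    if row is None:
        return "WARNING: person loss is not documented in METADATA"
    return f"OK: {row[0]} persons deleted between raw data and CDM"


# Simplified stand-in for the METADATA table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (name TEXT, value_as_string TEXT)")

before = check_person_loss_documented(conn)  # no row yet, so a warning

conn.execute(
    "INSERT INTO metadata (name, value_as_string) VALUES (?, ?)",
    ("persons_deleted:total", "0"),
)
after = check_person_loss_documented(conn)  # row present, even with count 0
```

The check only looks for the presence of the record, so a documented count of 0 passes.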

Asking @alondhe if this is the best way to document this.
vojtechhuser commented 5 years ago

It exists as convention #15 in PERSON but not in METADATA.

A related issue is here: https://github.com/OHDSI/Themis/issues/9

MelaniePhilofsky commented 5 years ago

@vojtechhuser There are many reasons a person may be deleted from the CDM. How do we capture all the reasons in a standardized format within the METADATA table?

vojtechhuser commented 5 years ago

I see two sides. Documenting the size of the deletion and documenting the reasons.

If the count of deleted persons = 0, that is a good fact to know. It makes me trust the data more. So this issue is JUST about the count, not the reason.

MelaniePhilofsky commented 5 years ago

OK, do you have plans to add the reason for deletion? Just curious.

From my perspective: if I see a count of deleted persons = 0 for EHR data, it makes me trust the data less. Every EHR dataset I have seen has impossible data: persons with birth years in the 1860s, birth dates after death dates, etc.
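To make the two examples above concrete, here is a minimal sketch that counts persons matching each kind of impossible data. The cutoff year, the rule names, the record layout, and the sample persons are all assumptions made for illustration; a real ETL would define its own rules.

```python
from datetime import date

# Assumed cutoff for this sketch; each site would choose its own.
MIN_PLAUSIBLE_BIRTH_YEAR = 1900


def count_implausible(persons):
    """Count persons per implausibility rule (rule names are hypothetical)."""
    counts = {"birth_year_too_early": 0, "birth_after_death": 0}
    for p in persons:
        if p["birth_date"].year < MIN_PLAUSIBLE_BIRTH_YEAR:
            counts["birth_year_too_early"] += 1
        death = p.get("death_date")
        if death is not None and p["birth_date"] > death:
            counts["birth_after_death"] += 1
    return counts


# Made-up sample records, not real data.
persons = [
    {"person_id": 1, "birth_date": date(1865, 3, 1), "death_date": None},
    {"person_id": 2, "birth_date": date(1980, 5, 2), "death_date": date(1979, 1, 1)},
    {"person_id": 3, "birth_date": date(1975, 7, 9), "death_date": None},
]
counts = count_implausible(persons)
```

Per-rule counts like these are what would distinguish "nothing was checked" from "rules were applied and nothing matched".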

MelaniePhilofsky commented 1 year ago

@vojtechhuser Do you want to sponsor this issue? Or do you want to close it?