OHDSI / Achilles

Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) - descriptive statistics about an OMOP CDM database
https://ohdsi.github.io/Achilles/

Incorporate custom PEDSnet DQA checks and code? #122

Closed gracebrownecodes closed 3 years ago

gracebrownecodes commented 8 years ago

This ticket is really just to connect two people who are both actively working on data quality packages. @vojtechhuser can you restate your interest in the PEDSnet checks for @writetoritu?

vojtechhuser commented 8 years ago

I have made a few additions to Achilles Heel recently. I am developing some EHR-data-centric DQ rules (as opposed to claims-centric ones). However, I only have CDM v4 GE Centricity data to go by.

vojtechhuser commented 8 years ago

I got an email reply from Ritu Khare (without any specific code ideas). I will wait for her to get back to me with more details, if we want to proceed with this.

vojtechhuser commented 7 years ago

from their JAMIA paper

APPENDIX A. A CONCEPTUAL SCHEMA FOR DATA QUALITY IN PEDSNET

This appendix is created to supplement the entity relation diagram illustrated in Figure 1 of the manuscript.

A.1 Entity Definitions

Site: The PEDSnet network is a collaboration of eight of the nation’s largest children’s hospitals; each hospital system is referred to as a partner site in the context of the aggregated dataset.

OMOP Domain: PEDSnet has extended the OMOP CDM, referred to as the PEDSnet CDM, as the common denominator to aggregate EHR data from various partner sites. The PEDSnet CDM consists of various clinical domains, also known as OMOP domains or tables, such as person, visit_occurrence, and condition_occurrence.

Field: Each OMOP domain represents data about a certain clinical entity and consists of various fields that capture discrete observational values. Some example fields from the person domain include race_concept_id, year_of_birth, gender_source_value, etc.

Data Cycle: The PEDSnet network is being developed in iterations, known as data cycles. During each data cycle, a partner site conducts the ETL operations on their source EHR to prepare an instance of the PEDSnet CDM, extracting and transforming data for various OMOP domains and fields, according to PEDSnet ETL conventions. The sites submit these cycle-specific PEDSnet databases (i.e. with a certain “data version”) to the PEDSnet DCC.

Check Type: In PEDSnet, a check type is a category of computational tests or assessments to be performed on the dataset in order to assess the data quality. Table 1 shows an inventory of check types that could be executed by the DCC on the datasets submitted by the partner sites.

Data Quality Check: A check is a specialized version of a check type that is applied to or designed for a specific field of any given OMOP domain. The threshold attributes are associated with the checks that return numerical values. If the returned value is outside the limits of thresholds, then a “data quality issue” (described next) is created.

Data Quality Issue: A data quality issue is the conceptual result of executing a check on a site’s dataset. The data quality workflow returns the description of the issue, priority, and the GitHub link for the issue. The DCC manually updates the status and cause of the issue as the data cycle progresses.

A.2 The SQL Data Definition Language statements (PostgreSQL)

CREATE SCHEMA dqa;

CREATE TABLE dqa.site (
    site_id serial,
    site_name varchar(20),
    PRIMARY KEY (site_id)
);

CREATE TABLE dqa.omop_domain (
    domain_id serial,
    domain_name varchar(40),
    PRIMARY KEY (domain_id)
);

CREATE TABLE dqa.field (
    field_id serial,
    name varchar(40),
    datatype varchar(20),
    domain_id integer,
    PRIMARY KEY (field_id),
    FOREIGN KEY (domain_id) REFERENCES dqa.omop_domain(domain_id)
);

CREATE TABLE dqa.data_cycle (
    cycle_id serial,
    cycle_date date,
    etl_conv_version varchar(20),
    PRIMARY KEY (cycle_id)
);

CREATE TABLE dqa.check_type (
    check_type_id serial,
    name varchar(20),
    alias varchar(20),
    PRIMARY KEY (check_type_id)
);

CREATE TABLE dqa.check (
    check_id serial,
    check_type_id integer,
    PRIMARY KEY (check_id),
    FOREIGN KEY (check_type_id) REFERENCES dqa.check_type(check_type_id)
);

CREATE TABLE dqa.design (
    check_id integer,
    field_id integer,
    low_threshold numeric,
    upp_threshold numeric,
    PRIMARY KEY (check_id, field_id),
    FOREIGN KEY (check_id) REFERENCES dqa.check(check_id),
    FOREIGN KEY (field_id) REFERENCES dqa.field(field_id)
);

CREATE TABLE dqa.issue_observed (
    issue_observed_id serial,
    cycle_id integer,
    site_id integer,
    data_version varchar(20),
    PRIMARY KEY (issue_observed_id),
    FOREIGN KEY (cycle_id) REFERENCES dqa.data_cycle(cycle_id),
    FOREIGN KEY (site_id) REFERENCES dqa.site(site_id)
);

CREATE TABLE dqa.issue (
    issue_id serial,
    check_id integer,
    description varchar(100),
    status varchar(30),
    cause varchar(60),
    priority varchar(20),
    github_issue_link varchar(50),
    issue_observed_id integer,
    PRIMARY KEY (issue_id, issue_observed_id),
    FOREIGN KEY (issue_observed_id) REFERENCES dqa.issue_observed(issue_observed_id),
    FOREIGN KEY (check_id) REFERENCES dqa.check(check_id),
    CHECK (cause IN (
        'ETL:programming error',
        'ETL: unclear conventions',
        'ETL: new conventions required',
        'ETL: administrative',
        'Provenance: missing in source',
        'Provenance: entry error or convention',
        'Provenance: site-specific ETL convention',
        'Provenance: new clinical workflow',
        'Provenance: true anomaly',
        'Provenance: EHR configuration',
        'Provenance: administrative workflow',
        'Provenance: vocabulary',
        'Provenance: not applicable',
        'Non-issue: DQA workflow bug',
        'Non-issue: improvement in previous ETL'
    )),
    CHECK (status IN ('new', 'under review', 'solution proposed', 'persistent', 'withdrawn')),
    CHECK (priority IN ('High', 'Medium', 'Low'))
);
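
The threshold rule from A.1 (a check returns a number, and a value outside the limits creates a data quality issue) could be sketched in Python roughly as follows. This is only an illustration, not code from PEDSnet or Achilles: the Design class mirrors the columns of dqa.design, while the evaluate_check function, the inclusive bounds, and the reading of a missing threshold as "no limit" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Design:
    """Mirrors one row of dqa.design: a check applied to a field."""
    check_id: int
    field_id: int
    low_threshold: Optional[float]  # None = no lower limit (assumption)
    upp_threshold: Optional[float]  # None = no upper limit (assumption)


def evaluate_check(design: Design, value: float) -> Optional[dict]:
    """Return a new issue record if value is outside the thresholds, else None."""
    too_low = design.low_threshold is not None and value < design.low_threshold
    too_high = design.upp_threshold is not None and value > design.upp_threshold
    if not (too_low or too_high):
        return None  # value within limits (bounds treated as inclusive)
    return {
        "check_id": design.check_id,
        "description": (
            f"value {value} outside "
            f"[{design.low_threshold}, {design.upp_threshold}]"
        ),
        "status": "new",  # one of the values allowed by the status CHECK
    }


# Hypothetical check on a hypothetical field, with limits 0.0 and 5.0.
design = Design(check_id=1, field_id=10, low_threshold=0.0, upp_threshold=5.0)
in_range = evaluate_check(design, 3.0)      # no issue expected
out_of_range = evaluate_check(design, 7.5)  # issue expected
```

In the schema above, the cause and the later status values would then be filled in manually by the DCC, so the sketch only sets the initial status.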

vojtechhuser commented 6 years ago

It would be nice to be able to "mute" an issue found in Achilles.

writetoritu commented 6 years ago

@vojtechhuser sounds like a good idea! What logic would you be using to "mute" the issues?

vojtechhuser commented 6 years ago

I would take inspiration from PEDSnet and introduce new Achilles tables that support a 'data cycle', create issues from the current messages, and allow commenting on any current or past issue.

So no automated logic for muting - more like human input.
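
A rough Python sketch of what "muting by human input" might look like, assuming messages are keyed by a rule identifier. The Message and MuteNote classes and the unmuted function are hypothetical names for illustration; nothing here is an existing Achilles table or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    """A data quality message from one run (hypothetical structure)."""
    rule_id: int
    text: str


@dataclass(frozen=True)
class MuteNote:
    """A reviewer's decision to mute a rule, with a free-text comment."""
    rule_id: int
    comment: str


def unmuted(messages: List[Message], notes: List[MuteNote]) -> List[Message]:
    """Keep only messages whose rule has not been muted by a reviewer."""
    muted_rules = {note.rule_id for note in notes}
    return [m for m in messages if m.rule_id not in muted_rules]


# Example data is made up; the mute decision comes from a person, not from logic.
messages = [
    Message(rule_id=1, text="example message A"),
    Message(rule_id=2, text="example message B"),
]
notes = [MuteNote(rule_id=2, comment="known source limitation, reviewed")]
remaining = unmuted(messages, notes)
```

The muted messages are filtered only at reporting time, so the comment and the original message both stay available for later data cycles.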

fdefalco commented 3 years ago

Heel has been superseded by DQD and is no longer under development.