MIT-LCP / mimic-omop

Mapping the MIMIC-III database to the OMOP schema
MIT License
123 stars 47 forks source link

How to resolve the error in ETL? #64

Open YiKuanLiu opened 4 years ago

YiKuanLiu commented 4 years ago

I'm trying to build up the mimic-omop, and when I finally went to the last step, the check result is not good, as below:

psql:etl/check_etl.sql:1: unrecognized value "" for "ECHO" Available values are: none, errors, queries, all. Output format is unaligned. Pager usage is off. psql:etl/check_etl.sql:15: etl/StandardizedVocabularies/CONCEPT/check_etl.sql: No such file or directory psql:etl/check_etl.sql:16: etl/StandardizedVocabularies/COHORT_DEFINITION/check_etl.sql: No such file or directory psql:etl/check_etl.sql:17: etl/StandardizedVocabularies/ATTRIBUTE_DEFINITION/check_etl.sql: No such file or directory psql:etl/check_etl.sql:18: etl/StandardizedDerivedElements/COHORT_ATTRIBUTE/check_etl.sql: No such file or directory psql:etl/check_etl.sql:19: etl/StandardizedHealthSystemDataTables/CARE_SITE/check_etl.sql: No such file or directory 1..4 ok 1 - PERSON -- number patients match ok 2 - PERSON -- gender distribution matches ok 3 - PERSON -- date of birth year distributions match ok 4 - PERSON -- no births after deaths 1..1 not ok 1 - number of unique patients who die in the database Failed test 1: "number of unique patients who die in the database" Results differ beginning at row 1: have: (15759) want: (14849) Looks like you failed 1 test of 1 1..7 ok 1 - Visit_occurrence table -- same number admission ok 2 - number of hospital admissions who die in-hospital match ok 3 - Visit_occurrence table -- same distribution adm ok 4 - Visit_occurrence table -- distribution admit source value ok 5 - Visit_occurrence table -- repartition discharge_to_source_value ok 6 - Visit_occurrence table -- links checker ok 7 - Visit_occurrence table -- start_date > end_date 1..1 ok 1 - OBSERVATION_PERIOD -- row count matches visit_occurrence 1..7 not ok 1 - VISIT_DETAIL -- test same number transfers Failed test 1: "VISIT_DETAIL -- test same number transfers" Results differ beginning at row 1: have: NULL want: (184142,19) ok 2 - VISIT_DETAIL -- test visit_source_value and visit_source_concept_id match not ok 3 - VISIT_DETAIL -- test admitting_source_concept_id and admitting_source_value match Failed test 3: "VISIT_DETAIL -- test admitting_source_concept_id and admitting_source_value match" Results differ beginning at row 1: have: (0) want: (139516) not ok 4 - VISIT_DETAIL -- test patients number in visit_detail/icustays Failed test 4: "VISIT_DETAIL -- test patients number in visit_detail/icustays" Results differ beginning at row 1: have: (0,0) want: (46476,57786) not ok 5 - VISIT_DETAIL -- check start_datetime < end_datetime Failed test 5: "VISIT_DETAIL -- check start_datetime < end_datetime" Number of columns or their types differ between the queries: have: (161) want: (0) not ok 6 - VISIT_DETAIL -- check start_date < end_date Failed test 6: "VISIT_DETAIL -- check start_date < end_date" Number of columns or their types differ between the queries: have: (35) want: (0) ok 7 - VISIT_DETAIL -- check care site is never null Looks like you failed 5 tests of 7 1..7 not ok 1 - MEASUREMENT -- check distribution of all labs match Failed test 1: "MEASUREMENT -- check distribution of all labs match" Results differ beginning at row 1: have: (22014,-22,472,794) want: NULL psql:etl/StandardizedClinicalDataTables/MEASUREMENT/check_etl.sql:67: ERROR: syntax error at or near "AS" LINE 7: AS org_name ^ QUERY:
SELECT org_name::TEXT, count(1) FROM ( SELECT DISTINCT ON (hadm_id, spec_type_desc, org_name, coalesce(charttime, chartdate)) AS org_name FROM microbiologyevents WHERE org_name IS NOT NULL ) tmp GROUP BY org_name ORDER BY 2, 1 desc;

CONTEXT: PL/pgSQL function results_eq(text,text,text) line 7 at OPEN ok 2 - MEASUREMENT -- there is source concept in measurement not described ok 3 - MEASUREMENT -- check for duplicate primary keys ok 4 - MEASUREMENT -- standard concept checker not ok 5 - MEASUREMENT -- check row counts match Failed test 5: "MEASUREMENT -- check row counts match" Results differ beginning at row 1: have: NULL want: (1,6330) ok 6 - Measurement pass, w00t! Looks like you planned 7 tests but ran 6 1..5 ok 1 - PROCEDURE_OCCURRENCE -- check all procedureevents_mv rows inserted ok 2 - PROCEDURE_OCCURRENCE -- check label is consistent with source_value ok 3 - PROCEDURE_OCCURRENCE -- check all CPT code rows inserted ok 4 - PROCEDURE_OCCURRENCE -- check CPT subsections mapped correctly ok 5 - PROCEDURE_OCCURRENCE -- check ICD procedure rows inserted 1..1 ok 1 - PROVIDER -- check caregivers/providers match 1..8 ok 1 - CONDITION OCCURRENCE -- check ICD diagnoses row count matches ok 2 - CONDITION OCCURRENCE -- diagnosis in admission same ok 3 - CONDITION OCCURRENCE -- distrib diagnosis the same ok 4 - CONDITION OCCURRENCE -- there is source concept in measurement not described ok 5 - CONDITION OCCURRENCE -- primary key checker ok 6 - CONDITION OCCURRENCE -- standard concept checker not ok 7 - CONDITION OCCURRENCE -- start_datetime should be > end_datetime Failed test 7: "CONDITION OCCURRENCE -- start_datetime should be > end_datetime" Results differ beginning at row 1: have: (585) want: (0) ok 8 - CONDITION OCCURRENCE -- start_date should be > end_date Looks like you failed 1 test of 8 1..10 not ok 1 - OBSERVATION -- religion distribution matches (concept 4052017) Failed test 1: "OBSERVATION -- religion distribution matches (concept 4052017)" Results differ beginning at row 1: have: NULL want: (LUTHERAN,1) not ok 2 - OBSERVATION -- language distribution matches (concept 40758030) Failed test 2: "OBSERVATION -- language distribution matches (concept 40758030)" Results differ beginning at row 1: have: NULL want: (*YOR,1) not ok 3 - OBSERVATION -- marital distribution matches (concept 40766231) Failed test 3: "OBSERVATION -- marital distribution matches (concept 40766231)" Results differ beginning at row 1: have: NULL want: ("LIFE PARTNER",15) not ok 4 - OBSERVATION -- insurance distribution matches (concept 46235654) Failed test 4: "OBSERVATION -- insurance distribution matches (concept 46235654)" Results differ beginning at row 1: have: NULL want: ("Self Pay",611) not ok 5 - OBSERVATION -- ethnicity distribution matches (concept 44803968) Failed test 5: "OBSERVATION -- ethnicity distribution matches (concept 44803968)" Results differ beginning at row 1: have: NULL want: ("AMERICAN INDIAN/ALASKA NATIVE FEDERALLY RECOGNIZED TRIBE",3) ok 6 - OBSERVATION -- source concept described ok 7 - OBSERVATION -- primary key is always unique not ok 8 - OBSERVATION -- datetimeevents number Failed test 8: "OBSERVATION -- datetimeevents number" Results differ beginning at row 1: have: (0) want: (4485342) ok 9 - OBSERVATION -- Standard concept checker Looks like you planned 10 tests but ran 9 1..4 not ok 1 - DRUG_EXPOSURE -- check number of patients with prescription matches Failed test 1: "DRUG_EXPOSURE -- check number of patients with prescription matches" Results differ beginning at row 1: have: (0,0) want: (39363,50216) not ok 2 - DRUG_EXPOSURE -- check drug_source_value matches source Failed test 2: "DRUG_EXPOSURE -- check drug_source_value matches source" Results differ beginning at row 1: have: NULL want: ("zz ",1) ok 3 - DRUG_EXPOSURE -- is concept source id full filled ok 4 - DRUG_EXPOSURE -- Standard concept checker Looks like you failed 2 tests of 4 1..2 ok 1 - visit occurrence nb ok 2 - check radio nb 1..3 ok 1 - primary key checker ok 2 - source concept described ok 3 - source concept described ok 4 - Note Nlp pass, w00t! Looks like you planned 3 tests but ran 4

So, my question is: why are these errors invoked? And what are the solutions? Thank you!!

parisni commented 4 years ago

hi ! do you have the etl/StandardizedHealthSystemDataTables/CARE_SITE/check_etl.sql files ? apparently most of the tests works. BTW tests are not mandatory to run the ETL

YiKuanLiu commented 4 years ago

Hi, thanks for the replying! I checked my folder etl/StandardizedHealthSystemDataTables/CARE_SITE/, there's only 2 files: etl.sql and README.md. Is that right?
How can I get the check_etl.sql files?

YiKuanLiu commented 4 years ago

@parisni I've check my mimic database with omop search_path, and I found there's nothing (0 rows) in the omop.measurement...

Is that implying some certain step got wrong? I think it should be related to the errors. Please help me out.

Thanks