chop-dbhi / data-models

Collection of various biomedical data models in parseable formats.
https://data-models-service.research.chop.edu

Adding PEDSnet v2.5 and OMOP 5.1 #162

Closed burrowse closed 7 years ago

burrowse commented 7 years ago

Addresses:
https://github.com/chop-dbhi/data-models/issues/161
https://github.com/chop-dbhi/data-models/issues/160
https://github.com/chop-dbhi/data-models/issues/151

@murphyke @eceowl Please review and merge.

murphyke commented 7 years ago

Hi, @burrowse, when the data models service consumes this branch, it reports these errors:

WARN[0005] parse (data-models/github.com/burrowse/data-models/pcornet/v3.1/renamings.csv): could not detect file type 
WARN[0005] parse (data-models/github.com/burrowse/data-models/pcornet/v3/renamings.csv): could not detect file type 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): could not reference table `domain` by domain_id 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): could not reference table `location` by location_id 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): could not reference table `person` by person_id 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): could not reference table `provider` by provider_id 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): could not reference table `relationship` by relationship_id 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): could not reference table `visit_occurrence` by visit_occurrence_id 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): could not reference table `vocabulary` by vocabulary_id 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `cost` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `device_exposure` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `domain` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `dose_era` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `drug_era` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `drug_exposure` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `drug_strength` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `fact_relationship` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `measurement` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `note` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `observation_period` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `observation` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `payer_plan_period` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `person` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `procedure_occurrence` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `provider` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `relationship` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `source_to_concept_map` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `specimen` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `visit_occurrence` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1): no source table `vocabulary` 
WARN[0005] refs (data-models/github.com/burrowse/data-models/omop/v5.1:condition_occurrence): no source field `condition_status_concept_id` 

The refs errors are probably triggered when the references.csv file is processed ....

The renamings.csv errors may be benign, since pcornet v3 shows the same warning.

burrowse commented 7 years ago

Okay. @murphyke @aaron0browne

burrowse commented 7 years ago

@murphyke @aaron0browne Also, if omop v5.1 is proving problematic, it can be removed for now. We really want PEDSnet v2.5, which was also added (and has no errors according to the service), to be available as soon as possible.

murphyke commented 7 years ago

@burrowse I'll take a look at it. If we can't figure out the errors quickly, we can just move ahead without omop 5.1, as you suggest.

murphyke commented 7 years ago

@burrowse In definitions/condition_occurrence.csv, the line for condition_status_concept_id is missing a comma. Viewing the files in github can be helpful for identifying csv-formatting issues, because the file is only rendered as a table if the format is good; otherwise, you see the raw comma-separated data.

definitions/cost.csv also has similar issues. For each of the "source field" errors, make sure the number of fields is correct, and make sure everything lines up right in Excel. The description column values that contain commas should be wrapped in quotation marks -- it's easiest just to always wrap them, just to be safe.
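A quick way to catch this kind of problem before pushing is to compare each row's field count against the header. The following is only a minimal sketch using Python's standard csv module; the function name `find_malformed_rows` and the sample rows are made up for illustration and are not taken from the actual definitions files.

```python
import csv
import io


def find_malformed_rows(csv_text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    bad = []
    for row in reader:
        # Skip completely empty lines; flag anything that does not line up.
        if row and len(row) != len(header):
            bad.append((reader.line_num, len(row)))
    return bad


# Hypothetical sample data, not copied from the repository.
sample = (
    "table,field,description\n"
    'cost,cost_id,"A unique identifier, one per cost record"\n'  # quoted comma: fine
    "cost,currency_concept_id,The currency, as a concept\n"      # unquoted comma: 4 fields
    "cost,total_charge The total charge\n"                       # missing comma: 2 fields
)

for line_number, count in find_malformed_rows(sample):
    print(f"line {line_number}: expected 3 fields, found {count}")
```

When regenerating a file from a script, passing `quoting=csv.QUOTE_ALL` to `csv.writer` wraps every value in quotation marks, which matches the "always wrap them" advice.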

murphyke commented 7 years ago

@burrowse Also, I realize this is not your issue, but the pcornet renamings.csv files are misplaced. The existing files under each version directory should be merged into a single renamings.csv file under pcornet/ (the same way it is done for omop, etc.).

burrowse commented 7 years ago

@murphyke Okay, thanks! I fixed those issues. What can be done about the pedsnet->omop issues?

burrowse commented 7 years ago

@murphyke All errors/warnings cleared now. Changing the "v1,v4" references to "1.0.0 and 4.0.0" worked for those files in the mapping folder.

murphyke commented 7 years ago

@burrowse Fantastic, thanks for fixing those long-standing mapping warnings. It looks like we're good to go. I will merge these. IIRC, the data models service will pick up the changes within an hour. @aaron0browne I forget, will the DMSA service need to be poked in order to pick up the changes in the underlying data models service?

murphyke commented 7 years ago

Just a heads up: some further changes will be required. The CircleCI test took so long that I just went ahead and merged, but unfortunately it eventually failed. The data-models-sqlalchemy service is choking on what it's getting from the underlying service.

burrowse commented 7 years ago

@murphyke Sure, happy to fix. Any idea what is going on?

murphyke commented 7 years ago

It looks like there is an empty data type. Can you check that for any new table, there is a csv file in v2.5/schema/, and that for any new field, there is an entry for it in the corresponding csv file under v2.5/schema/?
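That check can be scripted rather than done by eye. Here is a rough sketch, assuming each csv file under definitions/ and schema/ has a header row with a column named `field`; that column name, the function names, and the example path are assumptions for illustration, so adjust them to the real layout.

```python
import csv
from pathlib import Path


def read_fields(csv_path, column="field"):
    """Collect the non-empty values of one column from a csv file with a header row."""
    with open(csv_path, newline="") as f:
        return {row[column].strip() for row in csv.DictReader(f) if row.get(column)}


def missing_schema_entries(version_dir, column="field"):
    """Return (table, field) pairs defined under definitions/ but absent from schema/.

    A field of None means the whole schema file for that table is missing.
    """
    version_dir = Path(version_dir)
    problems = []
    for def_file in sorted((version_dir / "definitions").glob("*.csv")):
        schema_file = version_dir / "schema" / def_file.name
        if not schema_file.exists():
            problems.append((def_file.stem, None))
            continue
        missing = read_fields(def_file, column) - read_fields(schema_file, column)
        problems.extend((def_file.stem, name) for name in sorted(missing))
    return problems


# Example call (the path is a placeholder, not the real directory name):
# for table, field in missing_schema_entries("path/to/model/v2.5"):
#     print(table, field or "<no schema file>")
```

A name mismatch between the two directories shows up as a field reported missing from schema/, since the set difference only looks at exact names.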

murphyke commented 7 years ago

That is a shot in the dark. If that's not the problem, I will fire up a debugger.

burrowse commented 7 years ago

Not bad for a shot in the dark.... admitting_source_value in definitions was admitting_source_source_value in schema.... Should be okay now @murphyke

murphyke commented 7 years ago

There must be another error of the same type, because I reran data-models locally and saw it pull your change to schema/visit_occurrence.csv, but dmsa is throwing the same exception. I will see how easily I can modify dmsa to give more info about the error.

murphyke commented 7 years ago

@burrowse discharge_to_concept_id is the only other one

burrowse commented 7 years ago

@murphyke Sorry! My eyes are getting crosseyed

murphyke commented 7 years ago

@burrowse OK, pedsnet looks good. There are still some problems with omop 5.1. So you can either submit another PR to get the pedsnet changes going, and another PR later for the omop, or you can combine both in a single PR. It's up to you. The omop problems:

ERROR: no fields for table adt_occurrence
(u'ERROR', u'Blank type for field `device_exposure_end_datetime`, table `device_exposure`')
(u'ERROR', u'Blank type for field `device_exposure_start_datetime`, table `device_exposure`')
(u'ERROR', u'Blank type for field `drug_exposure_end_datetime`, table `drug_exposure`')
(u'ERROR', u'Blank type for field `measurement_datetime`, table `measurement`')
ERROR: no fields for table measurement_organism
(u'ERROR', u'Blank type for field `note_datetime`, table `note`')
ERROR: no fields for table procedure_cost
(u'ERROR', u'Blank type for field `procedure_datetime`, table `procedure_occurrence`')
(u'ERROR', u'Blank type for field `admitting_source_concept_id`, table `visit_occurrence`')
(u'ERROR', u'Blank type for field `discharge_to_concept_id`, table `visit_occurrence`')
(u'ERROR', u'Blank type for field `preceding_visit_occurrence_id`, table `visit_occurrence`')
(u'ERROR', u'Blank type for field `visit_end_datetime`, table `visit_occurrence`')
(u'ERROR', u'Blank type for field `visit_start_datetime`, table `visit_occurrence`')
ERROR: no fields for table visit_payer

The 'no fields' errors mean that there is no csv file under definitions/ for the table, or possibly that there is a file but there are no fields listed in it. And as always, to generate DDL via the sqlalchemy service, there has to be a corresponding csv file under schema/ for each table.
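For checking a single table locally, something along these lines might help. It is a sketch under assumptions, not how dmsa does it: it assumes header columns named `field` and `type`, and the function name `table_errors` and its messages are invented here to loosely mirror the errors above.

```python
import csv
from pathlib import Path


def table_errors(version_dir, table, field_col="field", type_col="type"):
    """List likely problems for one table: no defined fields, no schema file, blank types."""
    version_dir = Path(version_dir)
    errors = []

    # 'no fields': the definitions file is missing or lists no fields.
    def_file = version_dir / "definitions" / f"{table}.csv"
    fields = []
    if def_file.exists():
        with open(def_file, newline="") as f:
            fields = [r[field_col] for r in csv.DictReader(f) if r.get(field_col)]
    if not fields:
        errors.append(f"no fields for table {table}")

    # DDL generation needs a schema file with a non-blank type per field.
    schema_file = version_dir / "schema" / f"{table}.csv"
    if not schema_file.exists():
        errors.append(f"no schema file for table {table}")
        return errors
    with open(schema_file, newline="") as f:
        for r in csv.DictReader(f):
            if r.get(field_col) and not (r.get(type_col) or "").strip():
                errors.append(f"blank type for field {r[field_col]}, table {table}")
    return errors


# Example call (placeholder path):
# print(table_errors("path/to/omop/v5.1", "visit_occurrence"))
```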