National-COVID-Cohort-Collaborative / Data-Ingestion-and-Harmonization

Data Ingestion and Harmonization

NOTE and NOTE_NLP datasets are being submitted by all CDMs. Need to process them and add them to the Data Catalog #79

Closed stephanieshong closed 2 years ago

stephanieshong commented 2 years ago
  1. Parse.
  2. Generate IDs.
  3. Publish to LDS when the following check passes: the referenced note_id, person_id, visit_occurrence_id and visit_detail_id exist within the payload.
  4. Optional: generate rejected rows for records that fail step 3 because of missing reference IDs.
  5. Add the pipeline changes to all of the data sources: OMOP, PCORNet, ACT, TriNetX.
  6. Accommodate the special case where the data partner forgets to submit the NOTE and NOTE_NLP datasets in the current payload: do not overwrite the previously loaded dataset, but try to reprocess the data with the current person and visit_occurrence ids.
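
The reference check and the optional rejected-rows output could be sketched roughly as follows. This is a minimal plain-Python illustration, not the pipeline's actual code: the function name `split_by_references`, the `rejected_reason` column, and the choice to treat a null optional reference (for example `visit_detail_id`) as "nothing to check" are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]


def split_by_references(
    rows: Iterable[Row],
    valid_ids: Dict[str, Set[int]],
    required: Tuple[str, ...] = ("person_id",),
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (accepted, rejected) by checking reference columns.

    valid_ids maps a reference column name to the set of ids present in the
    current payload. Hypothetical rule: a null value fails only if the column
    is listed in `required`; a non-null value fails if it is not in the set.
    """
    accepted: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        missing = []
        for col, ids in valid_ids.items():
            value = row.get(col)
            if value is None:
                if col in required:
                    missing.append(col)
            elif value not in ids:
                missing.append(col)
        if missing:
            # Keep the original row and record why it was rejected.
            rejected.append({**row, "rejected_reason": "missing reference: " + ", ".join(missing)})
        else:
            accepted.append(row)
    return accepted, rejected


notes = [
    {"note_id": 1, "person_id": 10, "visit_occurrence_id": 100, "visit_detail_id": None},
    {"note_id": 2, "person_id": 11, "visit_occurrence_id": 100, "visit_detail_id": None},
]
payload_ids = {
    "person_id": {10},
    "visit_occurrence_id": {100},
    "visit_detail_id": set(),
}
accepted, rejected = split_by_references(notes, payload_ids)
```

Here only the accepted rows would be published, while the rejected list corresponds to the optional output in step 4. The same function could be applied to NOTE_NLP rows by adding `note_id` to the reference sets.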
stephanieshong commented 2 years ago

Pipeline changes for OMOP CDM have been added.

stephanieshong commented 2 years ago

Implemented support to process optional NLP datasets.
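
One way to picture the optional-dataset handling is sketched below. It is only an illustration under assumptions: `select_notes` is a hypothetical helper, a missing dataset is represented as `None`, and "reprocess with the current ids" is interpreted here as re-checking the previously loaded rows against the person and visit_occurrence ids in the current payload.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[int]]


def select_notes(
    current_notes: Optional[List[Row]],
    previous_notes: List[Row],
    current_person_ids: Set[int],
    current_visit_ids: Set[int],
) -> Tuple[List[Row], bool]:
    """Return (rows to publish, whether the previous load was reused).

    If the partner did not submit the dataset (current_notes is None),
    fall back to the previously loaded rows instead of overwriting them
    with nothing. Either way, keep only rows whose references resolve
    against the current payload.
    """
    reused_previous = current_notes is None
    source = previous_notes if reused_previous else current_notes
    kept = [
        row
        for row in source
        if row.get("person_id") in current_person_ids
        and (
            row.get("visit_occurrence_id") is None
            or row["visit_occurrence_id"] in current_visit_ids
        )
    ]
    return kept, reused_previous


previous = [
    {"note_id": 1, "person_id": 10, "visit_occurrence_id": 100},
    {"note_id": 2, "person_id": 99, "visit_occurrence_id": 100},
]
# The NOTE dataset is absent from the current payload, so the previous load is reused.
rows, reused = select_notes(None, previous, {10}, {100})
```

Note that an empty but present dataset (`[]`) is treated as a real submission in this sketch, so it would not trigger the fallback.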