National-COVID-Cohort-Collaborative / Data-Ingestion-and-Harmonization

Data Ingestion and Harmonization
41 stars 12 forks source link

Addition of two columns to Site Data Manifest table #15

Closed DaveraGabriel closed 4 years ago

DaveraGabriel commented 4 years ago

18May2020 Manifest table needs the following two columns :

VOCABULARY_VERSION VARCHAR2(20 BYTE) --------add this column , SITE_NAME VARCHAR2(200 BYTE) < --------- currently, we only have this column , SITE_ABBREV_name varchar2(50) ---------------add this column


eMail discussion:

From: Pfaff, Emily epfaff@email.unc.edu Sent: Monday, May 18, 2020 12:44 PM To: Stephanie Hong stephanie.hong@jhu.edu Cc: Davera Gabriel dgabrie4@jhmi.edu; Christopher Chute chute@jhu.edu Subject: RE: CDM_SOURCE table

We could add that one field to the MANIFEST table, probably later this week (just to avoid making changes while sites are piloting today and tomorrow). It won’t apply to PCORnet, but I think it will apply to OMOP and ACT. Can you put in a GitHub Issue for this so I don’t lose track of the request?

From: Stephanie Hong stephanie.hong@jhu.edu Sent: Monday, May 18, 2020 12:40 PM To: Pfaff, Emily epfaff@email.unc.edu Cc: Davera Gabriel dgabrie4@jhmi.edu; Christopher Chute chute@jhu.edu Subject: RE: CDM_SOURCE table

One information we do not have is the VOCAB version and I wasn’t sure if this will be relevant at the data quality checking step.

           VOCABULARY_VERSION VARCHAR2(20 BYTE)

Stephanie

From: Pfaff, Emily epfaff@email.unc.edu Sent: Monday, May 18, 2020 12:21 PM To: Stephanie Hong stephanie.hong@jhu.edu Cc: Davera Gabriel dgabrie4@jhmi.edu; Christopher Chute chute@jhu.edu Subject: RE: CDM_SOURCE table

My preference would be for that table to be created centrally. Unless I’m misunderstanding, it seems to me like all of that information could be derived from the CDM version information in the MANIFEST table, and that it’s more information about the CDM itself rather than anything site-specific.

Let me know if I’m misunderstanding.

From: Stephanie Hong stephanie.hong@jhu.edu Sent: Monday, May 18, 2020 12:13 PM To: Pfaff, Emily epfaff@email.unc.edu Cc: Davera Gabriel dgabrie4@jhmi.edu; Christopher Chute chute@jhu.edu Subject: CDM_SOURCE table

Hi Emily,

OMOP has CDM_SOURCE TABLE which is used by the DQD scripts!

Would it be of if the CDM source is OMOP that each site fill out this table as well? Or do you rather that we fill this out to our best of knowledge based on the Manifest table?

Stephanie

CREATE TABLE CDM_SOURCE ( "CDM_SOURCE_NAME" VARCHAR2(255 BYTE), "CDM_SOURCE_ABBREVIATION" VARCHAR2(25 BYTE), "CDM_HOLDER" VARCHAR2(255 BYTE), "SOURCE_DESCRIPTION" CLOB, "SOURCE_DOCUMENTATION_REFERENCE" VARCHAR2(255 BYTE), "CDM_ETL_REFERENCE" VARCHAR2(255 BYTE), "SOURCE_RELEASE_DATE" DATE, "CDM_RELEASE_DATE" DATE, "CDM_VERSION" VARCHAR2(10 BYTE), "VOCABULARY_VERSION" VARCHAR2(20 BYTE) )

As a reference here is the structure of our manifest table. CREATE TABLE MANIFEST ( MANIFEST_ID NUMBER(18, 0) NOT NULL , SITE_NAME VARCHAR2(200 BYTE) , SITE_ABBREV_name varchar2(50) , CONTACT_NAME VARCHAR2(200 BYTE) , CONTACT_EMAIL VARCHAR2(200 BYTE) , CDM_NAME VARCHAR2(100 BYTE) , CDM_VERSION VARCHAR2(20 BYTE) NOT NULL , N3C_PHENOTYPE_YN VARCHAR2(5 BYTE) NOT NULL , N3C_PHENOTYPE_VERSION NUMBER(18, 1) NOT NULL <----- need to have decimal , RUN_DATE timestamp NOT NULL , UPDATE_DATE timestamp NOT NULL , NEXT_SUBMISSION_DATE timestamp NOT NULL -------used internally , DATASET_STATUS NUMBER(, 0) , DATA_PARTNER_ID NUMBER(, 0) NOT NULL , PROCESS_timestamp timestamp NOT NULL ) ;

DaveraGabriel commented 4 years ago

@empfff @stephanieshong is this an outstanding issue? Please advise

empfff commented 4 years ago

I think you can close.