OHDSI / ETL-Synthea

A package supporting the conversion from Synthea CSV to OMOP CDM
https://ohdsi.github.io/ETL-Synthea/
96 stars 70 forks source link

LoadVocabFromCsv null in vocabulary_version #92

Closed oneilsh closed 3 years ago

oneilsh commented 3 years ago

Hello again :) I'm still working on ETLing the Synthea COVID 100K dataset, running into an error in the LoadVocabFromCsv step.

I did diverge from the suggested code slightly by generating the indexing scripts and running the index one (in a previous run the LoadEventTables step ran for > 4 hours on the condition_occurance table I think it was, after generating the indices, so I'm trying again from scratch with the latest codebase...)

some settings:

cd <- DatabaseConnector::createConnectionDetails(
  dbms     = "postgresql", 
  server   = "localhost/synthea10", 
  user     = "oneils", 
  password = "", 
  port     = 5432,
  pathToDriver = "."
)

cdmSchema      <- "cdm_synthea10"
cdmVersion     <- "5.3.1"
syntheaVersion <- "2.7.0"
syntheaSchema  <- "native"
syntheaFileLoc <- "100k_synthea_covid19_csv"
vocabFileLoc   <- "omop_vocab_athena_download"

Sys.setenv(POSTGRES_PATH = "/usr/local/bin/postgres")

ETLSyntheaBuilder::CreateCDMTables(connectionDetails = cd, cdmSchema = cdmSchema, cdmVersion = cdmVersion)
ETLSyntheaBuilder::CreateSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaVersion = syntheaVersion)
# Optional: Create index and constraint DDL scripts for the rdbms that support them.  Scripts will be written to the "output" directory.
ETLSyntheaBuilder::CreateCDMIndexAndConstraintScripts(connectionDetails = cd, cdmSchema = cdmSchema, cdmVersion = cdmVersion, githubTag = "v5.3.1_fixes")
system("psql -d synthea10 -f output/postgresql_5.3.1_index_ddl.sql") # actually run on command-line, ALTER TABLEs successfully run

ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
ETLSyntheaBuilder::LoadVocabFromCsv(connectionDetails = cd, cdmSchema = cdmSchema, vocabFileLoc = vocabFileLoc)

I get an error for vocabulary.csv, but the others appear to have loaded successfully:

> ETLSyntheaBuilder::LoadVocabFromCsv(connectionDetails = cd, cdmSchema = cdmSchema, vocabFileLoc = vocabFileLoc)
Connecting using PostgreSQL driver
|--------------------------------------------------|
|==================================================|
Loading: concept.csv
  |=======================================================================================================================================| 100%
Loading: vocabulary.csv
  |==                                                                                                                                     |   2%java.sql.BatchUpdateException: Batch entry 0 INSERT INTO cdm_synthea10.vocabulary (vocabulary_id,vocabulary_name,vocabulary_reference,vocabulary_version,vocabulary_concept_id) VALUES('Death Type','OMOP Death Type','OMOP generated',NULL,44819135) was aborted: ERROR: null value in column "vocabulary_version" of relation "vocabulary" violates not-null constraint
  Detail: Failing row contains (Death Type, OMOP Death Type, OMOP generated, null, 44819135).  Call getNextException to see other errors in the batch.
    at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:169)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2286)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:521)
    at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:870)
    at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:893)
    at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1644)
    at org.ohdsi.databaseConnector.BatchedInsert.executeBatch(BatchedInsert.java:114)
Caused by: org.postgresql.util.PSQLException: ERROR: null value in column "vocabulary_version" of relation "vocabulary" violates not-null constraint
  Detail: Failing row contains (Death Type, OMOP Death Type, OMOP generated, null, 44819135).
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
    ... 5 more
ERROR: null value in column "vocabulary_version" of relation "vocabulary" violates not-null constraint
  Detail: Failing row contains (Death Type, OMOP Death Type, OMOP generated, null, 44819135).
  |=======================================================================================================================================| 100%
|--------------------------------------------------|
|==================================================|
Loading: concept_ancestor.csv
  |=======================================================================================================================================| 100%
|--------------------------------------------------|
|==================================================|
Loading: concept_relationship.csv
  |=======================================================================================================================================| 100%
Loading: relationship.csv
  |==============================================================================================================================================================| 100%
Loading: concept_synonym.csv
  |==============================================================================================================================================================| 100%
Loading: domain.csv
  |==============================================================================================================================================================| 100%
Loading: concept_class.csv
  |==============================================================================================================================================================| 100%
|--------------------------------------------------|
|==================================================|
Loading: drug_strength.csv
  |==============================================================================================================================================================| 100%
There were 11 warnings (use warnings() to see them)

Thank you for looking!

oneilsh commented 3 years ago

I managed to get this to run via some hints in #57, with some tweaks to tableName RE #86:

LoadVocabFromCsv <- function (connectionDetails, cdmDatabaseSchema, vocabFileLoc)
{

  #csvList <- c("concept.csv","vocabulary.csv","concept_ancestor.csv","concept_relationship.csv","relationship.csv","concept_synonym.csv","domain.csv","concept_class.csv", "drug_strength.csv")
  csvList <- c("vocabulary.csv")

  conn <- DatabaseConnector::connect(connectionDetails)

  for (csv in csvList) {

    vocabTable <- data.table::fread(file = paste0(vocabFileLoc, "/", csv), stringsAsFactors = FALSE, header = TRUE, sep = "\t", na.strings = NULL)

    # Format Dates for tables that need it
    if (base::identical(csv,"concept.csv") || base::identical(csv,"concept_relationship.csv") || base::identical(csv,"drug_strength.csv")) {

      vocabTable$valid_start_date <- as.Date(as.character(vocabTable$valid_start_date),"%Y%m%d")
      vocabTable$valid_end_date   <- as.Date(as.character(vocabTable$valid_end_date),"%Y%m%d")
    }

    writeLines(paste0("Loading: ",csv))

    DatabaseConnector::insertTable(conn,tableName=paste0(cdmDatabaseSchema,".",strsplit(csv,"[.]")[[1]][1]), data=as.data.frame(vocabTable), dropTableIfExists = FALSE, createTable = FALSE, progressBar = TRUE)
  }

  on.exit(DatabaseConnector::disconnect(conn))

}

LoadVocabFromCsv(cd, cdmSchema, vocabFileLoc)
mpreusse commented 3 years ago

@oneilsh What did you change in the code? I had the same issue but I simply added a random value to the vocabulary.csv file in the missing column.

oneilsh commented 3 years ago

Depends what you mean - I didn't modify any of the codebase internals, just included a version of the LoadVocabFromCsv function in the script, from the version defined in the fork linked from #57 with an explicit setting of the tableName parameter in the call to DatabaseConnector::insertTable (as I recall I had trouble getting params to match right by position against the signatutre for insertTable, possibly the param list has changed in newer versions of DatabaseConnector).

Here's my frankenstein script so far:


library(ETLSyntheaBuilder)

# download the drivers if needed to the current working directory
#downloadJdbcDrivers("postgresql", ".")

cd <- DatabaseConnector::createConnectionDetails(
  dbms     = "postgresql", 
  server   = "localhost/synthea10", 
  user     = "oneils", 
  password = "", 
  port     = 5432,
  pathToDriver = "."
)

cdmSchema      <- "cdm_synthea10"
cdmVersion     <- "5.3.1"
syntheaVersion <- "2.7.0"
syntheaSchema  <- "native"
syntheaFileLoc <- "100k_synthea_covid19_csv"
vocabFileLoc   <- "omop_vocab_athena_download"

Sys.setenv(POSTGRES_PATH = "/usr/local/bin/postgres")

# ahem https://github.com/OHDSI/ETL-Synthea/issues/57
# along with change from https://github.com/OHDSI/ETL-Synthea/commit/af15bc1f42097fb08b2291066daf399ed2b68fa1#diff-22371047436aa33aa4e84fc366ec4a76b6f0936e06ca5c90b2df5fbbb0b1ad44
LoadVocabFromCsv <- function (connectionDetails, cdmDatabaseSchema, vocabFileLoc)
{

  #csvList <- c("concept.csv","vocabulary.csv","concept_ancestor.csv","concept_relationship.csv","relationship.csv","concept_synonym.csv","domain.csv","concept_class.csv", "drug_strength.csv")
  csvList <- c("vocabulary.csv")

  conn <- DatabaseConnector::connect(connectionDetails)

  for (csv in csvList) {

    vocabTable <- data.table::fread(file = paste0(vocabFileLoc, "/", csv), stringsAsFactors = FALSE, header = TRUE, sep = "\t", na.strings = NULL)

    # Format Dates for tables that need it
    if (base::identical(csv,"concept.csv") || base::identical(csv,"concept_relationship.csv") || base::identical(csv,"drug_strength.csv")) {

      vocabTable$valid_start_date <- as.Date(as.character(vocabTable$valid_start_date),"%Y%m%d")
      vocabTable$valid_end_date   <- as.Date(as.character(vocabTable$valid_end_date),"%Y%m%d")
    }

    writeLines(paste0("Loading: ",csv))

    DatabaseConnector::insertTable(conn,tableName=paste0(cdmDatabaseSchema,".",strsplit(csv,"[.]")[[1]][1]), data=as.data.frame(vocabTable), dropTableIfExists = FALSE, createTable = FALSE, progressBar = TRUE)
  }

  on.exit(DatabaseConnector::disconnect(conn))

}

ETLSyntheaBuilder::CreateCDMTables(connectionDetails = cd, cdmSchema = cdmSchema, cdmVersion = cdmVersion)
ETLSyntheaBuilder::CreateSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaVersion = syntheaVersion)
# Optional: Create index and constraint DDL scripts for the rdbms that support them.  Scripts will be written to the "output" directory.
ETLSyntheaBuilder::CreateCDMIndexAndConstraintScripts(connectionDetails = cd, cdmSchema = cdmSchema, cdmVersion = cdmVersion, githubTag = "v5.3.1_fixes")
system("psql -d synthea10 -f output/postgresql_5.3.1_index_ddl.sql")

ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
ETLSyntheaBuilder::LoadVocabFromCsv(connectionDetails = cd, cdmSchema = cdmSchema, vocabFileLoc = vocabFileLoc)
# fix for null constraint violation error when loading vocabulary.csv
LoadVocabFromCsv(cd, cdmSchema, vocabFileLoc)
ETLSyntheaBuilder::LoadEventTables(connectionDetails = cd, cdmSchema = cdmSchema, syntheaSchema = syntheaSchema, cdmVersion = cdmVersion)
mpreusse commented 3 years ago

Thanks! Yes, I was referring to the difference between the function from the issue and the original one. I kind of don't unuderstand yet why the tableName makes a difference but I'll try again tomorrow ;)

I put my first attempt of loading Synthea to OMOP into Docker containers, maybe useful: https://github.com/kaiserpreusse/synthea2omop

AnthonyMolinaro commented 3 years ago

EDITED: Just realized I addressed me question to the wrong person. : ) @oneilsh Just curious, is there missing data in the vocabulary.csv file you're using? I'm asking because from what you originally posted, a not-null constraint was violated. If you execute:

vocabTable <- data.table::fread(file = "omop_vocab_athena_download/vocabulary.csv", stringsAsFactors = FALSE, header = TRUE, sep = "\t", na.strings = "")
View(vocabTable)

Do you see missing data? (Would show up as NA in R)

oneilsh commented 3 years ago

Aha! Sorry, I'm pretty new to OHDSI and medical vocabularies. Looking at the result I'm guessing I got overly ambitious in which vocabularies I grabbed from Athena (since I wasn't sure which ones to grab, I did most of the ones that weren't licensed)

> vocabTable
           vocabulary_id                                                                                        vocabulary_name                                                                                                                           vocabulary_reference                                                                                           vocabulary_version vocabulary_concept_id
 1:           Death Type                                                                                        OMOP Death Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819135
 2:          Cohort Type                                                                                       OMOP Cohort Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819234
 3:     Condition Status                                                                                  OMOP Condition Status                                                                                                                                 OMOP generated                                                                                                         <NA>                 32887
 4:                 NUCC                         National Uniform Claim Committee Health Care Provider Taxonomy Code Set (NUCC)                                                                http://www.nucc.org/index.php?option=com_content&view=article&id=107&Itemid=132                                                                                              2018-06-26 NUCC              44819137
 5:         Revenue Code                                                                       UB04/CMS1450 Revenue Codes (CMS)                                                                                                          http://www.mpca.net/?page=ERC_finance                                                                                                 2010 Release              44819133
 6:               ICD9CM International Classification of Diseases, Ninth Revision, Clinical Modification, Volume 1 and 2 (NCHS)                                                                      http://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes.html                                                                               ICD9CM v32 master descriptions                  5046
 7:         Type Concept                                                                                      OMOP Type Concept                                                                                                                                 OMOP generated                                                                                        Type Concept 20210212                 32808
 8:                 Race                                                                     Race and Ethnicity Code Set (USBC)                                                                                    http://www.cdc.gov/nchs/data/dvs/Race_Ethnicity_CodeSet.pdf                                                                                                  Version 1.0              44819109
 9:               Domain                                                                                            OMOP Domain                                                                                                                                 OMOP generated                                                                                                         <NA>              44819147
10: CMS Place of Service                                                   Place of Service Codes for Professional Claims (CMS)                             http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/PhysicianFeeSched/downloads//Website_POS_database.pdf                                                                                                   2009-01-11              44819110
11:   Medicare Specialty                                                       Medicare provider/supplier specialty codes (CMS)                                      http://www.cms.gov/Medicare/Provider-Enrollment-and-Certification/MedicareProviderSupEnroll/Taxonomy.html                                                                                         2018-06-26 Specialty              44819138
12:                 None                                                                         OMOP Standardized Vocabularies                                                                                                                                 OMOP generated                                                                                               v5.0 26-FEB-21              44819096
13:            Note Type                                                                                         OMOP Note Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819146
14:                PHDSC                                                                     Source of Payment Typology (PHDSC)                                                                                       http://www.phdsc.org/standards/payer-typology-source.asp                                                                                                  Version 3.0                 32473
15:                 Cost                                                                                              OMOP Cost                                                                                                                                 OMOP generated                                                                                                         <NA>                581457
16: UB04 Point of Origin                                                       UB04 Claim Source Inpatient Admission Code (CMS)                                                                https://www.resdac.org/cms-data/variables/Claim-Source-Inpatient-Admission-Code                                                                                                         <NA>                 32045
17:            Meas Type                                                                                  OMOP Measurement Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819152
18:                Visit                                                                                             OMOP Visit                                                                                                                                 OMOP generated                                                                                                         <NA>              44819119
19:           Visit Type                                                                                        OMOP Visit Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819150
20:                ICDO3                             International Classification of Diseases for Oncology, Third Edition (WHO)                                                                                                               https://seer.cancer.gov/icd-o-3/                                                                   ICDO3 SEER Site/Histology Released 06/2019                581426
21:        UB04 Typ bill                                                              UB04 Type of Bill - Institutional (USHIK)                                                                          https://ushik.ahrq.gov/ViewItemDetails?&system=apcd&itemKey=196987000                                                                                                         <NA>                 32044
22:                LOINC                                Logical Observation Identifiers Names and Codes (Regenstrief Institute)                                                                                                               http://loinc.org/downloads/loinc                                                                                                         2.68              44819102
23:            US Census                                                                            United States Census Bureau                                                                             https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html                                                                                       US Census 2017 Release                 32570
24:       Condition Type                                                                         OMOP Condition Occurrence Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819127
25:             Metadata                                                                                               Metadata                                                                                                                                 OMOP generated                                                                                                         <NA>                 32675
26:            Ethnicity                                                                                         OMOP Ethnicity                                                                                                                                 OMOP generated                                                                                                         <NA>              44819134
27:             ICD10PCS                                                                   ICD-10 Procedure Coding System (CMS)                                                                                            http://www.cms.gov/Medicare/Coding/ICD10/index.html                                                                                                ICD10PCS 2021              44819125
28:       Procedure Type                                                                         OMOP Procedure Occurrence Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819128
29:              Episode                                                                                           OMOP Episode                                                                                                                                 OMOP generated                                                                                             Episode 20201014                 32523
30:      Obs Period Type                                                                           OMOP Observation Period Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819149
31:            Cost Type                                                                                         OMOP Cost Type                                                                                                                                 OMOP generated                                                                                                         <NA>                  5029
32:     Plan Stop Reason                                           Plan Stop Reason - Reason for termination of the Health Plan                                                                                                                                 OMOP generated                                                                                                         <NA>                 32474
33:        Concept Class                                                                                     OMOP Concept Class                                                                                                                                 OMOP generated                                                                                                         <NA>              44819233
34:                 UCUM                                              Unified Code for Units of Measure (Regenstrief Institute)                                                                         http://aurora.regenstrief.org/~ucum/ucum.html#section-Alphabetic-Index                                                                                                Version 1.8.2              44819107
35:  UB04 Pri Typ of Adm                                                         UB04 Claim Inpatient Admission Type Code (CMS)                                                                  https://www.resdac.org/cms-data/variables/Claim-Inpatient-Admission-Type-Code                                                                                                         <NA>                 32046
36:                  ATC                                                       WHO Anatomic Therapeutic Chemical Classification                                                                                                             http://www.whocc.no/atc_ddd_index/                                                                                              RxNorm 20200504              44819117
37:                 Plan  Health Plan - contract to administer healthcare transactions by the payer, facilitated by the sponsor                                                                                                                                 OMOP generated                                                                                                         <NA>                 32471
38:                 CPT4                                                         Current Procedural Terminology version 4 (AMA)                                                                 http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html                                                                                                 2020 Release              44819100
39:            Drug Type                                                                                OMOP Drug Exposure Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819126
40:                  CDM                                                                                  OMOP Common DataModel                                                                                                       https://github.com/OHDSI/CommonDataModel                                                                                                   CDM v6.0.2                 32485
41:     Observation Type                                                                                  OMOP Observation Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819129
42:                HCPCS                                                        Healthcare Common Procedure Coding System (CMS)                                                                 http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html                                                                                2020 Alpha Numeric HCPCS File              44819101
43:           Vocabulary                                                                                        OMOP Vocabulary                                                                                                                                 OMOP generated                                                                                                         <NA>              44819232
44:                 ABMS                                             Provider Specialty (American Board of Medical Specialties)                                                                          http://www.abms.org/member-boards/specialty-subspecialty-certificates                                                                                              2018-06-26 ABMS              45756746
45:          Device Type                                                                                       OMOP Device Type                                                                                                                                 OMOP generated                                                                                                         <NA>              44819151
46:                ICD10                                         International Classification of Diseases, Tenth Revision (WHO)                                                                                   http://www.who.int/classifications/icd/icdonlineversions/en/                                                                                                 2020 Release              44819124
47:                  OSM                                                                                          OpenStreetMap                                                         https://www.openstreetmap.org/copyright/en, https://wambachers-osm.website/boundaries/                                                                                       OSM Release 2019-02-21                 32541
48:              Sponsor                                  Sponsor - institution or individual financing healthcare transactions                                                                                                                                 OMOP generated                                                                                                         <NA>                 32472
49:   UB04 Pt dis status                                                               UB04 Patient Discharge Status Code (CMS)                                                                        https://www.resdac.org/cms-data/variables/patient-discharge-status-code                                                                                                         <NA>                 32047
50:             ICD9Proc       International Classification of Diseases, Ninth Revision, Clinical Modification, Volume 3 (NCHS)                                                                      http://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes.html                                                                               ICD9CM v32 master descriptions              44819099
51:              ICD10CM                 International Classification of Diseases, Tenth Revision, Clinical Modification (NCHS)                                                                                                        http://www.cdc.gov/nchs/icd/icd10cm.htm                                                                             ICD10CM FY2021 code descriptions              44819098
52:         Relationship                                                                                      OMOP Relationship                                                                                                                                 OMOP generated                                                                                                         <NA>              44819235
53:                  NDC                                                             National Drug Code (FDA and manufacturers) http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html, http://www.fda.gov/downloads/Drugs/DevelopmentApprovalProcess/UCM070838.zip                                                                                                 NDC 20210221              44819105
54:                  SPL                                                                      Structured Product Labeling (FDA)                                                                                      http://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm                                                                                                 NDC 20210221              44819140
55:       OMOP Extension                                                                                 OMOP Extension (OHDSI)                                                                                                                                 OMOP generated                                                                                      OMOP Extension 20210226                 32758
56:  Korean Revenue Code                                                                                    Korean Revenue Code                                                                                                                                 OMOP generated                                                                                                         <NA>                 32724
57:               Gender                                                                                            OMOP Gender                                                                                                                                 OMOP generated                                                                                                         <NA>              44819108
58:             Currency                                                               International Currency Symbol (ISO 4217)                                                                                       http://www.iso.org/iso/home/standards/currency_codes.htm                                                                                                         2008              44819153
59:               RxNorm                                                                                           RxNorm (NLM)                                                                              http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html                                                                                              RxNorm 20210104              44819104
60:     RxNorm Extension                                                                               RxNorm Extension (OHDSI)                                                                                                                                 OMOP generated                                                                                  RxNorm Extension 2021-02-12                   252
61:               SNOMED                                          Systematic Nomenclature of Medicine - Clinical Terms (IHTSDO)                                                                 http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html 2020-07-31 SNOMED CT International Edition; 2020-09-01 SNOMED CT US Edition; 2020-10-28 SNOMED CT UK Edition              44819097
           vocabulary_id                                                                                        vocabulary_name                                                                                                                           vocabulary_reference                                                                                           vocabulary_version vocabulary_concept_id
AnthonyMolinaro commented 3 years ago

@oneilsh Np. : )

I don't think there should be missing data in that file though (or the vocabulary_version column should, in fact, be nullable).

As a temporary hack, you can do something like:

if (csv == "vocabulary.csv")
  vocabTable <- vocabTable[-which(is.na(vocabTable$vocabulary_version))]

In the meantime, I'll check with our resident CDM expert to see if I can find out why there might be missing data from that file.

Hi @clairblacketer , is it expected that the vocabulary_version field of the vocabulary.csv downloaded from Athena is null?

AnthonyMolinaro commented 3 years ago

@oneilsh Hi Shawn, so Clair has confirmed that vocabulary_version should be NULL: http://ohdsi.github.io/CommonDataModel/cdm531.html#VOCABULARY

This is the case in cdm version 6.0.0. Alternatively, you can specify githubTag = "v5.3.1_fixes" when creating the cdm tables. Since some people might not be ale to use v5.3.1_fixes, I'm going to add a quick fix to change the ddl to allow this field to be NULL.