Open barabo opened 6 months ago
Thanks for your script, saved me a lot of effort. I extended your script to replace tabs with spaces:
This is at an early stage of development- but would be grateful for any feedback - but this might help efforts with SQLite. See https://github.com/wardle/iort - it can now dynamically generate and execute DDL and handle the vocab import and is working for SQLite and PostgreSQL. I need to add customisation to choose different databases in a similar way to schema in PostgreSQL for example, but might help?
For those on this thread, or anyone who lands here. I'm working with a sqlite cdm database for a demo, which uses an experimental new jq
module library to quickly map FHIR resources into cdm records.
Demo: https://github.com/barabo/fhir-to-omop-demo
fhir-jq
: https://github.com/barabo/fhir-jq
Both are still WIP, but this is what motivated me to update the DDL like this.
Hi there!
I've been playing around trying to get the DDLs loaded for sqlite3, version 5.4 using the
sqlite3
command line and I've been having a pretty rough go of it using the provided definitions.I didn't find any documentation here or elsewhere about the
@cdmDatabaseSchema
notation in the ddl files, so I went ahead and removed them. I assume they are something used by the sqlrender utility - but I'm not an R user - I'm just trying to create a working OMOPCDM database in sqlite.I also discovered that sqlite does not allow you to apply primary or foreign keys to tables using the
ALTER TABLE
commands provided. So, I wrote a brittle little script to apply the PK and FK definitions to the main_ddl.sql
file.This allows me to use the patched
_ddl.sql
file to define tables with PKs and FKs. The_indices.sql
file seems fine as-is.I did notice that the
COHORT
table is missing aPRIMARY KEY
, so the foreign key check fails in a clean install for just that table. It almost seems likeCOHORT
should havecohort_id integer PRIMARY KEY
andCOHORT_DEFINITION
should havecohort_definition_id integer PRIMARY KEY
- and then theCOHORT
table should have a FK onCOHORT_DEFINITION.cohort_definition_id
(instead, it seems backwards). But as it is now, if I makeCOHORT.cohort_definition_id
the PK of theCOHORT
table - it gets me past the failing foreign key checks. Anyway, I expect some of that will change in future versions - my script includes a hack for this.For posterity, I'm providing my script here. However, are there instructions for people who don't want to use R but just want to create a working DB out there somewhere?
Click to see my script
```sh #!/bin/bash # # File: update-ddl.sh # # Converts the OMOP-provided DDL database definition scripts into a format # that can be loaded by sqlite3. # # This has been tested against the provided sqlite 5.4 DDL files. # # To run this, place copies of the provided .sql files into this directory # and run the script. It will modify the sql files and create an empty # database called cdm.db # set -e set -o pipefail set -u # There should be only one DDL file in the local directory. DDL=$( ls OMOPCDM*_ddl.sql ) [ ! -e ${DDL} ] && echo "Failed to find an OMOPCDM*_ddl.sql schema!" && exit 1 # Awk scriptlet to add a PRIMARY KEY declaration to a table.column in a table # definition. # # Assumptions: # 'table' is padded with spaces # 'column' begins with a tab and ends with a space # AWK_PK=' /^CREATE TABLE / { if ($0 ~ table) { target=1 } else { target=0 } } { if (target && $0 ~ column && (!($0 ~ / PRIMARY KEY,/))) { sub(/,/, " PRIMARY KEY,"); } print $0 } ' # Awk scriptlet to add a FOREIGN KEY declaration to a table.column in a table # definition. # # Assumptions: # 'table' is padded with spaces # 'column' begins with a tab and ends with a space # 'ref' is multiple words long # AWK_FK=' /^CREATE TABLE / { if ($0 ~ table) { target=1 } else { target=0 } } { if (target && $0 ~ column && (!($0 ~ / REFERENCES /))) { if ($0 ~ /,$/) { sub(/,$/, " REFERENCES " ref ","); } else { sub(/ );$/, " REFERENCES " ref ");"); } } print $0 } ' ## # Gets the table names and PK columns from constraints. # function get_pks() { grep '^ALTER TABLE ' OMOPCDM*_primary_keys.sql \ | sed -e 's:.* \(.*\) ADD CONSTRAINT .* PRIMARY KEY .\(.*\).;:\1 \2:' } ## # Decorate the PK columns with PRIMARY KEY if not already decorated. # function mark_pk() { # Apply the change to the ddl file content, saving into a temp file. cat ${DDL} | awk -v table=" ${1} " -v column=" ${2} " "${AWK_PK}" \ > ${DDL}.tmp # diff the files to assert diff count is 1 changed line only. (( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1 # Apply the temp file to the DDL file. mv ${DDL}.tmp ${DDL} } function get_fks() { grep '^ALTER TABLE ' OMOPCDM*_constraints.sql \ | sed -e 's:ALTER TABLE \(.*\) ADD CONSTRAINT .* FOREIGN KEY .\([^ ]*\). REFERENCES:\1 \2:' \ | sed -e 's:;::' } ## # Decorate the PK columns with PRIMARY KEY if not already decorated. # function mark_fk() { # Apply the change to the ddl file content, saving into a temp file. cat ${DDL} \ | awk -v table=" ${1} " -v column=" ${2} " -v ref="${3}" "${AWK_FK}" \ > ${DDL}.tmp # diff the files to assert diff count is 1 changed line only. (( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1 # Apply the temp file to the DDL file. mv ${DDL}.tmp ${DDL} } # Remove the template schema reference, which isn't supported in sqlite. sed -i -e 's:@cdmDatabaseSchema.::g' OMOPCDM*.sql && rm -f OMOPCDM*.sql-e # Update the table definitions to insert the PRIMARY KEY declarations. echo "Adding PRIMARY KEYS to DDL" get_pks | while read table column; do mark_pk ${table} ${column} done # Insert the FK constraints into the table. echo "Adding FOREIGN KEYS to DDL" get_fks | while read table column reference; do mark_fk ${table} ${column} "${reference}" done # HACK: no PK in the cohort table, so this is needed to satisfy FK checks. mark_pk cohort cohort_definition_id # Populate the DB tables and indices. echo "Creating empty cdm.db database" rm -f cdm.db touch cdm.db sqlite3 cdm.db < ${DDL} sqlite3 cdm.db < OMOPCDM*_indices.sql sqlite3 cdm.db "pragma foreign_key_check" # Success! echo "DONE!" ``` This updates the `_ddl.sql` file to look like this: ```sql CREATE TABLE person ( person_id integer NOT NULL PRIMARY KEY, gender_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID), year_of_birth integer NOT NULL, month_of_birth integer NULL, day_of_birth integer NULL, birth_datetime REAL NULL, race_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID), ethnicity_concept_id integer NOT NULL REFERENCES CONCEPT (CONCEPT_ID), location_id integer NULL REFERENCES LOCATION (LOCATION_ID), provider_id integer NULL REFERENCES PROVIDER (PROVIDER_ID), care_site_id integer NULL REFERENCES CARE_SITE (CARE_SITE_ID), person_source_value TEXT NULL, gender_source_value TEXT NULL, gender_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID), race_source_value TEXT NULL, race_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID), ethnicity_source_value TEXT NULL, ethnicity_source_concept_id integer NULL REFERENCES CONCEPT (CONCEPT_ID)); ``` When the script runs correctly, you see output like this: ``` (base) anderson.carl3@home test % ./update-ddl.sh Adding PRIMARY KEYS to DDL Adding FOREIGN KEYS to DDL Creating empty cdm.db database DONE! ```