Build CNV Database - Githubissues

Build CNV database schema and tables with the correct structure for the Evidence Plot proposed in PediatricOpenTargets/ticket-tracker#288. This ticket is heavily based off the thorough example @logstar wrote for building the methylation database schema in PedatricOpenTargets/OpenPedCan-api#69.

Adding a CNV Database Schema

Add new copy number variant (CNV) database schema, using any appropriate name, to store all CNV tables and to avoid table name collisions with existing tables in bulk_expression schema. Following are the steps to add in a new dna_methylation schema:

Add CNV_SCHEMA=cnv in ../OpenPedCan-api-secrets/common_db.env.
Add ENV CNV_SCHEMA ="cnv" in Dockerfile.

Add the following code in db/init_db.sh:

CREATE SCHEMA ${CNV};
GRANT SELECT ON ALL TABLES IN SCHEMA ${CNV} TO ${DB_USERNAME};
ALTER DEFAULT PRIVILEGES IN SCHEMA ${CNV} GRANT SELECT ON TABLES TO ${DB_USERNAME};

Add CNV_SCHEMA = "CNV_SCHEMA" in db/r_interfaces/db_env_vars.R.

Add the following code in db/load_db.sh:

GRANT USAGE ON SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME};
GRANT SELECT ON ALL TABLES IN SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME};
ALTER DEFAULT PRIVILEGES IN SCHEMA ${DNA_METHYLATION_SCHEMA} GRANT SELECT ON TABLES TO ${DB_USERNAME};

Add --schema="$CNV_SCHEMA" in db/build_tools/build_db_docker_cmd.sh to include CNV_SCHEMA in the database dump.

Adding CNV Tables

Add new CNV tables to store CNV results that are required to generate plots and tables for MTP. The DNA methylation tables can have any appropriate names, and the following description uses dna_methylation_summary as an example table name. Following are the steps to add a new dna_methylation_summary table:

Add CNV_EVIDENCE_SUMMARY_TBL=cnv_evidence_summary in ../OpenPedCan-api-secrets/common_db.env.
Add ENV CNV_EVIDENCE_SUMMARY_TBL="cnv_evidence_summary" in Dockerfile.
Add CNV_EVIDENCE_SUMMARY_TBL = "CNV_EVIDENCE_SUMMARY_TBL" in db/r_interfaces/db_env_vars.R.

Create a minimal table for testing in db/build_tools/build_db.R:

# cnv_evidence_summary_tbl is assumed to be the table for loading into the database.
#
# cnv_evidence_summary_tbl does not have to be the full table, and it can only have zero or a few lines.
db_write_table(
  dplyr::slice(cnv_evidence_summary_tbl, 0), conn,
  tolower(db_env_vars$CNV_SCHEMA),
  tolower(db_env_vars$CNV_EVIDENCE_SUMMARY_TBL))

Prepare a CSV file for loading into the database. Since the file(s) in OpenPedCan-analysis are not correctly formatted, the file will need to be created using db/build_tools/build_db.R. Since the CNV plots will add at least 3 tables to the database, may want to conside splitting build_db.R into multiple scripts in the future.
- Upload a result file to a S3 bucket as new OpenPedCan-analysis data release. The result file can be in any format, which needs further preparation before loading into the database.
- The result file will be downloaded to OpenPedCan-analysis/data/ by db/build_db.sh.
- Load the result file in db/build_tools/build_db.R.
- Prepare the table for database loading, such as filtering, renaming columns, and reordering columns.
- Output the prepared table as CSV file like the procedure in db/build_tools/build_db.R.

Add the following code in db/build_tools/build_db_docker_cmd.sh to load the prepared CSV file to the database:

# Assuming the prepared CSV file is at ${BUILD_OUTPUT_DIR_PATH}/${CNV_SCHEMA}_${CNV_EVIDENCE_SUMMARY_TBL}.csv
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$DB_NAME" <<EOSQL
COPY ${CNV_SCHEMA}.${CNV_EVIDENCE_SUMMARY_TBL}
FROM '${BUILD_OUTPUT_DIR_PATH}/${CNV_SCHEMA}_${CNV_EVIDENCE_SUMMARY_TBL}.csv'
WITH (FORMAT csv, HEADER);
EOSQL

Add the following code in db/build_tools/build_db_docker_cmd.sh to add a table index on the column that will be queried:

echo "CREATE INDEX ensg_id_idx ON ${CNV_SCHEMA}.${CNV_EVIDENCE_SUMMARY_TBL} (\"Gene_Ensembl_ID\");" \
| gzip --no-name -c >> "$db_dump_out_path"

The updated database building procedure can be tested using ./db/build_db.sh.

PediatricOpenTargets / OpenPedCan-api

Build CNV Database #81

Adding a CNV Database Schema

Adding CNV Tables