PediatricOpenTargets / OpenPedCan-api

Investigate and fix CSV format incompatibilities between `readr::write_csv` and PostgreSQL `COPY FROM` #38

Open logstar opened 2 years ago

logstar commented 2 years ago

In the database building procedure, R data frames are written to CSV files, which are then loaded into a PostgreSQL database using the `COPY FROM` SQL command.

The CSV format written by `readr::write_csv` is not completely compatible with the PostgreSQL `COPY FROM` command.

The following case is known to be incompatible:

The following cases need to be further investigated for compatibility:

Currently, CSV files compatible with PostgreSQL `COPY FROM` are written by the following function:

https://github.com/PediatricOpenTargets/OpenPedCan-api/blob/c165d95cd4826ef372c2f1267d0e9e0f8f1f030f/db/build_tools/build_db.R#L153-L157
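The linked function is not reproduced here. As an illustration only, a wrapper of this kind might look like the sketch below. It addresses one well-documented difference: `readr::write_csv` writes missing values as the literal string `NA` by default, while `COPY FROM` in CSV format treats an unquoted empty field as NULL by default. The function name `write_pg_csv` and the example data are hypothetical and are not taken from `build_db.R`.

```r
# Hypothetical wrapper, NOT the function at the linked lines of build_db.R.
# It writes missing values as empty fields, which matches the default
# NULL representation of PostgreSQL COPY FROM in CSV format.
write_pg_csv <- function(df, out_path) {
  readr::write_csv(df, out_path, na = "")
  invisible(out_path)
}

# Hypothetical example data with one missing numeric value.
example_df <- data.frame(
  gene_symbol = c("GENE_A", "GENE_B"),
  tpm = c(1.5, NA)
)

out_path <- tempfile(fileext = ".csv")
write_pg_csv(example_df, out_path)

# Inspect the raw lines that COPY FROM would read.
csv_lines <- readLines(out_path)
print(csv_lines)
```

This sketch does not cover the other cases the issue asks to investigate. Any further incompatibilities would need their own handling and testing against a real `COPY FROM` load.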

The values in `${BULK_EXP_SCHEMA}_${BULK_EXP_TPM_HISTOLOGY_TBL}.csv`, which is the only CSV file as of 09/17/2021, are all compatible with PostgreSQL `COPY FROM`. However, CSV files that will be added in #37 may contain incompatible values.

Refs: