National-COVID-Cohort-Collaborative / Phenotype_Data_Acquisition

The repository for code and documentation produced by the N3C Phenotype and Data Acquisition workstream
60 stars 35 forks source link

Bug in createCohort for Redshift #232

Closed MPagel closed 5 months ago

MPagel commented 7 months ago

Using redshift as DBE

at 74% complete in N3cOhdsi::createCohort(connectionDetails = connectionDetails, sqlFilePath = phenotypeSqlPath, cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema ) executed from example_execution.R

Error in .createErrorReport(): ! Error executing SQL: com.amazon.redshift.util.RedshiftException: ERROR: column "control_person_id" is of type integer but expression is of type character varying Hint: You will need to rewrite or cast the expression. An error report has been created at C:/Users/mpagel/OneDrive - Emory University/Documents/ETL/N3C/Phenotype_Data_Acquisition-master/errorReportSql.txt Run rlang::last_trace() to see where the error occurred.

Backtrace: x

  1. -N3cOhdsi::createCohort(...)
  2. -DatabaseConnector::executeSql(conn, sql)
  3. -base::tryCatch(...)
  4. -base (local) tryCatchList(expr, classes, parentenv, handlers)
  5. -base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
  6. -value[3L]
  7. -DatabaseConnector:::.createErrorReport(dbms, err$message, sqlStatement, errorReportFile)
  8. -rlang::abort(...)

contents of referenced text file:

DBMS: redshift

Error: com.amazon.redshift.util.RedshiftException: ERROR: column "control_person_id" is of type integer but expression is of type character varying Hint: You will need to rewrite or cast the expression.

SQL:


INSERT INTO n3c_ncats.N3C_CONTROL_MAP
SELECT
person_id, 1 as buddy_num, NULL
FROM n3c_ncats.n3c_case_cohort
WHERE person_id NOT IN (
SELECT case_person_id
FROM n3c_ncats.N3C_CONTROL_MAP
WHERE buddy_num = 1
)
    UNION

    SELECT person_id, 2 as buddy_num, NULL
    FROM n3c_ncats.n3c_case_cohort
    WHERE person_id NOT IN (
        SELECT case_person_id
        FROM n3c_ncats.N3C_CONTROL_MAP
        WHERE buddy_num = 2
        )
> 
> 
> R version:
> R version 4.1.3 (2022-03-10)
> 
> Platform:
> x86_64-w64-mingw32
> 
> Attached base packages:
> - stats
> - graphics
> - grDevices
> - utils
> - datasets
> - methods
> - base
> 
> Other attached packages:
> - N3cOhdsi (0.2.2)
> - zip (2.2.0)
> - SqlRender (1.16.1)
> - remotes (2.4.2.1)
> - DatabaseConnector (6.3.0)

> print(.packages())
> [1] "N3cOhdsi"          "zip"               "SqlRender"         "remotes"          
> [5] "DatabaseConnector" "stats"             "graphics"          "grDevices"        
> [9] "utils"             "datasets"          "methods"           "base"             
also package `fs` used explicitly

DDL of referenced tables:

CREATE TABLE IF NOT EXISTS n3c_ncats.n3c_control_map ( case_person_id INTEGER NOT NULL ENCODE az64 ,buddy_num INTEGER NOT NULL ENCODE az64 ,control_person_id INTEGER ENCODE az64 )

CREATE TABLE IF NOT EXISTS n3c_ncats.n3c_case_cohort ( person_id INTEGER NOT NULL ENCODE az64 ,pt_age VARCHAR(20) ENCODE lzo ,sex VARCHAR(20) ENCODE lzo ,hispanic VARCHAR(20) ENCODE lzo ,race VARCHAR(20) ENCODE lzo )

MPagel commented 7 months ago
INSERT INTO @resultsDatabaseSchema.N3C_CONTROL_MAP
SELECT
        person_id, 1 as buddy_num, CAST(NULL as integer)
        FROM @resultsDatabaseSchema.n3c_case_cohort
        WHERE person_id NOT IN (
            SELECT case_person_id
            FROM @resultsDatabaseSchema.N3C_CONTROL_MAP
            WHERE buddy_num = 1
            )

        UNION

        SELECT person_id, 2 as buddy_num, CAST(NULL as integer)
        FROM @resultsDatabaseSchema.n3c_case_cohort
        WHERE person_id NOT IN (
            SELECT case_person_id
            FROM @resultsDatabaseSchema.N3C_CONTROL_MAP
            WHERE buddy_num = 2
            )
;

appears to work