darwin-eu / IncidencePrevalence

Estimating incidence and prevalence with the OMOP CDM
https://darwin-eu.github.io/IncidencePrevalence/

Error: rapi_prepare: Failed to prepare query #4

Open ssivarama opened 1 year ago

ssivarama commented 1 year ago

Hi Team,

My name is Saptha, from Gilead Sciences. I am an Associate Data Scientist working on the Real World Evidence team.

I am exploring the IncidencePrevalence package on an OMOP CDM hosted on Redshift. I am able to connect to the CDM and can see the records of the person table using the R code below after making the connection:

cdm <- CDMConnector::cdm_from_con(con, cdm_schema = cdmDatabaseSchema, write_schema = sandboxSchema )

print(cdm$person, n = 10)

When I run this code to generate the cohort

outcome_cohorts <- CDMConnector::readCohortSet(here::here("outcome_cohorts"))
cdm <- CDMConnector::generateCohortSet(
  cdm = cdm,
  cohortSet = outcome_cohorts,
  name = "ild_ae2",
  overwrite = TRUE
)

I am getting the following error message

Error: rapi_prepare: Failed to prepare query INSERT INTO Codesets (codeset_id, concept_id) SELECT 0 as codeset_id, c.concept_id FROM (select distinct I.concept_id FROM ( select concept_id from main.CONCEPT where concept_id in (4110182,37312199,4341520,4017691,256450,45768909,437588,439853,44807209,42538810,4026217,42539089,4112814,4275496,762964,45767051,4166517,3654836,3654837,45769386,3655115,4112681,252348,252946,37116655,36714118,4311555,4025168,4028118,4116317,4120270,4051465,4112813,444084,4140134,435298,4322799,4285279,4103099,37110889,45763749,36712839,45763750,438782,440748,4119786,4140605,46272927,42539687,3655634,42537658,42537657,46270493,42539090,4273378,3655969,600563,600562,4236182,4195014,4084955,434975,438175,4209871,4230447,4119446,4052548,433233,4227290,4044215,4222731,37208102,4173590,4249010,4226132,4148529,4215594,437906,435853,4102140,4236725,44783629,45768996,4174275,4086243,4119785,4045227,4184896,4093002,45771023,4124546,4124671,4119448,4119935,4330286,36

Can you please help me in resolving this issue?

edward-burn commented 1 year ago

Hi @ssivarama, apologies for being slow to get back to you. Could you please check that you are using the RPostgres package to connect to your database?

ssivarama commented 1 year ago

Hi @edward-burn, thank you for your response. Yes, I am using the RPostgres package to connect to Redshift. I see this issue when I execute the commands below in sequence. I suspect that mockIncidencePrevalenceRef is built using data from DuckDB, and that while executing generateCohortSet (which should run against the CDM on Redshift) it is unable to find the schema named main. Please let me know if my thinking is correct. After renaming the variable assigned from mockIncidencePrevalenceRef to cdm_mock, or commenting out the mockIncidencePrevalenceRef command, I am able to run it against the CDM on Redshift.

cdm <- mockIncidencePrevalenceRef(
  sampleSize = 50000,
  outPre = 0.5
)

outcome_cohorts <- CDMConnector::readCohortSet(here::here("outcome_cohorts"))
cdm <- CDMConnector::generateCohortSet(
  cdm = cdm,
  cohortSet = outcome_cohorts,
  name = "ild_ae2",
  overwrite = TRUE
)

In summary,

  1. Denominator cohort is created using a JSON file from OHDSI ATLAS
  2. Outcome cohorts (there are 8 different outcomes) are created using JSON files from OHDSI ATLAS
  3. While estimating period prevalence, I am getting the following error """

    Error below while executing lines 71-78

    Getting prevalence for analysis 1 of 8
    Error in seq.Date(from = s, to = e, by = i) : 'from' must be of length 1 """
  4. While estimating incidence, I am getting the following error """

    Error while executing lines 84-92

    rlang::last_trace(drop = FALSE)
    <error/vctrs_error_ptype2>
    Error in dplyr::bind_rows():
    ! Can't combine ..1$number_records and ..2$number_records .

    Backtrace: ▆

    1. ├─IncidencePrevalence::estimateIncidence(...)
    2. │ └─dplyr::bind_rows(...)
    3. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id, .error_call = current_env())
    4. └─vctrs (local) <fn>()
    5. └─vctrs::vec_default_ptype2(...)
    6. ├─base::withRestarts(...)
    7. │ └─base (local) withOneRestart(expr, restarts[[1L]])
    8. │ └─base (local) doWithOneRestart(return(expr), restart)
    9. └─vctrs::stop_incompatible_type(...)
    10. └─vctrs:::stop_incompatible(...)
    11. └─vctrs:::stop_vctrs(...)
    12. └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call) """

I am also attaching the R script I am using for your reference: incidence_prevalence_pkg_testing.txt
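The `'from' must be of length 1` message in item 3 can be reproduced in base R on its own, which may help narrow down where it comes from. The sketch below is only an illustration with made-up dates, not code from the package: it assumes the start date passed to `seq.Date()` ended up empty (for example, if no rows contributed a start date), which is one possible cause rather than a confirmed diagnosis.

```r
# Hypothetical inputs: an empty vector of start dates and a single end date.
start_dates <- as.Date(character(0))
end_date <- as.Date("2020-12-31")

# seq.Date() requires exactly one 'from' value, so an empty vector is rejected.
msg <- tryCatch(
  seq.Date(from = start_dates, to = end_date, by = "year"),
  error = function(e) conditionMessage(e)
)
print(msg)

# With exactly one start date the same call succeeds.
ok <- seq.Date(from = as.Date("2018-01-01"), to = end_date, by = "year")
print(ok)
```

If this is what is happening, checking that the denominator cohort actually contains rows before estimating prevalence would be a reasonable first step.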

edward-burn commented 1 year ago

In your script I see the package RPostgreSQL, but can you try with RPostgres? (Confusingly, there are two packages, but we would normally use RPostgres.) Also, for a denominator cohort with a target cohort, you would use the targetCohortTable argument: https://darwin-eu.github.io/IncidencePrevalence/reference/generateDenominatorCohortSet.html. Does something like the below work for you?

library(CDMConnector)
library(IncidencePrevalence)
library(dplyr)
library(tidyr)
library(ggplot2)
library(RPostgres)

# Add connection detail here

denom_target_cohort <- CDMConnector::readCohortSet(here::here("meta_lc_denom_cohort"))
cdm <- CDMConnector::generateCohortSet(
  cdm = cdm,
  cohortSet = denom_target_cohort,
  name = "denom_target_cohort",
  overwrite = TRUE
)

cdm <- generateDenominatorCohortSet(
  cdm = cdm,
  name = "denominator",
  targetCohortTable = denom_target_cohort)

outcome_cohorts <- CDMConnector::readCohortSet(here::here("outcome_cohorts"))
cdm <- CDMConnector::generateCohortSet(
  cdm = cdm,
  cohortSet = outcome_cohorts,
  name = "outcome",
  overwrite = TRUE
)

## Calculating period prevalence
prev <- estimatePeriodPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "overall",
  minCellCount = 0,
  temporary = FALSE
)

inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "quarters",
  outcomeWashout = 0,
  repeatedEvents = FALSE,
  temporary = FALSE
)

ssivarama commented 1 year ago

I was actually using RPostgres::Redshift() for the connection, but my imports also included RPostgreSQL, which probably caused the confusion. I tried the following for the denominator cohort per your suggestion, but I couldn't find that argument in the function generateDenominatorCohortSet.

denom_target_cohort <- CDMConnector::readCohortSet(here::here("meta_lc_denom_cohort"))
cdm <- CDMConnector::generateCohortSet(
  cdm = cdm,
  cohortSet = denom_target_cohort,
  name = "denom_target_cohort",
  overwrite = TRUE
)

cdm <- generateDenominatorCohortSet(
  cdm = cdm,
  name = "denominator",
  targetCohortTable = denom_target_cohort)

Got this error

Error in generateDenominatorCohortSet(cdm = cdm, name = "denominator", : unused argument (targetCohortTable = denom_target_cohort)

I tried to substitute that with strataTable but that also threw a different error "Error in x_raw[[i]] : invalid subscript type 'list'".

Can you please let me know how to proceed further? Thanks.

ssivarama commented 1 year ago

Hi @edward-burn, I realized there was an update to the package that adds the targetCohortTable argument, which I tried after updating the version I had. But I am still seeing the error "Error in x_raw[[i]] : invalid subscript type 'list'" when I execute the lines below:

cdm <- generateDenominatorCohortSet(
  cdm = cdm,
  name = "denominator",
  targetCohortTable = denom_target_cohort)

I believe the issue is related to subsetting and returning the denominator cohort in the end (generateDenominatorCohortSet).
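The `invalid subscript type 'list'` message is what base R reports when `[[` is given a list (or data frame) as the index instead of a name. The sketch below reproduces that with stand-in objects only: `cdm_like` and `cohort_set` are hypothetical and do not come from CDMConnector. It assumes, without confirmation from the thread, that targetCohortTable is used internally to look up a table by name, in which case passing the quoted table name `"denom_target_cohort"` rather than the object returned by readCohortSet may be worth trying.

```r
# Stand-in for a cdm reference: a named list of tables (hypothetical data).
cdm_like <- list(
  person = data.frame(person_id = 1:3),
  denom_target_cohort = data.frame(cohort_definition_id = 1L, subject_id = 1:3)
)

# Stand-in for the object returned by readCohortSet(): a data frame,
# which R treats as a list when it is used as a subscript.
cohort_set <- data.frame(cohort_definition_id = 1L, cohort_name = "denom")

# Indexing with the object itself fails; capture the message instead of stopping.
err <- tryCatch(
  cdm_like[[cohort_set]],
  error = function(e) conditionMessage(e)
)
print(err)

# Indexing with the table name as a string works.
tbl <- cdm_like[["denom_target_cohort"]]
print(nrow(tbl))
```

Under that assumption, the call would pass `targetCohortTable = "denom_target_cohort"`, matching the `name` given to generateCohortSet.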