denominator cohort build drill out the ram

darwin-eu / IncidencePrevalence

Estimating incidence and prevalence with the OMOP CDM

https://darwin-eu.github.io/IncidencePrevalence/


denominator cohort build drill out the ram #5

Closed rfherrerac closed 8 months ago

rfherrerac commented 8 months ago

Describe the bug
When running generateDenominatorCohortSet on a large US dataset in Redshift, RAM consumption grows enormously and the call takes forever.

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.7 (Ootpa)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.15.so

Version 0.4.1 did not have this issue; it ran pretty fast.

edward-burn commented 8 months ago

Thanks for reporting this @rfherrerac, can you share the settings you used with the function and I will investigate this?

rfherrerac commented 8 months ago

Thanks @edward-burn

cdm <- generateDenominatorCohortSet(
  cdm = cdm,
  name = "denominator",
  cohortDateRange = as.Date(c("2018-01-01", "2020-12-31")), # c(lubridate::ymd("2021-01-01"), lubridate::ymd("2023-01-31")),
  ageGroup = list(c(18, 44), c(45, 64), c(65, 74), c(75, 100)),
  sex = c("Male", "Female", "Both"),
  daysPriorObservation = 1,
  requirementInteractions = FALSE
)

edward-burn commented 8 months ago

Thanks @rfherrerac, let me take a look and get back to you. I'm actually preparing a new release, so hopefully we can get this fixed in that. I only have access to a small Redshift test database, so it would be great if you could test this new release on your data, if that would be ok?

rfherrerac commented 8 months ago

For sure! Happy to do so.

edward-burn commented 8 months ago

@rfherrerac I'm not seeing anything obvious that I've changed that would have caused this (but I'll keep looking). Can I just check what versions of dbplyr and RPostgres you have installed? I'm just wondering if it might relate to https://github.com/r-dbi/RPostgres/issues/457
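
For anyone following along, one way to report the requested versions is sketched below. It uses only base R; the package list is just the three packages named in this thread, and packages that are not installed are reported as NA rather than raising an error.

```r
# Packages mentioned in this thread (adjust the list as needed).
pkgs <- c("IncidencePrevalence", "dbplyr", "RPostgres")

# system.file() returns "" when a package is not installed,
# so we only call packageVersion() for packages that exist locally.
versions <- vapply(
  pkgs,
  function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  },
  character(1)
)

print(versions)
```

Pasting the printed vector into the issue, together with sessionInfo(), should usually be enough to compare environments.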

rfherrerac commented 8 months ago

Hi @edward-burn, I have RPostgres 1.4.6 and dbplyr 2.4.0.

edward-burn commented 8 months ago

Hi @rfherrerac, could you please try the 0.7 version of IncidencePrevalence that is now out on CRAN? I realised that a dependency I was using was collecting data into R, so with this fixed I'm hoping your issue will be solved, but it would be great if you could confirm.
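
To illustrate what "collecting data into R" means in general, here is a minimal sketch using dplyr and dbplyr. It is not the actual code path from IncidencePrevalence or its dependency (the thread does not say which call was responsible). The person table and its columns are hypothetical, and an in-memory SQLite table (via dbplyr::memdb_frame, which needs RSQLite installed) stands in for a large Redshift table.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for a large database table (assumption for illustration).
person <- memdb_frame(
  person_id = 1:5,
  year_of_birth = c(1950, 1960, 1970, 1980, 1990)
)

# Lazy pipeline: the filter is translated to SQL and evaluated in the database.
# Nothing is copied into R until collect() is called.
lazy_result <- person %>%
  filter(year_of_birth < 1975)
show_query(lazy_result)

# Problematic pattern: collect() first pulls the whole table into R memory,
# and only then filters. On a large table this can exhaust RAM.
eager_result <- person %>%
  collect() %>%
  filter(year_of_birth < 1975)

# Preferred pattern: filter in the database, then collect only the result.
local_result <- lazy_result %>%
  collect()
```

Both patterns return the same rows here; the difference is how much data crosses from the database into R, which matters only once the table is large.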

rfherrerac commented 8 months ago

Hi @edward-burn, it worked perfectly. Thanks a lot!