RobinDenz1 / adjustedCurves

An R-Package to estimate and plot confounder-adjusted survival curves (single event survival data) and confounder-adjusted cumulative incidence functions (data with competing risks) using various methods.
https://robindenz1.github.io/adjustedCurves/
GNU General Public License v3.0

Can you get past the memory limit without RStudio crashing? #22

Closed: dvaiman closed this issue 8 months ago

dvaiman commented 8 months ago

RStudio is crashing, or alternatively I get a warning about memory. It's a dataset with approximately 250,000 subjects. The variable Exercise is a 5-category factor variable. The total number of events is about 10,000 and the median follow-up time is 37 years.

It works when using a sliced dataset of 20,000 subjects instead of the full 250,000.

I also tried setting R_MAX_VSIZE=300Gb.

Code used

library(survival)
library(adjustedCurves)

# Cox model; x=TRUE stores the design matrix needed by adjustedsurv()
cox_model <- coxph(Surv(risk_time_20, ICD_I_incidenceORdeath) ~ Exercise + Sex,
                   data = data, x = TRUE)

adjustedsurv(data = data,
             variable = "Exercise",
             ev_time = "risk_time_20",
             event = "ICD_I_incidenceORdeath",
             method = "direct",
             outcome_model = cox_model,
             conf_int = TRUE,
             na.action = "na.omit",
             n_cores = 2)

I have a new MacBook Pro with 36 GB of RAM. Is the only solution to find a more powerful computer?

Sorry for no reproducible example.

This is a plot with 20 000 individuals from the original 250 000 individuals:

[attached plot: adjusted survival curves for the 20,000-subject subset]

RobinDenz1 commented 8 months ago

Thanks for the detailed description. When using method="direct" with a coxph object supplied to the outcome_model argument, the ate() function of the riskRegression package is used internally to perform all calculations, which can become unwieldy with large datasets.

However, there are multiple things you can try:

  1. The biggest issue is the confidence interval calculation, because the ate() function relies on the efficient influence function to compute it. Setting conf_int to FALSE may make the call feasible. I understand that this may be unacceptable (because confidence intervals are important!).
  2. Internally, the survival probability is estimated at every point in time at which an event occurs. With this many events that is unnecessary, because many of them will be very close to each other in time. You can change this by supplying a suitably fine grid of points in time to the times argument of the adjustedsurv() function, which greatly reduces the computational effort.
  3. If neither of the aforementioned options works, you could try another method. In particular, inverse probability of treatment weighting methods are very computationally efficient, even with conf_int=TRUE (see ?surv_iptw_km for an example). A combined sketch of options 1 to 3 follows this list.
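For illustration, here is a rough sketch combining options 1 to 3. The yearly time grid, the multinomial propensity model specification and the object names (time_grid, adjsurv_direct, propensity_model, adjsurv_iptw) are placeholders I made up for this example; the times and treatment_model arguments are used as described in the package help pages, so please check ?adjustedsurv and ?surv_iptw_km for the exact requirements.

library(survival)
library(adjustedCurves)
library(nnet)  # multinom() for the 5-category Exercise variable

# Option 2: a coarse grid of evaluation times instead of all ~10,000 event times
# (yearly steps up to the maximum follow-up time; adjust as needed)
time_grid <- seq(1, max(data$risk_time_20), by = 1)

# Options 1 + 2: direct adjustment without confidence intervals, on the grid
adjsurv_direct <- adjustedsurv(data = data,
                               variable = "Exercise",
                               ev_time = "risk_time_20",
                               event = "ICD_I_incidenceORdeath",
                               method = "direct",
                               outcome_model = cox_model,
                               conf_int = FALSE,
                               times = time_grid,
                               na.action = "na.omit")

# Option 3: IPTW Kaplan-Meier estimator with confidence intervals;
# the propensity model below is a hypothetical specification
propensity_model <- multinom(Exercise ~ Sex, data = data)

adjsurv_iptw <- adjustedsurv(data = data,
                             variable = "Exercise",
                             ev_time = "risk_time_20",
                             event = "ICD_I_incidenceORdeath",
                             method = "iptw_km",
                             treatment_model = propensity_model,
                             conf_int = TRUE,
                             na.action = "na.omit")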

Hope this helps!

EDIT: I just noticed that you only adjust for a single confounder in your model, Sex. If that is really all you want to adjust for, I would recommend using method="strat_nieto", which is computationally efficient and offers a non-parametric confounder-adjustment procedure. It does, however, only work with a small set of categorical confounders, so it might be perfect for your situation! A short sketch follows below.
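A minimal sketch of that call, assuming the stratification variable is passed via the adjust_vars argument as described in ?surv_strat_nieto (the object name adjsurv_nieto is just a placeholder):

library(adjustedCurves)

# Non-parametric adjustment for the single categorical confounder Sex
adjsurv_nieto <- adjustedsurv(data = data,
                              variable = "Exercise",
                              ev_time = "risk_time_20",
                              event = "ICD_I_incidenceORdeath",
                              method = "strat_nieto",
                              adjust_vars = "Sex",
                              na.action = "na.omit")

plot(adjsurv_nieto)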

dvaiman commented 8 months ago

Thanks for the swift answer! The method="strat_nieto" approach worked. I also managed to run some models with conf_int=FALSE and with a time grid.