OHDSI / Cyclops

Cyclops (Cyclic coordinate descent for logistic, Poisson and survival analysis) is an R package for performing large scale regularized regressions.
http://ohdsi.github.io/Cyclops/

more efficient support for time-varying covariates in cyclopsdata for cox models #51

Open myoung3 opened 3 years ago

myoung3 commented 3 years ago

From the release package documentation:

These columns are expected in the outcome object:
- stratumId (integer) (optional) Stratum ID for conditional regression models
- rowId (integer) Row ID is used to link multiple covariates (x) to a single outcome (y)
- y (real) The outcome variable
- time (real) For models that use time (e.g. Poisson or Cox regression) this contains time (e.g. number of days)
- weights (real) (optional) Non-negative weights to apply to outcome
- censorWeights (real) (optional) Non-negative censoring weights for competing risk model; will be computed if not provided.

These columns are expected in the covariates object:
- stratumId (integer) (optional) Stratum ID for conditional regression models
- rowId (integer) Row ID is used to link multiple covariates (x) to a single outcome (y)
- covariateId (integer) A numeric identifier of a covariate
- covariateValue (real) The value of the specified covariate
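
To make the linkage concrete, here is a minimal sketch in plain Python (not the Cyclops API) of how the two objects relate through rowId. Only the column names come from the documentation quoted above; the toy values and the `link_covariates` helper are hypothetical and exist purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy data; only the column names come from the quoted spec.
outcomes = [
    {"rowId": 1, "y": 1, "time": 30.0},
    {"rowId": 2, "y": 0, "time": 45.0},
]
covariates = [
    {"rowId": 1, "covariateId": 101, "covariateValue": 1.0},
    {"rowId": 1, "covariateId": 102, "covariateValue": 63.0},
    {"rowId": 2, "covariateId": 102, "covariateValue": 58.0},
]


def link_covariates(outcomes, covariates):
    """Attach to each outcome row its covariates as a {covariateId: value} dict."""
    by_row = defaultdict(dict)
    for c in covariates:
        by_row[c["rowId"]][c["covariateId"]] = c["covariateValue"]
    # Copy each outcome row and add an "x" entry with that row's covariates.
    return [dict(o, x=dict(by_row.get(o["rowId"], {}))) for o in outcomes]


linked = link_covariates(outcomes, covariates)
for row in linked:
    print(row)
```

Each covariate row carries a rowId, so every covariate value has to be stated once per outcome row it applies to.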

The correct way to deal with time-varying data in a Cox model is to split each individual's follow-up period into multiple intervals, with a new interval starting at each change in their covariate values. A time-varying dataset for Cox analysis therefore has more than one row per person, and the above data spec requires the covariates object to cover every row of the outcome object. In a Cox model with both time-varying and time-invariant variables, all of the time-invariant values would need to be repeated for every interval within each participant. A more efficient data structure would allow a time-invariant covariates object that joins to the outcome object on participant ID, along with a time-varying covariates object that links to the outcome on both participant ID and time.
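
The interval splitting and the two layouts described above can be sketched as follows. This is an illustration in plain Python under stated assumptions, not Cyclops code: the `start`/`stop` columns, the `personId` key, the covariate IDs, and all three helper functions are hypothetical, and the quoted spec itself only defines a single `time` column.

```python
def split_follow_up(person_id, end_time, event, changes):
    """Split follow-up at each covariate change.

    changes: list of (change_time, value), sorted, first entry at time 0,
    all change times assumed to be < end_time.
    The event indicator is attached only to the final interval.
    """
    rows = []
    for i, (start, value) in enumerate(changes):
        is_last = i + 1 == len(changes)
        stop = end_time if is_last else changes[i + 1][0]
        rows.append({"personId": person_id, "start": start, "stop": stop,
                     "y": event if is_last else 0, "value": value})
    return rows


def stacked_covariates(intervals, invariant, tv_id):
    """Current layout: one rowId per interval, invariant values repeated."""
    rows = []
    for row_id, iv in enumerate(intervals, start=1):
        for cov_id, val in invariant.items():
            rows.append({"rowId": row_id, "covariateId": cov_id,
                         "covariateValue": val})
        rows.append({"rowId": row_id, "covariateId": tv_id,
                     "covariateValue": iv["value"]})
    return rows


def separated_covariates(person_id, intervals, invariant, tv_id):
    """Proposed layout: invariant values keyed by person, varying by person and time."""
    invariant_rows = [{"personId": person_id, "covariateId": c,
                       "covariateValue": v} for c, v in invariant.items()]
    varying_rows = [{"personId": person_id, "start": iv["start"],
                     "covariateId": tv_id, "covariateValue": iv["value"]}
                    for iv in intervals]
    return invariant_rows, varying_rows


# One person followed for 100 days with an event; exposure changes at days 40 and 75.
intervals = split_follow_up(1, 100, 1, [(0, 0.0), (40, 1.0), (75, 2.0)])
invariant = {201: 1.0, 202: 54.0}  # hypothetical time-invariant covariates
stacked = stacked_covariates(intervals, invariant, tv_id=301)
inv_rows, tv_rows = separated_covariates(1, intervals, invariant, tv_id=301)
print(len(stacked), len(inv_rows) + len(tv_rows))
```

With k intervals and p time-invariant covariates per person, the stacked layout stores k * (p + 1) covariate rows, while the separated layout stores p + k, so the savings grow with both the number of intervals and the number of invariant covariates.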

msuchard commented 3 years ago

Thanks for looking into this, @myoung3. I am very interested in providing an interface to Cyclops for time-varying covariates that is both convenient and efficient. Naturally, efficiency covers both "space" (as you bring up above) and "time" (compute speed, which may decrease dramatically with the extra layer of memory indirection).

Could I entice you to work on this further with my group?

Do you have a specific use-case in mind where performance / memory-usage becomes an issue?