CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/

Using lcmm with large data #258

Open Alik-V opened 2 weeks ago

Alik-V commented 2 weeks ago

Hi @CecileProust-Lima, thank you for developing this package!

Do you have any advice or recommendations on using lcmm with large RWE datasets? I wanted to fit the model to ~350k patients and ~5 million records, but after some preliminary runs on a reduced dataset I am realising that this may not be feasible.

Is latent class / latent process even a viable approach in this case?

VivianePhilipps commented 1 week ago

Hi,

with large samples, we usually recommend running a first analysis on a subsample, as you did. You can run the time-consuming grid search on this subsample, and then re-estimate only the final models on the whole sample, starting from the estimates obtained on the subsample. Of course, estimation will still take a long time with millions of observations, but this strategy will save some time.

Viviane
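
The strategy described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the column names (`id`, `time`, `y`), the linear mixed model, the two-class choice, and the sizes are all hypothetical, and the grid-search settings (`rep`, `maxiter`) are arbitrary. It assumes that the `best` vector of a fitted `hlme` object can be passed as initial values through `B` to a model with the same specification.

```r
library(lcmm)

# Hypothetical long-format data: one row per visit, two underlying trajectories.
set.seed(1)
n_id <- 300
cls  <- rbinom(n_id, 1, 0.4)           # simulated class membership
b0   <- rnorm(n_id, sd = 1)            # subject-specific random intercept
full <- data.frame(id   = rep(seq_len(n_id), each = 5),
                   time = rep(0:4, times = n_id))
full$y <- 10 + b0[full$id] + (0.5 - 1.5 * cls[full$id]) * full$time +
  rnorm(nrow(full), sd = 0.5)

# 1. Subsample whole subjects (not rows) so each trajectory stays complete.
sub_ids <- sample(unique(full$id), size = 100)
sub     <- full[full$id %in% sub_ids, ]

# 2. On the subsample: fit the one-class model, then run the grid search
#    for the two-class model from random departures around it.
m1_sub <- hlme(y ~ time, random = ~ 1, subject = "id", ng = 1, data = sub)
m2_sub <- gridsearch(
  hlme(y ~ time, mixture = ~ time, random = ~ 1,
       subject = "id", ng = 2, data = sub),
  rep = 10, maxiter = 15, minit = m1_sub
)

# 3. On the whole sample: estimate only the retained model, starting
#    from the subsample estimates instead of repeating the grid search.
m2_full <- hlme(y ~ time, mixture = ~ time, random = ~ 1,
                subject = "id", ng = 2, data = full,
                B = m2_sub$best)

summary(m2_full)
```

Sampling by subject rather than by record keeps every selected patient's full history in the subsample. Whether the subsample estimates are good starting points depends on the subsample being large enough to represent the smaller classes.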