CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/

How to check overfitting in lcmm #197

Closed tinaty closed 1 year ago

tinaty commented 1 year ago

Hello, is there any way to check whether a latent class mixed model, fitted with the lcmm function for a continuous longitudinal outcome, is overfitting?

VivianePhilipps commented 1 year ago

Hello,

We have no dedicated function to check overfitting, but we provide prediction functions that you may use for that. What type of diagnosis are you considering?

Viviane

tinaty commented 1 year ago

Thank you, Viviane! That sounds like a good idea - could you kindly advise how to use the prediction functions for checking overfitting? I am not quite sure what you meant by 'the type of diagnosis'. Did you mean clinical diagnosis? (If so, it is a type of infectious disease.) In essence, I used latent class mixed modelling on data from multiple centres (dataset A) to identify several trajectory classes. We recently received new data from new centres (dataset B), and I would like to see whether the previous models are robust and replicate in dataset B (i.e. to check whether the previous models were overfitting dataset A or not). What would be the best way to look at this? Thank you very much.

VivianePhilipps commented 1 year ago

My question was: what are your plans for diagnosing overfitting? By overfitting we generally mean that the model predicts the observations in dataset A very well but performs poorly on dataset B. Do you want to predict the longitudinal trajectory in B using the estimates obtained from A? Please look at the vignette https://cecileproust-lima.github.io/lcmm/articles/usual_problems.html to see how to make predictions on external data.
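As a minimal sketch of that workflow, using the package's bundled data_hlme example data split into two halves to stand in for datasets A and B (the split itself and the two-class specification are illustrative choices, not part of the vignette):

```r
library(lcmm)

# Example data shipped with lcmm; variables ID, Y and Time come from it.
# Half of the subjects play the role of dataset A, the rest of dataset B.
data(data_hlme)
ids  <- unique(data_hlme$ID)
setA <- subset(data_hlme, ID %in% ids[1:50])
setB <- subset(data_hlme, ID %in% ids[-(1:50)])

# Fit a 2-class linear mixed model on dataset A
# (the 1-class fit supplies initial values via B = m1)
m1 <- hlme(Y ~ Time, random = ~ Time, subject = "ID", data = setA)
m2 <- hlme(Y ~ Time, mixture = ~ Time, random = ~ Time,
           subject = "ID", ng = 2, data = setA, B = m1)

# Predict the class-specific mean trajectories on the external data
pred <- predictY(m2, newdata = setB, var.time = "Time")
head(pred$pred)
```

With real data the principle is the same: fit on dataset A, then call predictY with newdata set to dataset B, provided B contains the same covariate names as the model formula.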

tinaty commented 1 year ago

Thanks a lot, Viviane, for your advice. Specifically, I initially planned to check whether the classes identified in the previous dataset A could be replicated in the new dataset B. I have now used the estimates obtained from dataset A to predict the trajectories in dataset B, and then plotted and compared the predictions and observations (which look good). However, this only checks the model from a visual perspective, so I was wondering whether there are any quantitative metrics you would recommend for evaluating how well the model estimated on dataset A performs on dataset B?
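The visual comparison described above was along these lines (a minimal base-R sketch with made-up numbers, not the real data):

```r
# Toy numbers standing in for dataset B: observed mean outcome per time
# point, and the class-specific trajectories predicted by the model
# fitted on dataset A (all values invented for illustration)
times <- 0:5
obsB  <- c(10.1, 10.9, 11.8, 12.6, 13.1, 13.9)
predB <- cbind(class1 = 10 + 0.8 * times,
               class2 = 10 + 0.3 * times)

# Overlay predicted curves and observed points
matplot(times, predB, type = "l", lty = 1, col = c("red", "blue"),
        xlab = "Time", ylab = "Outcome",
        main = "Dataset B: observed vs predicted (model from A)")
points(times, obsB, pch = 16)
legend("topleft", c("class 1 (pred)", "class 2 (pred)", "observed"),
       lty = c(1, 1, NA), pch = c(NA, NA, 16),
       col = c("red", "blue", "black"))
```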

VivianePhilipps commented 1 year ago

If you look at criteria such as AIC or BIC, you won't be able to conclude on the model's performance, because these criteria are used to compare models, not to evaluate a single model. Criteria like the MSE, the Brier score or the entropy could rather be used because, for the MSE for example, you know that the closer it is to 0, the better the model.
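For instance, in base R with toy numbers (obs standing in for observed values in dataset B, pred for the corresponding predictions from the model fitted on A, and pp for a matrix of posterior class-membership probabilities; the entropy below is the usual relative-entropy measure for G classes):

```r
# Toy observed values in dataset B and predictions from the model
# fitted on dataset A (numbers invented for illustration)
obs  <- c(10.2, 11.5, 9.8, 12.1)
pred <- c(10.0, 11.9, 10.1, 11.7)

# Mean squared error: the closer to 0, the better the model
mse <- mean((obs - pred)^2)

# Posterior class-membership probabilities: one row per subject,
# one column per class
pp <- rbind(c(0.90, 0.10),
            c(0.80, 0.20),
            c(0.15, 0.85),
            c(0.05, 0.95))

# Relative entropy of the classification: 1 means perfectly separated
# classes, values near 0 mean very uncertain class assignment
G <- ncol(pp)
entropy <- 1 + sum(pp * log(pp)) / (nrow(pp) * log(G))

c(MSE = mse, entropy = entropy)
```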

tinaty commented 1 year ago

Thanks Viviane for your advice.