JasperHof / SPARE


Proposal to implement Marginalized Multilevel Models #11

Open lucavd opened 7 months ago

lucavd commented 7 months ago

Hi Jasper,

After using your package, I had an idea to merge it with something I worked on for my specialization thesis: Marginalized Multilevel Models (not to be confused with Marginal Models). Your package could serve as a base for developing what I have in mind, and, if you like, I have a proposal for working on it together. So, let me start from the beginning.

Abstract of my thesis:

Longitudinal data analysis has been of growing interest in many biological fields, including omics data analysis, to investigate the evolution of a response variable over time. In this context, we propose to compare and interpret the performance of conditional, marginal, and marginalized models in omics-like datasets, building on previous research. We simulated a longitudinal binary dataset and fitted three different models to the data: marginalized multilevel models (MMM), generalized estimating equation models (GEE), and generalized linear mixed effects regression models (GLMM). We also applied the same techniques to a real dataset. The models were fitted to the simulated data to assess their performance in estimating the true effect, the variability of the estimates, and type I and II errors. We demonstrate that the MMM model performs well even with small numbers of clusters and observations, while caution should be exercised when using the other models, regardless of the size of the dataset, especially in the interpretation of their coefficients.

Conclusions:

In our hands, MMM performed better than GLMM and GEE in almost all settings, giving more reliable estimates when the sample size was 15 or more. The median estimates of GEE and MMM were comparable, but the variability of the GEE estimates was larger than that of MMM. Type II error was well controlled by both GLMM and MMM, but it was on Type I error that MMM outperformed the other two models, keeping the false positive rate around the nominal 5%. This is of particular importance for omics data: their dimensionality requires a large number of comparisons, and the empirical false positive rate can exceed the nominal 5% even after false discovery rate correction. Finally, shrinking the GLMM coefficients greatly improves their estimates: the shrinkage deflates the covariate effect of the conditional model by an amount that depends on the random-effect variability, making the point estimate comparable to those of GEE and MMM. We demonstrated that the MMM model performs well even with a small number of clusters and observations. In contrast, the literature tends to rely heavily on GLMM models, which can lead to overoptimistic effects and to estimates being misinterpreted as marginal rather than conditional effects. The GEE model is another commonly used procedure, but its properties are asymptotic and its results are highly variable, especially with small clusters: it is unbiased but inefficient. The marginalized model performs better for small clusters and small effects.
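To make the marginal-versus-conditional distinction concrete, here is a minimal Python sketch of the marginalization step behind an MMM. It assumes the logistic-normal random-intercept formulation of Heagerty and Zeger (2000): the marginal mean follows logit E[Y_ij | x_ij] = β0 + β1·x_ij, the conditional mean follows logit E[Y_ij | x_ij, b_i] = Δ_ij + b_i with b_i ~ N(0, σ²), and Δ_ij is defined implicitly by requiring that averaging the conditional mean over b_i reproduces the marginal mean. The function names (expit, marginal_mean, solve_delta) are hypothetical and do not come from SPARE or MMLB; the integral uses a plain midpoint rule and the root is found by bisection, both chosen for simplicity rather than taken from either package. It only illustrates the identity, not likelihood-based fitting.

```python
import math


def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def marginal_mean(delta: float, sigma: float, n: int = 2001, zmax: float = 8.0) -> float:
    """E[expit(delta + sigma * Z)] with Z ~ N(0, 1).

    Midpoint rule on [-zmax, zmax]; a simple stand-in for the quadrature
    (e.g. Gauss-Hermite) a real MMM implementation would more likely use.
    """
    h = 2.0 * zmax / n
    total = 0.0
    for k in range(n):
        z = -zmax + (k + 0.5) * h
        total += expit(delta + sigma * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def solve_delta(mu: float, sigma: float, tol: float = 1e-10) -> float:
    """Find the conditional linear predictor Delta such that
    marginal_mean(Delta, sigma) == mu (hypothetical helper).

    marginal_mean is increasing in delta, so bisection suffices.
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_mean(mid, sigma) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical example: binary covariate x (e.g. SNV carrier vs non-carrier),
    # marginal intercept beta0 = -1, marginal log-odds ratio beta1 = 1,
    # random-intercept standard deviation sigma = 2.
    beta0, beta1, sigma = -1.0, 1.0, 2.0
    delta0 = solve_delta(expit(beta0), sigma)
    delta1 = solve_delta(expit(beta0 + beta1), sigma)
    print(f"marginal log-odds ratio:    {beta1:.3f}")
    print(f"conditional log-odds ratio: {delta1 - delta0:.3f}")
```

Running it shows a conditional log-odds ratio (Δ1 - Δ0) larger than the marginal one, which is the same gap that the GLMM shrinkage discussed above has to close. For comparison, the commonly quoted approximation β_C ≈ β_M·√(1 + c²σ²), with c = 16√3/(15π) ≈ 0.588, gives about 1.54 for these values.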

Some references:

Griswold ME, Swihart BJ, Caffo BS, Zeger SL. Practical marginalized multilevel models. Stat. 2013;2(1):129–42.

Heagerty PJ, Zeger SL. Marginalized multilevel models and likelihood inference (with comments and a rejoinder by the authors). Stat Sci. 2000 Feb;15(1):1–26.

Proposal:

So the idea is to implement MMM the way you implemented the survival analysis in SPARE: specification of the null hypothesis, GWAS model, and comparison. It would be the same thing you did, but with a different model. I think this would be very valuable in GWAS analysis, since it would give the true population-average effect of an SNV, which I envision being important in such studies. The drawback is that I have little experience in developing packages, so I was wondering whether you would be interested in working with me on this. We would not start from scratch, since we have your package and https://github.com/mercaldo/MMLB, which I used for my thesis. The project would be to try to merge the two.

Let me know what you think!

JasperHof commented 7 months ago

Hi Luca,

Thanks a lot for your message, it sounds very interesting. What is your email address? I can reply there in a bit more detail :-)

Best,

Jasper

lucavd commented 7 months ago

luca.vedovelli [the symbol you know] ubep.unipd.it

Looking forward to hearing from you!

(Sent by email in reply to JasperHof's comment of Thu, Dec 7, 2023, 10:43.)