Specialized Models section

jpiaskowski commented 1 year ago

I'm working on the specialized models section and here are some proposed changes. Please weigh in.

should we order this section alphabetically?
for the pedigree models (for which there are considerably more packages than what is listed) should we reference the ag task view instead? I would also call this "kinship/pedigree models".
Can we change "penalised models" to "regularized models" since glmm also can be considered penalised?
I'd like to remove mention of MICE and reference the missing data task view. It seems like the focus should be packages whose primary purpose is mixed models.
I don't think the section for "large data sets" is needed unless we want to establish clear criteria for what belongs to that.
I'd like to rename "longitudinal data" to "repeated measures". It seems to me that many packages have functions for this (too many to list?) so maybe focus on the packages that have more options (e.g. 'nlme')
can we make lavaan a core package since that is thee package for SEM?
I'd like to move lmeNB to "generalized linear models" since it runs a negative binomial for its primary functionality - sound okay?

there are a few other changes (minor edits, packages to add), but it would be easier for you to review those after I add them.

bbolker commented 1 year ago

1.

|should we order this section alphabetically? |

I guess that would be OK. I'm wishing there were some more principled ordering (can we identify clusters within these topics?) but alphabetical is a reasonable fallback.

2.

|for the pedigree models (in which there is considerably more than
what is listed) should we reference the ag task view instead? I
would also call this "kinship/pedigree models". |

OK (are there a couple of dominant/core packages here?)

3.

|Can we change "penalised models" to "regularized models" since glmm
also can be considered penalised? |

Don't know. Maybe "penalized/regularized"? This doesn't strike me as a likely cause of confusion.

4.

|I'd like to remove mention of MICE and reference the missing data
task view. It seems like the focus should be packages whose primary
purpose is mixed models. |

OK. The reason I referenced MICE is that it probably is the dominant way of handling missing values in mixed models (i.e. I don't think there are commonly used packages that are specifically geared towards mixed models)

5.

|I don't think the section for "large data sets" is needed unless we
want to establish clear criteria for what belongs to that. |

OK. (This is similar to handling missing data, in that it's a fairly common "how do I ... with mixed models?" question.)

6.

|I'd like to rename "longitudinal data" to "repeated measures". It
seems to me that many packages have functions for this (too many to
list?) so maybe focus on the packages that have more options (e.g.
'nlme') |

OK. I also intended to add something to the 'scope' statement at the top to indicate that the task view did not deal generally with longitudinal models that incorporated latent variables (e.g. packages for Kalman filtering, dynamic linear models, etc.)

7.

|can we make lavaan a core package since that is *thee package* for
SEM? |

Fine with me

8.

|I'd like to move lmeNB to "generalized linear models" since it runs
a negative binomial for its primary functionality - sound okay? |

Fine with me

jpiaskowski commented 1 year ago

Missing data strikes me as a different topic, although agreed that MICE is widely used and valuable. While relevant to model fitting, it's outside the scope. These views are challenging to maintain, so keeping the scope tight will be a benefit in the long run.

Here are the kinship/mlm packages from the 'agriculture' task view:

GWAS (Genome Wide Association Studies)

There are many GWAS packages on Bioconductor.
GWAS can be conducted using a stepwise mixed linear model for multilocus data with r pkg("mlmm.gwas") or r github("Gregor-Mendel-Institute/MultLocMixMod") (use library(mlmm) to load the package in R). The package r pkg("statgenGWAS") can fit GWAS models using the EMMAX algorithm.
GWAS models for a very large number of SNPs and/or observations can be estimated with r pkg("rMVP") and r github("deruncie/megaLMM"). Functions for conducting GWAS in autotetraploids are provided by r github("jendelman/GWASpoly"), and these functions also work in diploid species. Variable selection for ultra-large dimensional GWAS data sets can be done with r pkg("bravo"), which implements the Bayesian algorithm SVEN, selection of variables with embedded screening.
r github("jendelman/StageWise") provides functions to conduct a 2-stage GWAS when the phenotypic data are from multiple field trials.
For polyploids, r github("jendelman/polyBreedR") provides convenience functions to facilitate the use of genome-wide markers for breeding autotetraploid species, and its functionality also extends to diploids.

Genomic prediction

General genomic selection packages: r github("famuvie/breedR") is a general purpose package for performing quantitative genetic analyses. Genome feature mixed linear models using frequentist and Bayesian approaches can be implemented with r pkg("qgg"). The package r pkg("STGS") implements several genomic selection models for single traits. r pkg("BWGS"), "Breed Wheat Genomic Selection", provides a pipeline of functions for conducting genomic selection in hexaploid wheat.
GBLUP: Packages supporting genetic prediction using mixed models augmented with pedigree or genetic marker data include r pkg("sommer", priority = "core"), r pkg("rrBLUP"), r pkg("BGLR"), lme4gs (this package has special installation instructions), r github("variani/lme4qtl"), r pkg("pedigreemm"), r pkg("qgtools"), r github("cheuerde/cpgen"), r pkg("QTLrel"), and the licensed software asreml. Many of these packages have built-in functionality for data preparation steps including data imputation and calculation of the relationship matrices.
r pkg("GSelection") implements genomic selection integrating additive and non-additive models.
r pkg("pedmod") provides linear modelling functions integrating kinship for categorical traits.
r pkg("coxme") can fit Cox proportional hazards models containing both fixed and random effects with a kinship matrix.
r pkg("GSMX"), multivariate genomic selection, estimates trait heritability and handles overfitting through cross validation.
r pkg("TSDFGS") can estimate the optimal training population size and composition for genomic selection.
Multiple environments and traits: r pkg("BGGE") conducts genomic prediction for continuous variables, focused on genotype-by-environment genomic selection models following the methods of Jarquín 2014. The package r pkg("BMTME") builds genomic selection prediction models that can be expanded to multiple traits and environments using Bayesian models developed by Montesinos-López (2016, 2018a, 2018b).

It's a big list clearly. I'm not sure what constitutes a major package from a mixed model perspective.

bbolker commented 1 year ago

I think I'm OK leaving most of these out/referring to the agriculture task view (the GBLUP category + coxme + brms + MCMCglmm seem like the only relevant bits). It's interesting that there isn't a "bioinformatics" view, although I guess most of the interesting stuff in that area is on Bioconductor rather than CRAN.

There is also a small sample of phylogenetic machinery in https://cran.r-project.org/web/views/Environmetrics.html (pez, phyr are in this category)
The in-progress Phylogenetics view says that 'I believe there was talk of combining the "Phylogenetics" and "Genetics" task views, but now it sounds like there is going to be an "Omics" task view'

tuxette commented 1 year ago

There was a plan to have an "Omics" task view, but there has been no big progress in this direction so far.

jpiaskowski commented 1 year ago

I'll leave this open pending what happens with the Agriculture task view, but otherwise, it sounds like we are in agreement. I am psyched to learn about coxme, which solves some challenges my clients have experienced.

bbolker commented 1 year ago

On 2022-08-03 4:11 p.m., Julia Piaskowski wrote:

cran-task-views / MixedModels

Specialized Models section #2