gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/

Write GenomicSelection vignette #123

Open gregorgorjanc opened 1 year ago

gregorgorjanc commented 1 year ago

The aim here is to show:

  * how to build training sets and run the models
  * what models and methods are implemented
  * Mention ascertainment bias https://github.com/gaynorr/AlphaSimR/issues/117
  * Mention fixEff https://github.com/gaynorr/AlphaSimR/discussions/115
  * Save EBVs computed elsewhere https://github.com/gaynorr/AlphaSimR/discussions/82
gaynorr commented 1 year ago

This script shows general use of GS in AlphaSimR.

Here is information on using the fixed effect slot in populations.

Here are some general comments on GS in AlphaSimR.

Several models are supported:

  1. Ridge regression BLUP models that fit a random effect for markers and a fixed effect from the fixEff slot (RRBLUP and RRBLUP2).
  2. Ridge regression BLUP model that fits a random effect for markers and an intercept using user-supplied variances (fastRRBLUP). This is an implementation of an older version of AlphaBayes that matches the version released in the original AlphaSim. Despite its name, it is not always faster than the previous ridge regression implementations. Its primary benefit is low memory usage, thanks to a mixed-precision implementation that avoids storing all loci in floating-point format at any one time.
  3. Models that fit a random additive effect and a random dominance effect for markers (RRBLUP_D and RRBLUP_D2). These models use genotypic coding of loci to get additive and dominance effects. Back-solving is used to obtain the breeding values of individuals using either the training population's genotype frequencies or the frequencies of a prediction population. These models also fit a directional dominance term as a fixed effect to try to improve the prediction accuracy of the dominance effects.
  4. Models that fit gender-specific additive effects (RRBLUP_GCA and RRBLUP_GCA2). These models fit random effects for the mother and father of an individual using haplotype information. They were intended for modeling GS in plant breeding, where hybrids are used in the training population and the inbred genotypes of the parents are fitted as two separate random effects. They could also be used to model crossbred animals or hybrids in outbred crops, but they assume perfect assignment of haplotypes, which is overly favorable.
  5. Models that fit gender-specific additive effects and dominance effects (RRBLUP_SCA and RRBLUP_SCA2). These models combine the features of the previous two model types to fit genotypic effects while retaining the ability to back-solve for gender-specific breeding values (GCA). Like the GCA models, they are targeted at plant breeding applications where hybrids are produced from inbred lines.
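The back-solving step mentioned for the RRBLUP_D models can be illustrated with the standard quantitative-genetics relation between additive (a) and dominance (d) marker effects and the average effect of allele substitution, α = a + d(q - p), where p is the frequency of the counted allele and q = 1 - p. The Python sketch below assumes 0/1/2 dosage coding and Hardy-Weinberg proportions (so allele frequencies stand in for genotype frequencies). It is not AlphaSimR's implementation, and the function names (`allele_freqs`, `substitution_effects`, `breeding_values`) are hypothetical. It shows why the choice of reference frequencies matters: the same a and d give a different α, and therefore different breeding values, depending on whether training or prediction population frequencies are used.

```python
def allele_freqs(genos):
    """Frequency of the counted allele at each locus, from 0/1/2 dosages (rows = individuals)."""
    n = len(genos)
    return [sum(row[j] for row in genos) / (2 * n) for j in range(len(genos[0]))]


def substitution_effects(a, d, p):
    """Average effect of allele substitution per locus: alpha = a + d * (q - p), q = 1 - p."""
    return [aj + dj * ((1 - pj) - pj) for aj, dj, pj in zip(a, d, p)]


def breeding_values(genos, a, d, p):
    """Breeding value = sum over loci of (dosage - 2p) * alpha, with p the reference frequencies."""
    alpha = substitution_effects(a, d, p)
    return [sum((x - 2 * pj) * al for x, pj, al in zip(row, p, alpha)) for row in genos]


if __name__ == "__main__":
    # Hypothetical estimated additive and dominance effects at two loci
    a, d = [1.0, -0.5], [0.5, 0.2]
    train = [[2, 0], [1, 1], [1, 2], [0, 1]]   # training population dosages
    target = [[2, 2], [2, 1], [1, 2]]          # prediction population dosages
    # Same marker effects, two different reference frequencies
    print(breeding_values(target, a, d, allele_freqs(train)))
    print(breeding_values(target, a, d, allele_freqs(target)))
```

Breeding values computed against a population's own frequencies sum to zero across that population, which is one way to see that the reference population sets the baseline for the back-solved values.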

You'll notice that most of the models listed above have a variant with a '2' in its name. These variants use a simple EM approach to estimate variance components that is very efficient when the number of markers is much smaller than the number of individuals in the training population. The original models were designed to be efficient when the number of markers is larger than the number of individuals in the training population.
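To make the efficiency argument concrete, here is a textbook Henderson-style EM-REML fit of a ridge regression BLUP model, y = Xb + Zu + e with u ~ N(0, Iσ²_u) and e ~ N(0, Iσ²_e), written in plain Python so it runs without extra packages. It is a sketch of the general technique, not AlphaSimR's code, and the helper names (`mat_inv`, `em_rrblup`, `simulate`) are hypothetical. Each iteration builds and inverts mixed model equations of size (fixed effects + markers), so the cost is driven by the number of markers rather than the number of individuals.

```python
import random


def mat_inv(a):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def em_rrblup(y, X, Z, n_iter=500, tol=1e-10):
    """Return (b, u, var_u, var_e); assumes X has full column rank."""
    n, p, m = len(y), len(X[0]), len(Z[0])
    k = p + m
    W = [list(xr) + list(zr) for xr, zr in zip(X, Z)]  # combined design [X | Z]
    WtW = [[sum(W[i][r] * W[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    Wty = [sum(W[i][r] * y[i] for i in range(n)) for r in range(k)]
    yty = sum(v * v for v in y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / (n - 1)
    var_e, var_u = var_y / 2, var_y / (2 * m)  # arbitrary positive starting values
    for _ in range(n_iter):
        C = [row[:] for row in WtW]
        for j in range(p, k):
            C[j][j] += var_e / var_u  # ridge penalty lambda = var_e / var_u on markers only
        Cinv = mat_inv(C)
        sol = [sum(Cinv[r][c] * Wty[c] for c in range(k)) for r in range(k)]
        u = sol[p:]
        # EM-REML updates: E[u'u | y] = u_hat'u_hat + var_e * tr(C^uu)
        tr_cuu = sum(Cinv[j][j] for j in range(p, k))
        new_u = (sum(v * v for v in u) + var_e * tr_cuu) / m
        new_e = (yty - sum(s * r for s, r in zip(sol, Wty))) / (n - p)
        converged = abs(new_u - var_u) < tol and abs(new_e - var_e) < tol
        var_u, var_e = new_u, new_e
        if converged:
            break
    return sol[:p], u, var_u, var_e


def simulate(n=200, m=10, seed=1):
    """Toy data: 0/1/2 marker dosages, N(0, 1) marker effects, N(0, 1) noise, intercept 10."""
    rng = random.Random(seed)
    Z = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(m)] for _ in range(n)]
    true_u = [rng.gauss(0, 1) for _ in range(m)]
    g = [sum(z * e for z, e in zip(row, true_u)) for row in Z]
    y = [10.0 + gi + rng.gauss(0, 1) for gi in g]
    X = [[1.0] for _ in range(n)]  # intercept only
    return y, X, Z, g


if __name__ == "__main__":
    y, X, Z, g = simulate()
    b, u, var_u, var_e = em_rrblup(y, X, Z)
    print("intercept", b[0], "var_u", var_u, "var_e", var_e)
```

When markers far outnumber individuals, the (p + m) x (p + m) system above becomes the bottleneck, and an equivalent individual-based formulation working with n x n matrices is the usual choice in that regime.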