gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/
Other
42 stars 18 forks source link

Report realised genic variances (possibly also expected?) #113

Open gregorgorjanc opened 1 year ago

gregorgorjanc commented 1 year ago

Currently, AlphaSimR::genParam() returns genic variance that is in one way "realised" with respect to genotype freq and allele sub effect, but also "expected" with respect to mating, which is confusing. @gaynorr I appreciate that you are following Bulmer's definition, but I wonder if it would be best to report genic variance as "realised" with respect to genotype freq and allele & "realised" with respect to mating. To get this value now I have to do something like this:

gPar <- AlphaSimR::genParam(Pop)
gPar$genicVarG + gPar$covG_HW

I bet that most users won't figure this bit out - we were bitten by this just now.

I think that for all other components you opted for the "realised" instead of "expected" so it would make sense to stick to "realised" everywhere.

Since you internally calculate expected variances too, maybe we could add these to the output too and hence have the complete picture.

We are happy to work on this.

We can discuss this in person soon;)

gaynorr commented 1 year ago

I actually use both forms of the average effect of an allele substitution.

There are two reference populations being used by AlphaSimR to calculate genetic and genic variances. Both assume no LD between loci. The first assumes genotype frequencies follow Hardy-Weinberg Equilibrium (HWE) and the second assumes genotype frequencies match the observed frequencies.

The HWE reference population is used to calculate genic variance. This calculation involves calculating the average effect of an allele substitution under the HWE genotype frequencies.

The second reference population is used for the rest of the genetic variance calculations. This calculation involves calculating the average effect of an allele substitution under the observed genotype frequencies. The breeding values reported by AlphaSImR come from this reference population.

I'll also note that in version 1.3.3 I fixed a bug where the average effect under HWE wasn't be calculated correctly.

I'm open to other ideas for naming. I mainly went with the Bulmer definition, because it was the only thing I could find for a formal definition of genic variance. I also like the fact that it splits out how departures from HWE and LD contribution to genetic variance.

gregorgorjanc commented 1 year ago

Looking at C++ https://github.com/gaynorr/AlphaSimR/blob/6bf1b1db55b0557abbf01dfd2e20b80231c39967/src/calcGenParam.cpp#L385 my understanding is that you have “expected” genic variance (using HWE genotype frequencies) in genicVarA2=genicA2, while genicVarA=genicA is "realised" (using observed genotype frequencies).

Looking at R calling the C++ code https://github.com/gaynorr/AlphaSimR/blob/6bf1b1db55b0557abbf01dfd2e20b80231c39967/R/popSummary.R#L188 your return "expected" genicVarA2 (using HWE genotype frequencies), but separeteky you report covA_HW=genicVarA-genicVarA2 that explains how the "realised" genicVarA deviates from the "expected" one genicVarA2 (Non-Random mating).

I wonder if it would be more instructive to swap these two, by default report "realised" genic var, but then have a clear "expected" genic var also returned for completenes.

gaynorr commented 1 year ago

Expand on descriptions in the genParam functions and add those descriptions to the access functions and/or provide a link.