Added imprinting trait and functionality

gregorgorjanc commented 11 months ago

@david20011999 and I have bitten the bullet and added imprinting to AlphaSimR. This involved quite a bit of work, but it is a fairly straightforward extension of the current logic since we used the orthogonal imprinting model from https://academic.oup.com/genetics/article/211/1/75/5931118 (the imprinting effect is orthogonal to additive and dominance effects). While we have done a lot, more is needed, but before we go too far down this road, we would like to hear feedback from you @gaynorr!

What we did is:

[x] added TraitAI (cool name for these days!) - we followed very closely the TraitAD!
[x] we modified calcGenParam() to work with imprinting (since i is often used for iteration, we used s in C++ code - s for silencing)
[x] we modified getGv() to work with imprinting (here we actually used getGvA2() function since it already works with haplotypes)
[x] we modified genParam() to return various imprinting related values (most notably variances, imprinting deviations, and separate breeding values for males and females)
[x] we have tweaked bv() so that we get expected behaviour for bisexual plants and animals with a sex (there is a corresponding id() = imprinting deviations function too)
[x] we added a test to make sure things are correct

Outstanding things are:

[ ] work on a nice vignette demonstrating this imprinting model and functionality
[ ] discuss the genParam() outputs - it's starting to "mushroom" and we are not entirely sure if we should add (co-)variances by sex (this makes sense for animal populations, not for most plant populations) - we need to gain more experience before we push with this (it seems needless just now)
[ ] expand to polyploids (we think we have a good grasp on what to do - should not be hard, but we wanted to first reach diploid milestone and to get feedback before we continue)
[ ] expand to class TraitADI (inherit TraitAD?), TraitAIE (inherit TraitAE?), TraitADIE (inherit TraitADE?), and all corresponding calcGenParam() and getGv() functions - we might need to discuss how to cover all these combinations! Huh:(

gaynorr commented 11 months ago

I'm going to toss this in a separate branch from devel at the moment to get a chance to get a really good look at it (like with altAddTraitAD). This is a really interesting thing to look at and add to AlphaSimR, but there are a lot of consequences for making this change that need to be explored.

The tricky part is going to be dealing with the variance components. Separating out breeding values for males and females gets particularly tricky, because it requires defining new reference populations. The paper you referenced doesn't deal with this challenge, because they are assuming HWE and equal allele frequencies for each sex. I'm also not sure that it is correct to think of an individual as having a male or female breeding value. Rather, I think it is their gametes that have these breeding values and their total breeding value is the sum of their gametes' breeding values. However, I might be missing another way of thinking of this problem.

gregorgorjanc commented 11 months ago

Thanks for the fast response @gaynorr!

Indeed, @david20011999 and I discussed HWE and also potential differences in frequencies between sexes. It gets hairy quickly, so at the moment we tried following your existing code to make changes digestible;) We can always expand later, if needed.

As to breeding value definition with imprinting, we are on the same page. The model we follow is actually providing standard breeding value formulation, which is a mean of maternal and paternal breeding values. So, we were able to use either depending on the context - see bv() changes.

It would be great to get feedback from you on the current implementation. We are keen to expand as needed.

david20011999 commented 11 months ago

Thanks for your reply @gaynorr!

I can follow your comments, but I think they concern the next step. As @gregorgorjanc said, I have started researching this topic and I will have to work on it in the future (it will be essential to my PhD). In my opinion, though, this paper just adds an imprinting deviation to the standard genetic model, one that differs between males and females (or between acting as male or female in the case of hermaphrodite individuals). As an example, we can observe that the alpha for the whole population and its breeding values is the same alpha that Falconer defined. And of course, under this model, two different breeding values for males and females are needed: an A1A1 individual will produce A1A1 and either A1A2 or A2A1 individuals, depending on its sex, but these heterozygotes have different genetic values, so their progeny will differ.

The elephant in the room is that the standard genetic model has problems outside HWE conditions, and animal breeding departs from those conditions in multiple ways. Even without imprinting, the progeny of an individual (whose average value is, by definition, half of the breeding value) depend directly on the allele frequencies of the other sex, and this has not been taken into account until now (at least, as far as I know). In big populations with similar numbers of males and females we expect similar values, so it has no consequences, but with a reduced number of males, which is the most frequent situation in animal breeding, allele frequencies will differ through sampling alone.

This becomes evident in the imprinting model because breeding values have to be calculated for males and females, which exposes the problem. But I think this problem goes beyond the AlphaSimR implementation, and the imprinting model, as it has been developed, will have no further consequences beyond the need to explain the differences between males and females to users, so that they use it correctly.

david20011999 commented 11 months ago

This is an exciting topic and I will be happy to contribute to AlphaSimR as much as possible! :)

gregorgorjanc commented 11 months ago

@david20011999 I will summarise here yesterday's feedback from @gaynorr. @gaynorr jump in if I missed/misunderstood anything!

alpha + i and alpha - i seem to be correct alphas only when we have random mating (=HWE). Hence, we need to modify calcGenParam() C++ code to work with regressions so we orthogonalise bv, dd, and id. @gaynorr shared how to do that in an R script. We will need to cater for AI and ADI cases, and, of course, later AIE and ADIE cases, as well as GxE, but let's do one step at a time!

[ ] Modify calcGenParam() C++ code to work with regressions so we orthogonalise bv, dd, and id
[ ] Check getGv() C++ code if it needs any tweaks too!
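
As a rough illustration of what "working with regressions" could mean for the AI case, here is a minimal single-locus sketch in plain Python. It is not AlphaSimR code and not the R script mentioned above. The names (`partition_ai`, `wmean`, `wcov`) and the genotype coding (A1A1 = a, A2A2 = -a, heterozygotes shifted by +i when A1 is paternal and by -i when it is maternal, no dominance) are assumptions made only for this example. Genetic values are first regressed on allele dosage to get breeding values, and the remainder is then regressed on a parent-of-origin contrast that has itself been orthogonalised against dosage.

```python
def wmean(x, w):
    """Frequency-weighted mean."""
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def wcov(x, y, w):
    """Frequency-weighted covariance (population form, no n-1 correction)."""
    mx, my = wmean(x, w), wmean(y, w)
    return sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w)) / sum(w)


def partition_ai(freq, a, i):
    """Split genetic values into breeding values and imprinting deviations.

    freq maps ordered genotypes (paternal allele, maternal allele) to their
    frequency in the current population, with alleles coded 1 = A1, 0 = A2.
    The coding of a and i is an assumption of this sketch.
    """
    genos = sorted(freq)
    w = [freq[g] for g in genos]
    gv = [a * (xp + xm - 1) + i * (xp - xm) for xp, xm in genos]
    dosage = [xp + xm for xp, xm in genos]   # additive covariate
    origin = [xp - xm for xp, xm in genos]   # parent-of-origin contrast

    # Step 1: regress genetic values on dosage to get alpha and breeding values.
    alpha = wcov(gv, dosage, w) / wcov(dosage, dosage, w)
    mean_dosage = wmean(dosage, w)
    bv = [alpha * (x - mean_dosage) for x in dosage]

    # Step 2: remove from the origin contrast whatever dosage already explains.
    slope = wcov(origin, dosage, w) / wcov(dosage, dosage, w)
    mean_origin = wmean(origin, w)
    origin_orth = [(o - mean_origin) - slope * (x - mean_dosage)
                   for o, x in zip(origin, dosage)]

    # Step 3: regress what is left of the genetic values on that contrast.
    mu = wmean(gv, w)
    resid = [g - mu - b for g, b in zip(gv, bv)]
    beta = wcov(resid, origin_orth, w) / wcov(origin_orth, origin_orth, w)
    imp = [beta * z for z in origin_orth]
    return {"alpha": alpha, "gv": gv, "bv": bv, "id": imp, "w": w}


# Non-random mating example: A1 frequency 0.8 in sires and 0.3 in dams.
p_s, p_d = 0.8, 0.3
freq = {(1, 1): p_s * p_d, (1, 0): p_s * (1 - p_d),
        (0, 1): (1 - p_s) * p_d, (0, 0): (1 - p_s) * (1 - p_d)}
out = partition_ai(freq, a=1.0, i=0.5)
```

With unequal allele frequencies in sires and dams, dosage and parent of origin are correlated, so in this toy example the regression alpha no longer equals a, while the resulting breeding values and imprinting deviations are uncorrelated and their variances add up to the total genetic variance.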

Similar to the above, we must be careful about breeding values and bv() function. This is a tricky area with lots of confusion in the literature (and my head!). It is highly related to the notion of random mating. I think the discussion was going towards using the above changes in calcGenParam() with orthogonal regressions, which will sort out phenotype simulation with imprinting AND provide orthogonal bv(), dd(), and id() (for AI, AD, ADI cases, ...). These values/deviations will be for the realised/current population at hand. However, if I fully followed @gaynorr, this breeding value is not actually the usually assumed breeding value - we need to explore this a bit more - at least I need more teaching from @gaynorr - my understanding is that this is all related to:

A note on Fisher's ‘average effect’ and ‘average excess’ https://doi.org/10.1017/S0016672300022825
The general relationship between average effect and average excess https://doi.org/10.1017/S0016672300026756
The causal meaning of Fisher's average effect https://doi.org/10.1017/S0016672313000074
Clarifying the Relationship between Average Excesses and Average Effects of Allele Substitutions https://doi.org/10.3389%2Ffgene.2012.00030

I think the above means that we should not be returning bv() for males and females as we do now (and alpha for males and females). All this should be handled later on - this is actually part of a broader challenge with breeding values (see above literature). If we do this properly, then we will also address the male-vs-female-frequency-side-of-things. The discussion about all this was revolving around hypothetical/future population under random mating and functionality that at the moment is not part of AlphaSimR, but in external scripts ... I have not internalised all this yet ...

In terms of actions:

[ ] Let's row back on calculating maternal and paternal alpha and breeding values, but save that code somewhere (it might come in useful later)
[ ] Let's study the above literature (@gaynorr any suggestion/nudges would be most appreciated)

The above points will give you phenotype simulation with imprinting and ability to work on models that estimate various kinds of breeding values to later do selection. If we don't bother with accuracy for breeding values (but maybe instead genetic values) then you actually have everything you need, but we should look into this later on as the above work evolves.

gaynorr commented 11 months ago

@david20011999 and @gregorgorjanc,

We probably don't need to worry about modeling traits with epistasis at the moment. That will make the task of changing genParam easier, because traits with epistasis have their own function for doing the calculations that consider interacting loci simultaneously.

"A note on Fisher's ‘average effect’ and ‘average excess’" is a great paper to read. The most important point is made at the end:

However, Price (1972, p. 138) says of the Fundamental Theorem that 'the derivation can be accomplished far more simply if we work entirely with regression coefficients and covariances without using Fisher's special "average excess" and "average effect" variables'.

For calculation of expected future performance, you'll want to derive an appropriate formula for the prediction of the mean of F1 progeny. You can find a version of this formula in Falconer and Mackay that deals with additive and dominance effects and considers a cross between two populations. It can also be used to predict a cross between two individuals, like we did in this paper: https://link.springer.com/article/10.1007/s00122-023-04300-6. You'll need to use your formula to predict crosses between a single individual and a population of potential mates to get what I'll call the true merit of that individual. This is what you'll use instead of the male/female breeding values discussed before.
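
The idea of crossing one individual with a population of potential mates can be sketched for a single diploid locus as below. This is an illustrative Python sketch, not the formula from Falconer and Mackay or from the linked paper. The function names (`progeny_mean`, `merit`) and the genotype coding (A1A1 = a, A2A2 = -a, heterozygote = d + i when A1 is paternal and d - i when A1 is maternal) are assumptions made for the example.

```python
def progeny_mean(p_sire, p_dam, a, d, i):
    """Mean genetic value of progeny at one locus.

    p_sire and p_dam are the probabilities that the paternal and maternal
    gametes carry A1. The genotype coding is an assumption of this sketch.
    """
    q_sire, q_dam = 1.0 - p_sire, 1.0 - p_dam
    return (a * (p_sire * p_dam - q_sire * q_dam)   # A1A1 and A2A2 progeny
            + (d + i) * p_sire * q_dam              # A1 inherited from the sire
            + (d - i) * q_sire * p_dam)             # A1 inherited from the dam


def merit(dosage, p_mates, a, d, i, as_sire=True):
    """Expected progeny mean of one diploid individual mated to a population.

    dosage is the individual's A1 count (0, 1 or 2); p_mates is the A1
    frequency among its potential mates.
    """
    p_ind = dosage / 2.0  # chance that the individual transmits A1
    if as_sire:
        return progeny_mean(p_ind, p_mates, a, d, i)
    return progeny_mean(p_mates, p_ind, a, d, i)


# A heterozygote mated to a population with A1 frequency 0.3.
m_sire = merit(1, 0.3, a=1.0, d=0.2, i=0.5, as_sire=True)
m_dam = merit(1, 0.3, a=1.0, d=0.2, i=0.5, as_sire=False)
```

Under this coding the two merits differ by 2i times the difference between the individual's transmission probability and the mates' allele frequency, so with i = 0 the same individual has the same merit as sire and as dam.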

I recommend keeping the prediction of the F1 progeny outside of AlphaSimR, because it doesn't easily generalize to polyploids. I've produced a function for predicting the mean of F1 progeny in tetraploids and hexaploids on an individual level: https://github.com/gaynorr/QuantGenResources/blob/main/CalcCrossMeans.cpp. It was used in this paper: https://link.springer.com/article/10.1007/s00122-023-04377-z. However, it assumes exclusive bivalent pairing so it doesn't work exactly in AlphaSimR simulations that allow quadrivalent pairing (probably still a good approximation though).

david20011999 commented 11 months ago

Following @gaynorr's instructions, I've derived the F1 GV expectation including imprinting. The result for a single locus, if my algebra is not wrong, is MF1 = a(p - q - y) + d[2pq + y(p - q)] - iy, where the last term is the difference from Falconer's formula. It makes sense because under the orthogonal model each term is independent of the others. Furthermore, this imprinting term depends exclusively on the allele frequency difference between the two populations, y.

About the regression coefficients, I've been reading the paper "A note on Fisher's ‘average effect’ and ‘average excess’" and I can now follow @gaynorr's argument better. I have been running some trials with AlphaSimR using a population with a big difference in the number of males and females (and therefore also in their allele frequencies), comparing the progeny obtained from a specific genomic dosage and sex with the progeny obtained under random mating.

Tomorrow @gregorgorjanc and I will be working on this project. We will keep you informed, @gaynorr.

gaynorr commented 11 months ago

@david20011999, you might want to consider a couple of additional derivations. I haven't checked your above derivation, but I suspect it is correct for a case where both populations are used as both dams and sires. You also want derivations for when one population is used as dams with the other as sires and vice versa. In this case, the direction of the mating will matter.
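
The effect of mating direction can be checked numerically by enumerating the four ordered F1 genotypes. The sketch below is plain Python. It assumes a coding in which the heterozygote is worth d + i when A1 comes from the sire and d - i when it comes from the dam, and it uses Falconer's notation with p the A1 frequency in the first population and y = p - p' the frequency difference between the populations. The function names and the coding are assumptions for illustration, so the sign of the imprinting term here need not match the sign convention used in the derivation above.

```python
def f1_mean(p_sire, p_dam, a, d, i):
    """F1 mean from ordered genotype frequencies (sire allele, dam allele)."""
    q_sire, q_dam = 1.0 - p_sire, 1.0 - p_dam
    return (a * p_sire * p_dam            # A1A1
            + (d + i) * p_sire * q_dam    # A1 from sire, A2 from dam
            + (d - i) * q_sire * p_dam    # A2 from sire, A1 from dam
            - a * q_sire * q_dam)         # A2A2


def f1_closed(p, y, a, d, i, sign):
    """Falconer-style closed form plus an imprinting term sign * i * y."""
    q = 1.0 - p
    return a * (p - q - y) + d * (2 * p * q + y * (p - q)) + sign * i * y


p, y = 0.8, 0.5           # population 1 has frequency p, population 2 has p - y
a, d, i = 1.0, 0.2, 0.5

pop1_as_sires = f1_mean(p, p - y, a, d, i)
pop1_as_dams = f1_mean(p - y, p, a, d, i)
both_directions = 0.5 * (pop1_as_sires + pop1_as_dams)
```

In this sketch the additive and dominance parts are identical for the two reciprocal crosses, the imprinting part is +iy in one direction and -iy in the other, and it cancels when both directions are used equally.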

gaynorr / AlphaSimR

Added imprinting trait and functionality #167