gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/

Added imprinting trait and functionality #167

Closed gregorgorjanc closed 7 months ago

gregorgorjanc commented 7 months ago

@david20011999 and I have bitten the bullet and added imprinting to AlphaSimR. This involved quite a bit of work, but it is a fairly straightforward extension of the current logic, since we used the orthogonal imprinting model from https://academic.oup.com/genetics/article/211/1/75/5931118 (the imprinting effect is orthogonal to the additive and dominance effects). While we have done a lot, more is needed, but before we go too far down the road, we would like to hear feedback from you @gaynorr!

What we did is:

Outstanding things are:

gaynorr commented 7 months ago

I'm going to toss this in a separate branch from devel at the moment to get a chance to get a really good look at it (like with altAddTraitAD). This is a really interesting thing to look at and add to AlphaSimR, but there are a lot of consequences for making this change that need to be explored.

The tricky part is going to be dealing with the variance components. Separating out breeding values for males and females gets particularly tricky, because it requires defining new reference populations. The paper you referenced doesn't deal with this challenge, because they are assuming HWE and equal allele frequencies for each sex. I'm also not sure that it is correct to think of an individual as having a male or female breeding value. Rather, I think it is their gametes that have these breeding values, and their total breeding value is the sum of their gametes' breeding values. However, I might be missing another way of thinking about this problem.

gregorgorjanc commented 7 months ago

Thanks for the fast response @gaynorr!

Indeed, @david20011999 and I discussed HWE and also potential differences in frequencies between sexes. It gets hairy quickly, so at the moment we tried following your existing code to make the changes digestible ;) We can always expand later, if needed.

As to the breeding value definition with imprinting, we are on the same page. The model we follow actually provides the standard breeding value formulation, which is a mean of maternal and paternal breeding values. So, we were able to use either depending on the context - see bv() changes.

It would be great to get feedback from you on the current implementation. We are keen to expand as needed.

david20011999 commented 7 months ago

Thanks for your reply @gaynorr!

I can follow your comments, but I think they concern the next step. As @gregorgorjanc said, I have started a line of research on this topic and will have to work on it in the future (it will be essential to my PhD). But, in my opinion, this paper just adds an imprinting deviation to the standard genetic model, one that differs between males and females (or between individuals acting as males or females in the case of hermaphrodites). As an example, the alpha for the whole population and its breeding values is the same alpha that Falconer defined. And of course, under this model, two different breeding values are needed for males and females: an A1A1 individual will produce A1A1 and either A1A2 or A2A1 progeny, depending on its sex, but these heterozygotes have different genetic values, so the progeny will differ.

The elephant in the room is that the standard genetic model has problems when HWE conditions do not hold, and animal breeding departs from those conditions in many ways. Even without imprinting, the progeny of an individual (whose average value is, by definition, half of the breeding value) depend directly on the allele frequencies in the other sex, and this has not been taken into account until now (at least, as far as I know). In big populations with similar numbers of males and females we would expect similar values and no consequences, but with a reduced number of males, which is the most frequent situation in animal breeding, allele frequencies will differ just because of sampling.
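
To put a rough number on the sampling argument, here is a minimal sketch in plain Python (not AlphaSimR code). It assumes a single biallelic locus with the same true frequency in both sexes and draws alleles independently; the function names, sample sizes, and seed are hypothetical choices made only for illustration.

```python
import random

def sample_allele_freq(n_individuals, p, rng):
    """Frequency of allele A1 among 2 * n_individuals independently drawn alleles."""
    n_alleles = 2 * n_individuals
    count = sum(1 for _ in range(n_alleles) if rng.random() < p)
    return count / n_alleles

def mean_abs_sex_difference(n_males, n_females, p=0.5, n_reps=200, seed=1):
    """Average |p_males - p_females| over replicated samples of parents."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        p_m = sample_allele_freq(n_males, p, rng)
        p_f = sample_allele_freq(n_females, p, rng)
        total += abs(p_m - p_f)
    return total / n_reps

# Few sires versus a balanced sex ratio (illustrative numbers only).
few_sires = mean_abs_sex_difference(n_males=5, n_females=500)
balanced = mean_abs_sex_difference(n_males=500, n_females=500)
print(f"5 sires, 500 dams:   {few_sires:.3f}")
print(f"500 sires, 500 dams: {balanced:.3f}")
```

With only a handful of sires, the average gap between male and female allele frequencies should come out several times larger than in the balanced case, even though both sexes are sampled from the same frequency.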

This becomes evident in the imprinting model because breeding values have to be calculated for males and females, which exposes the problem. But I think this problem goes beyond the AlphaSimR implementation, and the imprinting model, as developed, will have no further consequences beyond the need to explain the differences between males and females to the user, to make sure it is used correctly.

david20011999 commented 7 months ago

This is an exciting topic and I will be happy to contribute to AlphaSimR as much as possible! :)

gregorgorjanc commented 7 months ago

@david20011999 I will summarise here yesterday's feedback from @gaynorr. @gaynorr jump in if I missed/misunderstood anything!

  1. alpha + i and alpha - i seem to be correct alphas only when we have random mating (=HWE). Hence, we need to modify calcGenParam() C++ code to work with regressions so we orthogonalise bv, dd, and id. @gaynorr shared how to do that in an R script. We will need to cater for AI and ADI cases, and, of course, later AIE and ADIE cases, as well as GxE, but let's do one step at a time!
  2. Similar to the above, we must be careful about breeding values and bv() function. This is a tricky area with lots of confusion in the literature (and my head!). It is highly related to the notion of random mating. I think the discussion was going towards using the above changes in calcGenParam() with orthogonal regressions, which will sort out phenotype simulation with imprinting AND provide orthogonal bv(), dd(), and id() (for AI, AD, ADI cases, ...). These values/deviations will be for the realised/current population at hand. However, if I fully followed @gaynorr, this breeding value is not actually the usually assumed breeding value - we need to explore this a bit more - at least I need more teaching from @gaynorr - my understanding is that this is all related to:

I think the above means that we should not be returning bv() for males and females as we do now (and alpha for males and females). All this should be handled later on - this is actually part of a broader challenge with breeding values (see above literature). If we do this properly, then we will also address the male-vs-female-frequency-side-of-things. The discussion about all this was revolving around hypothetical/future population under random mating and functionality that at the moment is not part of AlphaSimR, but in external scripts ... I have not internalised all this yet ...
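
As a rough illustration of what "working with regressions" could look like for one locus, here is a minimal Python sketch. It is not the R script @gaynorr shared and not the calcGenParam() C++ code; it is an assumed sequential weighted regression (additive on allele dosage, then dominance on the part of heterozygosity not explained by dosage, then imprinting as the remainder). The genotype coding (a, d + i, d - i, -a), the genotype frequencies, and all function names are assumptions made only for this example.

```python
def wmean(w, x):
    """Mean of x weighted by genotype frequencies w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def wcov(w, x, y):
    """Covariance of x and y weighted by genotype frequencies w."""
    mx, my = wmean(w, x), wmean(w, y)
    return sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))

def decompose(freq, a, d, i):
    """Split single-locus genetic values into bv, dd and id by sequential regression.

    Ordered genotypes (paternal allele first): A1A1, A1A2, A2A1, A2A2.
    Assumed coding: A1A1 = a, A1A2 = d + i, A2A1 = d - i, A2A2 = -a.
    """
    g = [a, d + i, d - i, -a]
    dosage = [2, 1, 1, 0]   # number of A1 alleles
    het = [0, 1, 1, 0]      # heterozygosity indicator
    mu = wmean(freq, g)
    mean_x = wmean(freq, dosage)
    var_x = wcov(freq, dosage, dosage)
    # Step 1: additive part, regression of genetic value on dosage.
    alpha = wcov(freq, dosage, g) / var_x
    bv = [alpha * (x - mean_x) for x in dosage]
    resid = [gk - mu - b for gk, b in zip(g, bv)]
    # Step 2: dominance part, using heterozygosity residualised on dosage.
    slope_h = wcov(freq, het, dosage) / var_x
    mean_h = wmean(freq, het)
    het_res = [h - mean_h - slope_h * (x - mean_x) for h, x in zip(het, dosage)]
    delta = wcov(freq, het_res, resid) / wcov(freq, het_res, het_res)
    dd = [delta * h for h in het_res]
    # Step 3: imprinting part is whatever remains.
    idev = [r - x for r, x in zip(resid, dd)]
    return mu, bv, dd, idev

# Sires with p = 0.7 and dams with p = 0.4, random union of gametes,
# so allele frequencies differ between sexes (illustrative numbers).
freq = [0.7 * 0.4, 0.7 * 0.6, 0.3 * 0.4, 0.3 * 0.6]
mu, bv, dd, idev = decompose(freq, a=1.0, d=0.5, i=0.3)
print("cov(bv, dd):", wcov(freq, bv, dd))
print("cov(bv, id):", wcov(freq, bv, idev))
print("cov(dd, id):", wcov(freq, dd, idev))
```

By construction the three components are uncorrelated in the population described by `freq`, so their variances add up to the total genetic variance for that population. Whether this matches the decomposition intended for calcGenParam() would need checking against the actual script.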

In terms of actions:

The above points will give you phenotype simulation with imprinting and ability to work on models that estimate various kinds of breeding values to later do selection. If we don't bother with accuracy for breeding values (but maybe instead genetic values) then you actually have everything you need, but we should look into this later on as the above work evolves.

gaynorr commented 7 months ago

@david20011999 and @gregorgorjanc,

We probably don't need to worry about modeling traits with epistasis at the moment. That will make the task of changing genParam easier, because traits with epistasis have their own function for doing the calculations that consider interacting loci simultaneously.

"A note on Fisher's ‘average effect’ and ‘average excess’" is a great paper to read. The most important point is made at the end:

However, Price (1972, p. 138) says of the Fundamental Theorem that 'the derivation can be accomplished far more simply if we work entirely with regression coefficients and covariances without using Fisher's special "average excess" and "average effect" variables'.

For calculation of expected future performance, you'll want to derive an appropriate formula for the prediction of the mean of F1 progeny. You can find a version of this formula in Falconer and Mackay that deals with additive and dominance effects and considers a cross between two populations. It can also be used to predict a cross between two individuals, like we did in this paper: https://link.springer.com/article/10.1007/s00122-023-04300-6. You'll need to use your formula to predict crosses between a single individual and a population of potential mates to get what I'll call the true merit of that individual. This is what you'll use instead of the male/female breeding values discussed before.
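
A minimal sketch of this idea for diploids with additive and dominance effects only, written in plain Python outside AlphaSimR. It uses the Falconer and Mackay two-population cross mean, a(p - q - y) + d[2pq + y(p - q)], and treats the individual as a "population" whose allele frequency at each locus is its dosage divided by two. The function names, the omission of an intercept, and the example effects are assumptions for illustration.

```python
def cross_mean_locus(p1, p2, a, d):
    """Expected F1 genetic value at one locus for a cross between two populations.

    p1, p2: frequency of allele A1 in each parental population.
    a, d: additive and dominance effects (genotype values a, d, -a).
    """
    q1 = 1.0 - p1
    y = p1 - p2  # frequency difference between the two populations
    return a * (p1 - q1 - y) + d * (2.0 * p1 * q1 + y * (p1 - q1))

def true_merit(dosages, mate_freqs, a_effects, d_effects):
    """Expected progeny mean of one individual crossed to a population of mates.

    dosages: A1 allele counts of the individual (0, 1 or 2 per locus).
    mate_freqs: A1 allele frequencies among the potential mates.
    The result is expressed as a deviation, without any trait intercept.
    """
    return sum(
        cross_mean_locus(x / 2.0, p2, a, d)
        for x, p2, a, d in zip(dosages, mate_freqs, a_effects, d_effects)
    )

# Hypothetical three-locus example.
merit = true_merit(
    dosages=[2, 1, 0],
    mate_freqs=[0.5, 0.5, 0.5],
    a_effects=[1.0, 1.0, 1.0],
    d_effects=[0.5, 0.5, 0.5],
)
print(merit)
```

Because the mate population enters through `mate_freqs`, the same individual gets a different merit when the pool of potential mates changes, which is the point of replacing fixed male/female breeding values with a cross prediction.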

I recommend keeping the prediction of the F1 progeny outside of AlphaSimR, because it doesn't easily generalize to polyploids. I've produced a function for predicting the mean of F1 progeny in tetraploids and hexaploids on an individual level: https://github.com/gaynorr/QuantGenResources/blob/main/CalcCrossMeans.cpp. It was used in this paper: https://link.springer.com/article/10.1007/s00122-023-04377-z. However, it assumes exclusive bivalent pairing so it doesn't work exactly in AlphaSimR simulations that allow quadrivalent pairing (probably still a good approximation though).

david20011999 commented 7 months ago

Following @gaynorr's instructions, I've derived the expected genetic value of the F1 including imprinting. The result for a single locus, if my algebra is not wrong, is MF1 = a(p - q - y) + d[2pq + y(p - q)] - iy, where the last term is the difference from Falconer's formula. It makes sense because under the orthogonal model each term is independent of the others. Furthermore, this imprinting term depends exclusively on the allele frequency difference between the two populations, y.
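
One way to reach a result of this form is sketched below. The coding is an assumption that the thread does not spell out: population 1 (frequencies p, q) supplies the dams, population 2 (frequencies p - y, q + y) supplies the sires, and the genotype values are a for A1A1, d + i for the heterozygote with paternal A1, d - i for the heterozygote with maternal A1, and -a for A2A2.

```latex
\begin{aligned}
P(A_1A_1) &= (p - y)\,p, &
P(A_1^{\mathrm{pat}}A_2^{\mathrm{mat}}) &= (p - y)\,q, \\
P(A_2^{\mathrm{pat}}A_1^{\mathrm{mat}}) &= (q + y)\,p, &
P(A_2A_2) &= (q + y)\,q, \\[4pt]
M_{F1} &= a\,[(p - y)p - (q + y)q]
        + d\,[(p - y)q + (q + y)p]
        + i\,[(p - y)q - (q + y)p] \\
       &= a\,(p - q - y) + d\,[2pq + y(p - q)] - i\,y .
\end{aligned}
```

The simplification uses p + q = 1. Under the same coding, swapping which population supplies the sires changes the sign of the last term.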

About the regression coefficient: I've been reading the paper "A note on Fisher's ‘average effect’ and ‘average excess’" and I can now follow @gaynorr's argument better. I have been doing some trials with AlphaSimR using a population with a big difference in the number of males and females (and therefore also in their allele frequencies), comparing the progeny obtained for a specific genomic dosage and sex with the progeny obtained under random mating.

Tomorrow @gregorgorjanc and I will be working on this project. We will keep you informed, @gaynorr.

gaynorr commented 7 months ago

@david20011999, you might want to consider a couple of additional derivations. I haven't checked your above derivation, but I suspect it is correct for a case where both populations are used as both dams and sires. You also want derivations for when one population is used as dams with the other as sires and vice versa. In this case, the direction of the mating will matter.
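
The effect of mating direction can be checked numerically with a small sketch that enumerates the ordered F1 genotypes at one locus. This is plain Python, not AlphaSimR code. The coding is an assumption for illustration (paternal-A1 heterozygote = d + i, maternal-A1 heterozygote = d - i), as are the function names and the example numbers.

```python
def f1_mean(p_sire, p_dam, a, d, i):
    """Expected F1 genetic value from ordered genotype frequencies.

    p_sire, p_dam: frequency of A1 among sires and among dams.
    Assumed coding: A1A1 = a, paternal-A1 het = d + i,
    maternal-A1 het = d - i, A2A2 = -a.
    """
    q_sire, q_dam = 1.0 - p_sire, 1.0 - p_dam
    f_11 = p_sire * p_dam   # A1 from both parents
    f_12 = p_sire * q_dam   # paternal A1, maternal A2
    f_21 = q_sire * p_dam   # paternal A2, maternal A1
    f_22 = q_sire * q_dam   # A2 from both parents
    return a * (f_11 - f_22) + d * (f_12 + f_21) + i * (f_12 - f_21)

def closed_form(p, y, a, d, i, sign):
    """a(p - q - y) + d[2pq + y(p - q)] + sign * i * y, with population 1 at p."""
    q = 1.0 - p
    return a * (p - q - y) + d * (2.0 * p * q + y * (p - q)) + sign * i * y

# Population 1 has p = 0.6, population 2 has p - y = 0.4 (illustrative values).
p, y = 0.6, 0.2
a, d, i = 1.0, 0.5, 0.3

pop1_as_dams = f1_mean(p_sire=p - y, p_dam=p, a=a, d=d, i=i)
pop1_as_sires = f1_mean(p_sire=p, p_dam=p - y, a=a, d=d, i=i)
reciprocal = 0.5 * (pop1_as_dams + pop1_as_sires)

print("population 1 as dams: ", pop1_as_dams, closed_form(p, y, a, d, i, -1))
print("population 1 as sires:", pop1_as_sires, closed_form(p, y, a, d, i, +1))
print("both directions equally:", reciprocal, closed_form(p, y, a, d, i, 0))
```

Under this assumed coding, the -iy form corresponds to population 1 supplying the dams, the sign flips when it supplies the sires, and the imprinting term cancels when both directions are used equally. A different coding of the two heterozygotes would swap the first two cases.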