Make drone quantitative values truly haploid?

HighlanderLab / SIMplyBee

SIMplyBee R package extends AlphaSimR for simulating honeybee populations and breeding programmes

http://www.simplybee.info/

Other

0 stars 5 forks source link

Make drone quantitative values truly haploid? #519

Open gregorgorjanc opened 1 year ago

gregorgorjanc commented 1 year ago

Silly question, but I am travelling and can’t easily access SIMplyBee code - how do we deal with drone quantitative values - do we take 0.5 of each value doubled-haploid (hence diploid fully inbred) or we forgot to do this?

janaobsteter commented 1 year ago

I checked the createDrones and getGv functions and we don't do that in any of them, so I'm guessing we forgot. This should probably be implemented when we create drones, right?

gregorgorjanc commented 1 year ago

Correct, IF we wanted to do anything, then the best place would be in createDrones(). I guess, we can just go and divide by 2 and we will get the haploid value.

BUT, should we really do this? There are points for this change and against it:

PRO
- drones should have haploid values, not diploid values that they currently have (a note, regarding inheritance everything is fine - drones will pass only a haploid gamete to their progeny as in reality)
- is there any other PRO point?
- if we do this we need to:
- [ ] modify createDrones() such that gv are divided by 2
- [ ] modify genParam() (many variance and quantitative genetics functions use genParam() under the hood!) since that function works with underlying QTL genotypes - there will be a lot of work there (in R and C++!) - maybe we should do something on the SimParam side, say in defining traits addTrait*()?
- [ ] anything else?
CON
- is there dosage compensation in drones as is the case with sex chromosome for some traits in other species?
- modifying genParam() (many variance and quantitative genetics functions use genParam() under the hood!) will be challenging
- is there any other CON point?

A strategy working with genParam() would be to talk to Chris and see if we perhaps can pass an argument to it to take policy into account - will ask Chris about his thoughts on this.

We should also think about environmental variances! Is environmental variance in drones the same as in workers and the queen? I think we don't know, also, traits will likely differ substantially between these three caste, sometimes they are only expressed in one caste and and not another!

Should we maybe have some information/parameter in SimParam and SimParamBee? The question behind this issue arose in my head while I was reading the SLiM manual (https://messerlab.org/slim, https://github.com/MesserLab/SLiM/releases/download/v4.0.1/SLiM_Manual.pdf) and how it handles sex chromosome and fitness calculation for XX females and XY males (SLiM has a parameter directive (like SimParam) to manage this).

I think all these questions opened a can of worms!? Lot's of unknowns here.

The key point and question for now is that we should discuss what to do in the intermediate period before we do the change/fix - because we are currently wrongly "hidding" this behaviour. Let's talk in one of the meetings on what to do or shoot ideas here!

gaynorr commented 10 months ago

Sneaking on to your issue page here, because Gregor mentioned how useful you guys find it and I noticed this thread.

I should mention that due to the way the AlphaSimR handles different levels of ploidy, there won't be any difference in genetic values or variances between a haploid individual or a fully inbred diploid individual with the same alleles (or any higher level of ploidy). Under the hood, AlphaSimR is working with relative dosage for genetic effects and not raw dosage. For example, a homozygous genotype will always have a or -a for its additive effect. This means that dividing the genetic value by 2 is not correct for AlphaSimR.

Here is an example:

library(AlphaSimR)

founderPop = quickHaplo(100, 10, 100, inbred=TRUE)

SP = SimParam$
  new(founderPop)$
  addTraitADE(100, meanDD=0.2, relAA=0.1)

diploid = newPop(founderPop)
haploid = reduceGenome(diploid)

varA(diploid) # 1
varA(haploid) # 1

varD(diploid) # 0
varD(haploid) # 0

varAA(diploid) # ~0.2 (2*0.1 due to being fully inbred)
varAA(haploid) # ~0.2 (2*0.1 due to being fully inbred)

mean(gv(diploid) - gv(haploid)) # 0

gregorgorjanc commented 10 months ago

@gaynorr thanks for coming here!

The issue is that we are simulating drones as doubled haploid (—> diploid) individuals! We did this so that we could easily mate a queen (diploid) with these drones. We initially tried implementing them as haploid (we still have that code in the inactive background), but some downstream functionality became challenging - see https://github.com/HighlanderLab/SIMplyBee/issues/24. Any advice on this would be most welcome!

gaynorr commented 10 months ago

Yeah, the genetic value of those diploids will be the same as if you were doing haploids instead so my point was that there is no concern there.

If you really want haploids, you can do it with some combination of reduceGenome, doubleGenome, and/or mergeGenome without having to touch c++ functions. Below is an example of these functions.

The catch with this approach is that AlphaSimR will create a new individual with each of these steps and thus a new pedigree entry. That means the internal AlphaSimR pedigree won't look like an actual bee pedigree. You'll get "individuals" that are really just gametes. Everything else in the package should work as expected. Of course the pedigrees aren't correct for bees with the DH trick either, because the actual pedigrees just track queens and ignore drones.

If it were me, I'd probably focus on processing the pedigree. Whether you want diploid or haploid drones is mainly just an aesthetic choice.

If you want something that is truly faithful to what is happening. You could use reduceGenome to create drones and create your own function for mating. The function for mating can be based off of mergeGenome. The main difference is that it'd need to sample gametes from the females before merging. Sampling gametes from the females can be done by calling the c++ function createReducedGenome. This is the function used by reduceGenome to perform this task. You'd want to call the c++ function here instead of using reduceGenome, because it will give you control over how the pedigree is handled.

library(AlphaSimR)

founderPop = quickHaplo(2,1,10,inbred=TRUE)

SP = SimParam$new(founderPop)$setTrackRec(TRUE)

diploid = newPop(founderPop)
haploid = reduceGenome(diploid)
newDiploid = doubleGenome(haploid)
mergedDiploid = mergeGenome(females=haploid, 
                            males=haploid, 
                            cbind(1:nInd(haploid), 
                                  1:nInd(haploid)))

SP$pedigree
pullIbdHaplo(diploid)
pullIbdHaplo(haploid)
pullIbdHaplo(newDiploid)
pullIbdHaplo(mergedDiploid)

gregorgorjanc commented 10 months ago

@gaynorr hah, yeah this code of yours works like a charm. I will have to go back into https://github.com/HighlanderLab/SIMplyBee/issues/24 to understand what was bugging me at that time. Maybe the new pullIbdHaplo() background implementation now works with variable ploidy, while before it worked only with even-ploidy.

Of course the pedigrees aren't correct for bees with the DH trick either, because the actual pedigrees just track queens and ignore drones.

Hmm, why do you say this? Drones come from the queen only, which is what AlphaSimR does track in the following way

library(AlphaSimR)

founderPop = quickHaplo(2,1,10)

SP = SimParam$new(founderPop)$setTrackPed(TRUE)

queens = newPop(founderPop)
SP$pedigree
DH = makeDH(queens)
SP$pedigree

where the last output is

  mother father isDH
1      0      0    0
2      0      0    0
3      1      1    1
4      2      2    1

Aha, you mean that father should be 0 for the DH, right?

janaobsteter commented 10 months ago

@gaynorr we've been testing this code and how it reflect on the genetic values and variances. We noticed that the variance of haploids and double haploids is twice the variance of the diploids - see the code below. Is this ok?

library(AlphaSimR)

founderPop = quickHaplo(nInd= 1000,1,10)

SP = SimParam$new(founderPop)$setTrackRec(TRUE)
SP$addTraitA(nQtlPerChr = 10, mean = 0, var = 1)

diploid = newPop(founderPop)
var(diploid@gv)
nrow(diploid@gv)

diploid2 <- randCross(diploid, nCrosses = 1000)
var(diploid2@gv)

haploid = reduceGenome(diploid)
var(haploid@gv)
nrow(haploid@gv)

newDiploid = doubleGenome(haploid)
var(newDiploid@gv)

mergedDiploid = mergeGenome(females=haploid, 
                            males=haploid, 
                            cbind(1:nInd(haploid), 
                                  1:nInd(haploid)))
var(mergedDiploid@gv)

gaynorr commented 10 months ago

@janaobsteter, that would be correct because the initial population is roughly in Hardy-Weinberg equilibrium and the later are fully inbred (e.g. Va = 2(1+F)pqa). If you add inbred=TRUE to your quickHaplo function, they will approximately match up.

janaobsteter commented 9 months ago

@gregorgorjanc , adding the figures Wheninbred = FALSE

When inbred = TRUE

gregorgorjanc commented 9 months ago

@gaynorr above are plots for genetic values of diploids and haploids generated from the diploids using reduceGenome (left column) alongside sum of allele dosages (right column). @janaobsteter shows this for the case where diploids are not inbred (top 2x2 plot) and when they are fully inbred (bottom 2x2 plot).

These plots are generated using the same code as above in the thread - @janaobsteter maybe you add the code to the post for reproducibility.

In the inbred case (bottom) the variance of genetic values in inbred diploids and haploids is the same (about 1). I find this somewhat confusing but I am struggling to put my confusion to words! Let's see. Diploid inbred var is 2pq\alpha(1+F) --> 1 in our case, but for haploids we should have pq\alpha, right, though this value would be 1/2 less going from diploid to haploid AND we don't have inbreeding anymore so another 1/2 reduction leading to 1/4, yet the value we have is actually ~1. Is this all due to the used scaling across ploidy levels in AlphaSimR (Table 1 in https://cran.r-project.org/web/packages/AlphaSimR/vignettes/traits.pdf)? Maybe we should add a haploid column to that table too!? I am just likely daft what is going on and would appreciate if you can explain to us what is going on.

In the non-inbred case (top) the variance of genetic values in non-inbred diploids is about 1/2 and haploids is about 1. While I struggled in the above inbred case, here I am really confused.

gaynorr commented 9 months ago

@gregorgorjanc and @janaobsteter, the scaling of genotype dosage is causing these patterns. In diploids, a equals the additive genetic effect for changing one allele from the "0" to the "1" allele. This is a change of half the total dosage. In haploids, the effect of change one allele from the "0" to "1" equals 2a, because this is a change of the full dosage or twice the change you see in diploids.

I think the system in AlphaSimR is easiest to think about when considering increasing the ploidy level. In AlphaSimR, if you double an individual's chromosomes their genetic values remain unchanged. This is because the relative dosages are unchanged. A population having its chromosomes doubled will see no change in its genetic variances.

An alternative way to handle ploidy would be to use absolute dosage instead of relative dosage. Meaning, you consider a to represent the effect of a change of one "0" allele to a "1" allele. This is the more common formulation used in polyploid literature. This formulation would see genetic values change when you doubled the chromosomes. The direction and size of the change isn't the same for all individuals. However, at a population level the change will result in an increase in genetic variance by a factor of 4.

Neither formulations is biologically sound, because changing ploidy isn't going to be the same for all traits and species. I'm using the AlphaSimR formulation because I thought it made dealing with dominance degrees more intuitive. I also expect there could be some cases where it is approximately biologically correct. However, I would in general caution against comparing individuals with different ploidy levels.