gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/
Other
43 stars 18 forks source link

SP$recHist #34

Closed joetabet closed 2 years ago

joetabet commented 2 years ago

Hello, I am using the function SimParam$setTrackRec() and SP$recHist. I am wondering is there some literature about the output of the function, to better understand my results. Thanks !

gregorgorjanc commented 2 years ago

Hi @joetabet,

Maybe post an reproducible example and we can turn that into an improved documentation as we explain it to you here?

joetabet commented 2 years ago

Yes of course, this is a very short example ,

`FounderGenomes = quickHaplo(nInd = 1000,nChr = 2,segSites = 500 ) SP = SimParam$new(FounderGenomes) SP$addTraitA(nQtlPerChr = 50, mean = 0, var = 2) SP$setSexes("yes_sys")

SP$setTrackRec(TRUE) Pop = newPop(FounderGenomes) Pop1 = randCross(pop = Pop, nCrosses = 100, nProgeny = 2)` SP$recHist

Thank you for your time

gregorgorjanc commented 2 years ago

@joetabet I simplified the example to this

> FounderGenomes = quickHaplo(nInd = 4, nChr = 3, segSites = 10)
> SP = SimParam$new(FounderGenomes)
> SP$setSexes("yes_sys")
> 
> SP$setTrackRec(TRUE)
> Pop = newPop(FounderGenomes)
> Pop1 = randCross(pop = Pop, nCrosses = 2, nProgeny = 3)
> SP$pedigree
   mother father isDH
1       0      0    0
2       0      0    0
3       0      0    0
4       0      0    0
5       4      1    0
6       4      1    0
7       4      1    0
8       2      3    0
9       2      3    0
10      2      3    0

So, we have 4 founders and I make two crosses (4 with 1 and 2 with 3) each producing 3 progeny, so we then have 6 progeny total in Pop

> SP$recHist
[[1]]
[1] 1 2

[[2]]
[1] 3 4

[[3]]
[1] 5 6

[[4]]
[1] 7 8

[[5]]
     [,1]  
[1,] list,2
[2,] list,2
[3,] list,2

[[6]]
     [,1]  
[1,] list,2
[2,] list,2
[3,] list,2

[[7]]
     [,1]  
[1,] list,2
[2,] list,2
[3,] list,2

[[8]]
     [,1]  
[1,] list,2
[2,] list,2
[3,] list,2

[[9]]
     [,1]  
[1,] list,2
[2,] list,2
[3,] list,2

[[10]]
     [,1]  
[1,] list,2
[2,] list,2
[3,] list,2

SP$recHist tracks all recombinations across all individuals ever created. So, 10 in our example.

> is(SP$recHist)
[1] "list"       "vector"     "listOrNULL"

SP$recHist is a list so lets dive into individual 5

> SP$recHist[[5]]
     [,1]  
[1,] list,2
[2,] list,2
[3,] list,2

There is one list node for every chromosome we simulated (3 in the example). Let's loot into the 1st chromosome.

> SP$recHist[[5]][[1]]
     [,1]     
[1,] integer,4
[2,] integer,6

OK, this is diploid situation, so this individual has two chromosome copies. Let's look at the the 1st copy:

> SP$recHist[[5]][[1]][[1]]
     [,1] [,2]
[1,]    2    1
[2,]    1   10

Now, this is where I am blanking out now. I think that every row is a recombination. Think of going from left-to-right in parental chromosomes and generating progeny chromosome. The first row indicates which parental chromosome should we start with (the 1st column - and since it's a start the value in the 2nd column is 1), then at position 10 there was a recombination to the other parental chromosome.

Similarly for the other copy:

> SP$recHist[[5]][[1]][[2]]
     [,1] [,2]
[1,]    2    1
[2,]    1    2
[3,]    2    6

You might think that the start seems to always be 2nd parental chromosome, but looking at other progeny you will see that this is not the case

> SP$recHist[[6]][[1]][[1]]
     [,1] [,2]
[1,]    2    1
[2,]    1    7
> SP$recHist[[6]][[1]][[2]]
     [,1] [,2]
[1,]    2    1
[2,]    1    6
> SP$recHist[[7]][[1]][[1]]
     [,1] [,2]
[1,]    1    1
> SP$recHist[[7]][[1]][[2]]
     [,1] [,2]
[1,]    2    1
[2,]    1    5
> SP$recHist[[8]][[1]][[1]]
     [,1] [,2]
[1,]    2    1
> SP$recHist[[8]][[1]][[2]]
     [,1] [,2]
[1,]    1    1
> SP$recHist[[9]][[1]][[1]]
     [,1] [,2]
[1,]    1    1
> SP$recHist[[9]][[1]][[2]]
     [,1] [,2]
[1,]    2    1
> SP$recHist[[10]][[1]][[1]]
     [,1] [,2]
[1,]    2    1
> SP$recHist[[10]][[1]][[2]]
     [,1] [,2]
[1,]    1    1
[2,]    2    5
[3,]    1    7

@gaynorr did I miss anything? Let me know so I can correct and prepare a pull request with a doc update for this bit.

joetabet commented 2 years ago

@gregorgorjanc Thank you so much for the clarification. So basically what I understood was ,

SP$recHist[[5]][[1]][[2]] [,1] [,2] [1,] 2 1 [2,] 1 2 [3,] 2 6

[[5]] individual [[1]] first chromosome [[2]] second copy of the first chromosome.

Now here is where things get a bit confusing, basically we are looking at segsites that are in recombination? This means that the first column [,1] is from the sire and [,2] dams (parenteral) and that for the second copy of the chromosome site#2 from father and site #1 from mother were linked, and so one site crossed over and so on? for second-row site #1 from father and site #2 from mother are together? Or am I missing something? I am sorry for all the trouble, but is there a way of calculating the rate? And also, when I add a trait with QTL per chromosome, how can I see study the recombination then? Thank you

gregorgorjanc commented 2 years ago

@joetabet the above rows show ORIGIN of chromosome segments (paternal or maternal chromosome of the second chromosome copy - of the first chromosome pair - in individual 5) and their START position (hence the end position is start - 1).

We read the above as:

  1. The chromosome originated from the second copy of the parent from position 1 onwards
  2. Then we had a recombination at position 2 to the other chromosome of the parent (note that I have simulated very few segregating sites)
  3. Then we had another recombination at position 6, back to second copy of the parental chromosome

I don't understand what you mean by the rate. Recombination rate? We specify expected recombination rate in runMacs(). The default, I think!, is 1 Morgan chromosome, but check the documentation. Each chromosome will obviously have only few recombinations.

I am also not clear what do you mean by trait, QTL, and recombination. Can you rewrite the question? It looks though that your question is not directly related to the original question. Perhaps you should start a discussion topic on this site - not an issue;)

jamonterotena commented 2 years ago

Thanks for your quick answer Gregor.

SP$recHist[[5]][[1]][[2]] [,1] [,2] [1,] 2 1 [2,] 1 2 [3,] 2 6

So then in this example, [,1] = 1 means the segment was inherited from the mother's paternal chromosome and 2 means that it was taken from the mother's maternal chromosome?

gregorgorjanc commented 2 years ago

@jamonterotena [, 1] and [, 2] denotes columns of a matrix and [1,], [2, ], and [3,] denote rows of a matrix - this is how R prints out a matrix when there are no row and column names. To help you understand the behaviour, I invite you to print out haplotypes (pullSegSiteHaplo() and pullIbdHaplo()) so you can convince yourself what is going on.

jamonterotena commented 2 years ago

Followed your instructions. This is an example of the haplotype of an individual, 25.

IBD

Though not seen here, I checked that the sequence 25_1 descends from 25's mother (grandmother 4, grandfather 14) and 25_2 from its father (grandmother 15, grandfather 13).

Looked at the recHist list for 25. It is obvious that [[1]] accounts for the crossovers between the mother's chromosomes and [[2]] the same for the father. Just by comparing with the IBD state of the alleles in previous picture, this becomes obvious. Then, the rows with 1 in the first column account for segments inherited from the maternal chromosome of the respective parent.

recHist

However, then I remember that the function 'countRecomb' in the script 'simulate_chromosome.R' utilized [[1]] as the paternal side and [[2]] as the maternal side:

_recHist <- SP$recHist countRecomb <- function(index) { ind = recHist[[index]] if(!is.null(ind)) { pat = ind[[1]][[1]] values_pat = nrow(pat) - 1 mat = ind[[1]][[2]] values_mat = nrow(mat) - 1 values_total = values_pat + values_mat return(c(index, values_pat, values_mat, valuestotal)) } return(c(index, NA, NA, NA)) } tmp = lapply(201:length(recHist), countRecomb)

This is probably what confused me in the first place. Maybe I am missing something but could you check this out? Sometimes when coding it is inevitable to make this kind of mistakes.

gregorgorjanc commented 2 years ago

@jamonterotena where is the simulate_chromosome.R script?

jamonterotena commented 2 years ago

it's in the github repository mentioned in the article: https://github.com/gaynorr/AlphaSimR_Examples/blob/master/recombination_simulations/simulate_chromosome.r

This is how the function is actually written: _recHist <- SP$recHist tmp = lapply(1:length(recHist), function(index) { ind = recHist[[index]] if(!is.null(ind)) { pat = ind[[1]][[1]] values_pat = nrow(pat) - 1 mat = ind[[1]][[2]] values_mat = nrow(mat) - 1 values_total = values_pat + values_mat return(c(index, values_pat, values_mat, valuestotal)) } return(c(index, NA, NA, NA)) })

gregorgorjanc commented 2 years ago

Hmm, looks like a buglet to me, right @gaynorr ?

gaynorr commented 2 years ago

Yeah, it looks like that script has the paternal and maternal chromosomes flipped. It's always maternal chromosomes first in AlphaSimR.