gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/

Pedigree after calling reduceGenome() or doubleGenome() is identical with keepParents = TRUE or keepParents = FALSE #90

Closed philipbg closed 1 year ago

philipbg commented 1 year ago

Describe the bug

I am using reduceGenome() and doubleGenome() to make gametes and cross them, instead of crossing individuals directly. I expected the keepParents argument of each function to change which mother and father are recorded in the pedigree (i.e., SimParam$pedigree) after the function is called. However, the pedigree is the same whether this argument is set to TRUE or FALSE.

Steps To Reproduce

> ## ******** With keepParents = TRUE ********
> rm(list = ls())
> library(AlphaSimR)
> 
> founderGenomes <- runMacs(
+   nInd = 5,
+   nChr = 3,
+   segSites = 100,
+   species = "MAIZE"
+ )
> 
> SP <- SimParam$new(founderGenomes)
> SP$setTrackPed(isTrackPed = TRUE)
> 
> basePop <- newPop(founderGenomes)
> 
> gametes <- reduceGenome(pop = basePop, nProgeny = 2, keepParents = TRUE)
> 
> SP$pedigree
   mother father isDH
1       0      0    0
2       0      0    0
3       0      0    0
4       0      0    0
5       0      0    0
6       1      1    0
7       1      1    0
8       2      2    0
9       2      2    0
10      3      3    0
11      3      3    0
12      4      4    0
13      4      4    0
14      5      5    0
15      5      5    0
> 
> doubleGamete <- doubleGenome(gametes, keepParents = TRUE)
> 
> SP$pedigree
   mother father isDH
1       0      0    0
2       0      0    0
3       0      0    0
4       0      0    0
5       0      0    0
6       1      1    0
7       1      1    0
8       2      2    0
9       2      2    0
10      3      3    0
11      3      3    0
12      4      4    0
13      4      4    0
14      5      5    0
15      5      5    0
16      6      6    0
17      7      7    0
18      8      8    0
19      9      9    0
20     10     10    0
21     11     11    0
22     12     12    0
23     13     13    0
24     14     14    0
25     15     15    0
> 
> newPop <- randCross(doubleGamete,nCrosses=5)
> 
> SP$pedigree
   mother father isDH
1       0      0    0
2       0      0    0
3       0      0    0
4       0      0    0
5       0      0    0
6       1      1    0
7       1      1    0
8       2      2    0
9       2      2    0
10      3      3    0
11      3      3    0
12      4      4    0
13      4      4    0
14      5      5    0
15      5      5    0
16      6      6    0
17      7      7    0
18      8      8    0
19      9      9    0
20     10     10    0
21     11     11    0
22     12     12    0
23     13     13    0
24     14     14    0
25     15     15    0
26     17     24    0
27     18     22    0
28     18     25    0
29     21     25    0
30     22     25    0
> 
> ## ******** With keepParents = FALSE ********
> 
> rm(list = ls())
> library(AlphaSimR)
> 
> founderGenomes <- runMacs(
+   nInd = 5,
+   nChr = 3,
+   segSites = 100,
+   species = "MAIZE"
+ )
> 
> SP <- SimParam$new(founderGenomes)
> SP$setTrackPed(isTrackPed = TRUE)
> 
> basePop <- newPop(founderGenomes)
> 
> gametes <- reduceGenome(pop = basePop, nProgeny = 2, keepParents = FALSE)
> 
> SP$pedigree
   mother father isDH
1       0      0    0
2       0      0    0
3       0      0    0
4       0      0    0
5       0      0    0
6       1      1    0
7       1      1    0
8       2      2    0
9       2      2    0
10      3      3    0
11      3      3    0
12      4      4    0
13      4      4    0
14      5      5    0
15      5      5    0
> 
> doubleGamete <- doubleGenome(gametes, keepParents = FALSE)
> 
> SP$pedigree
   mother father isDH
1       0      0    0
2       0      0    0
3       0      0    0
4       0      0    0
5       0      0    0
6       1      1    0
7       1      1    0
8       2      2    0
9       2      2    0
10      3      3    0
11      3      3    0
12      4      4    0
13      4      4    0
14      5      5    0
15      5      5    0
16      6      6    0
17      7      7    0
18      8      8    0
19      9      9    0
20     10     10    0
21     11     11    0
22     12     12    0
23     13     13    0
24     14     14    0
25     15     15    0
> 
> newPop <- randCross(doubleGamete,nCrosses=5)
> 
> SP$pedigree
   mother father isDH
1       0      0    0
2       0      0    0
3       0      0    0
4       0      0    0
5       0      0    0
6       1      1    0
7       1      1    0
8       2      2    0
9       2      2    0
10      3      3    0
11      3      3    0
12      4      4    0
13      4      4    0
14      5      5    0
15      5      5    0
16      6      6    0
17      7      7    0
18      8      8    0
19      9      9    0
20     10     10    0
21     11     11    0
22     12     12    0
23     13     13    0
24     14     14    0
25     15     15    0
26     16     17    0
27     16     23    0
28     21     25    0
29     22     23    0
30     24     25    0

Expected behavior

I would expect the keepParents argument to influence which parents are shown in the pedigree. I want to recover a pedigree in which the parents of the final cross (the last 5 lines of the pedigree, made by randCross()) are individuals from basePop. In this example, the mother and father in the last 5 lines should then be between 1 and 5, since basePop had 5 individuals. Instead they are between 16 and 25, corresponding to the doubleGamete population.

gaynorr commented 1 year ago

That's actually intended. A better description of the keepParents argument is probably needed. The keepParents argument only affects the mother and father slots in the population, not the pedigree stored in SimParam, because the two are used differently in the software.

The pedigree in SimParam is used for haplotype tracking and needs to be more rigidly structured to avoid breaking this code. It needs to keep track of each unique genetic entity in the simulation.
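Because the SimParam pedigree records every intermediate entity, the basePop ancestors of the final cross can still be recovered by walking it back. The following is a minimal base-R sketch, not part of AlphaSimR: `traceToBase` is a hypothetical helper, and `ped` is typed in by hand from the keepParents = TRUE output in this issue (in a live session `SP$pedigree` could be passed instead). It assumes each intermediate row (gamete or doubled gamete) has a single parent, so mother and father are equal and following the mother column is enough.

```r
# Mother/father columns of the 30-row pedigree printed in the
# keepParents = TRUE run above.
ped <- cbind(
  mother = c(rep(0, 5), rep(1:5, each = 2), 6:15, 17, 18, 18, 21, 22),
  father = c(rep(0, 5), rep(1:5, each = 2), 6:15, 24, 22, 25, 25, 25)
)

# Hypothetical helper: step back nSteps generations through the pedigree.
# Assumes every intermediate row has mother == father (one parent).
traceToBase <- function(ped, ids, nSteps) {
  for (step in seq_len(nSteps)) {
    ids <- ped[ids, "mother"]
  }
  unname(ids)
}

# Rows 26-30 are the randCross() progeny. Their recorded parents are doubled
# gametes, so two steps back (doubled gamete -> gamete -> basePop) are needed.
finalIds <- 26:30
baseParents <- data.frame(
  id     = finalIds,
  mother = traceToBase(ped, ped[finalIds, "mother"], nSteps = 2),
  father = traceToBase(ped, ped[finalIds, "father"], nSteps = 2)
)
print(baseParents)
```

For the run shown above this maps individual 26 (recorded parents 17 and 24) back to basePop individuals 1 and 5. The number of steps depends on how many intermediate populations sit between basePop and the cross.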

The pedigree shown in the population is used for selection (e.g. selectWithinFam). You can freely change the mother and father slots in a population without any effect on the haplotype tracking code. Changing them alters how the selection functions view families, and thereby how selection is performed. This exists to model how plant breeders think of families when talking about inbred lines.
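To see the distinction directly, the population slots can be compared with the SimParam pedigree. This is a rough sketch under a few assumptions: the population exposes its parents as the `@mother` and `@father` slots, keepParents = TRUE copies the parents' own mother and father into those slots while keepParents = FALSE records the parent's id, and `quickHaplo()` stands in for `runMacs()` only to keep the example fast.

```r
library(AlphaSimR)

# quickHaplo() replaces runMacs() here purely for speed (an assumption of
# this sketch, not what the original report used).
founderGenomes <- quickHaplo(nInd = 5, nChr = 3, segSites = 100)
SP <- SimParam$new(founderGenomes)
SP$setTrackPed(isTrackPed = TRUE)
basePop <- newPop(founderGenomes)

gametesKeep <- reduceGenome(basePop, nProgeny = 2, keepParents = TRUE)
gametesDrop <- reduceGenome(basePop, nProgeny = 2, keepParents = FALSE)

# Population slots: this is where keepParents is expected to matter.
print(data.frame(id = gametesKeep@id,
                 mother = gametesKeep@mother,
                 father = gametesKeep@father))
print(data.frame(id = gametesDrop@id,
                 mother = gametesDrop@mother,
                 father = gametesDrop@father))

# SimParam pedigree: expected to list the basePop individual as the parent
# of every gamete, whichever keepParents value was used.
print(SP$pedigree)
```

If the slots behave as assumed, functions such as selectWithinFam would group the two gamete populations into families differently, while the rows added to SP$pedigree look the same for both calls.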