gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/
Other
40 stars 17 forks source link

makeCross2 and ID problem #142

Closed AudreyAAMartin closed 1 year ago

AudreyAAMartin commented 1 year ago

Describe the bug The function makeCross2 (from crossing.R) creates new individuals based on a pop and a crossing plan. If the crossing plan (and the pop) has numerical ID that are different than 1:nInd, you get the error 'Invalid crossPlan'. The error is coming from the step checking that the crossPlan and pop information match (see below).

if((max(crossPlan[,1])>nInd(females)) | 
     (max(crossPlan[,2])>nInd(males)) |
     (min(crossPlan)<1L)){
    stop("Invalid crossPlan")
  }

Steps To Reproduce

library(AlphaSimR)
#Create founder haplotypes
founderPop = quickHaplo(nInd=4, nChr=1, segSites=10)
#Set simulation parameters
SP = SimParam$new(founderPop)
#Create population
pop = newPop(founderPop, simParam=SP)
#numerical ID that are bigger than the pop size (or simply != 1:nInd)
pop@id = c("10", "11", "12", "13", "14")
#Cross individual 13 with individual 14
crossPlan = c(13, 14)
pop2 = makeCross(pop, crossPlan, simParam=SP)

Expected behavior The function is creating new individuals based on a crossing plan. It should run fine no matter the 'value' of numerical ID.

gregorgorjanc commented 1 year ago

@AudreyAAMartin I think the issue is with the character ID's recently introduced. Would replacing

if((max(crossPlan[,1])>nInd(females)) | 
     (max(crossPlan[,2])>nInd(males)) |
     (min(crossPlan)<1L)){
    stop("Invalid crossPlan")
  }

with

if(any(!crossPlan[, 1] %in% females@id)) | 
     (any(!crossPlan[, 2] %in% males@id)) {
    stop("Invalid crossPlan")
  }

solve the problem? BUT, we need to ensure we are picking the right parents down the line in the code too!

AudreyAAMartin commented 1 year ago

@gregorgorjanc It is definitely linked to the ID problem! The replacement did solve that part of the problem but the function still does not function properly. I tracked it back to makeCross2() -> newPop() -> .newPop() -> setMethod for RawPop Extract RawPop by index where:

setMethod("[",
          signature(x = "RawPop"),
          function(x, i){
            if(any(abs(i)>x@nInd)){
              stop("Trying to select invalid individuals")
            }
            for(chr in 1:x@nChr){
              x@geno[[chr]] = x@geno[[chr]][,,i,drop=FALSE]
            }
            x@nInd = dim(x@geno[[1]])[3]
            return(x)
          }
)

(x is a pop object and i an individual) It is the if statement who's problematic but changing it will have consequences on more than just makeCross2()...

gregorgorjanc commented 1 year ago

@AudreyAAMartin thanks for this follow-up report!

@gaynorr I think we can handle this quite well by managing different class of i in the setMethod("[", ... through different signatures - we did this in SIMplyBee at https://github.com/HighlanderLab/SIMplyBee/blob/9e11745f30869e8176d164959cfa3bfcd4381cf7/R/Class-MultiColony.R#L208

Looking at this further, it seems we already can use integer or character subsetting a population:

library(AlphaSimR)
founderPop = quickHaplo(nInd = 4, nChr = 1, segSites = 10)
SP = SimParam$new(founderPop)
pop = newPop(founderPop)
pop@id
# [1] "1" "2" "3" "4"
pop[c(4, 2)]@id
# [1] "4" "2"
pop[c("4", "2")]@id
# [1] "4" "2"

the challenge becomes when we do

pop2 = randCross(pop, nCrosses = 5)
pop2@id
# [1] "5" "6" "7" "8" "9"
pop[c(5, 1)]@id
# Error in .local(x, i, ...) : Trying to select invalid individuals
pop[c("9", "5")]@id
# Error in .local(x, i, ...) : Trying to select invalid individuals

So, it seems we are casting character subset variable to integer/numeric, which kicks in the code that @AudreyAAMartin shows?

We are happy to work on a solution, but it would be good to agree on the approach so our changes are accepted.

gaynorr commented 1 year ago

@gregorgorjanc your code above is failing because you are selecting from pop and not pop2 as intended.

@AudreyAAMartin, as Gregor shows there is currently a dual behavior for either character variables or numeric variables when you call population subsetting or crossing. When you supply numeric values, AlphaSimR looks for individuals according to their position in the populations. If you provide character variables, AlphaSimR will look for individuals according to their ID. This second functionality was added later.

A couple of changes need to be made to the original code. First, there are 4 individuals in the initial population, but 5 new IDs are generated. I added an additional individual to the founder population to correct this. Then you just need to pass the IDs as characters and they need to be in a matrix with two columns. Here is my code below.

library(AlphaSimR)
#Create founder haplotypes
founderPop = quickHaplo(nInd=5, nChr=1, segSites=10)
#Set simulation parameters
SP = SimParam$new(founderPop)
#Create population
pop = newPop(founderPop, simParam=SP)
#numerical ID that are bigger than the pop size (or simply != 1:nInd)
pop@id = c("10", "11", "12", "13", "14")
#Cross individual 13 with individual 14
crossPlan = cbind("13", "14")
pop2 = makeCross(pop, crossPlan, simParam=SP)
gregorgorjanc commented 1 year ago

For completness - current character code fails

> pop2 = randCross(pop, nCrosses = 5)
> pop2@id
[1] "10" "11" "12" "13" "14"
> pop2[c(5, 1)]@id
[1] "14" "10"
> pop2[c("9", "5")]@id
Error in .local(x, i, ...) : Trying to select invalid individuals

Sorry, another typo! Multitasking

library(AlphaSimR)
founderPop = quickHaplo(nInd = 4, nChr = 1, segSites = 10)
SP = SimParam$new(founderPop)
pop = newPop(founderPop)
pop@id
pop[c(4, 2)]@id
pop[c("4", "2")]@id

pop2 = randCross(pop, nCrosses = 5)
pop2@id
pop2[c(5, 1)]@id
pop2[c("9", "5")]@id

gives

> pop2 = randCross(pop, nCrosses = 5)
> pop2@id
[1] "5" "6" "7" "8" "9"
> pop2[c(5, 1)]@id
[1] "9" "5"
> pop2[c("9", "5")]@id
[1] "9" "5"

So, all is fine and I think this issue should be closed.

gaynorr commented 1 year ago

Closing the issue.