gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/
Other
43 stars 18 forks source link

Trouble creating new population using custom genetic data #9

Closed juliendiot42 closed 2 years ago

juliendiot42 commented 3 years ago

Hello.

I am facing a problem when creating a MapPop population using my own genotypic data.

I have transformed my data to match the requirement in the documentation: (R/Class-Pop.R line 17-19)

geno: "matrix" containing chromosome genotypes. The "matrix" has dimensions nChr by 1 and each element is a three dimensional array of raw values. The array dimensions are nLoci by ploidy by nInd.

If I manually check I have indeed:

all.equal(dim(geno) , c(nChr, 1)) #  The "matrix" has dimensions nChr by 1
## [1] TRUE

and

all(sapply(1:nChr, function(chr) {
  all.equal(dim(geno[[chr]]), c(nLoci[chr], ploidy, nInd))} #The array dimensions are `nLoci` by `ploidy` by `nInd`
  ))
## [1] TRUE

However, the creation of the population object fails:

myPop <- new("MapPop",
             nInd = nInd,
             nChr = nChr,
             ploidy = 2L,
             nLoci = nLoci,
             geno = geno,
             genMap = genMap,
             centromere = centromere,
             inbred = TRUE)
Error in validObject(.Object) : 
  invalid class "MapPop" object: 1: nLoci[1]!=dim(geno[[1]][1]
invalid class "MapPop" object: 2: nLoci[2]!=dim(geno[[2]][1]

Indeed, the validity test for the object RawPop (in R/Class-Pop.R, lines 39-45) test:

    DIM1 = object@nLoci[i]%/%8L + (object@nLoci[i]%%8L > 0L)  #  <-----
    if(DIM1!=dim(object@geno[[i]])[1]){
      errors = c(errors,
                 paste0("nLoci[",i,
                        "]!=dim(geno[[",i,
                        "]][1]"))
    }

I don't understand the first line above.

Please forgive me if my question is silly. Why the test check for a dimension of "nLoci %/% 8 + (nLoci %% 8L > 0L) by ploidy by nInd" and not "nLoci by ploidy by nInd" ?

Please click here to see my session information. ``` sessionInfo() R version 4.0.4 (2021-02-15) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Pop!_OS 20.10 Matrix products: default BLAS: /usr/local/lib/R/lib/libRblas.so LAPACK: /usr/local/lib/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] AlphaSimR_1.0.1.9000 R6_2.5.0 glmnet_4.1-1 Matrix_1.3-2 digest_0.6.27 [6] breedSimulatR_0.2.1 TSP_1.1-10 gaston_1.5.7 RcppParallel_5.0.3 Rcpp_1.0.6 loaded via a namespace (and not attached): [1] knitr_1.31 splines_4.0.4 colorout_1.2-2 prompt_1.0.1 lattice_0.20-41 rlang_0.4.10 [7] foreach_1.5.1 tools_4.0.4 grid_4.0.4 xfun_0.22 cli_2.3.1 htmltools_0.5.1.1 [13] iterators_1.0.13 survival_3.2-7 yaml_2.2.1 assertthat_0.2.1 codetools_0.2-18 shape_1.4.5 [19] glue_1.4.2 evaluate_0.14 rmarkdown_2.7 compiler_4.0.4 ```

Best regards.

gregorgorjanc commented 3 years ago

@juliendiot42 I think its because @gaynorr implemented 'compression' and you need n loci in the multiple of 8?! What is your number of loci? Maybe you add some loci at the end and have them all fixed for ancestral/reference allele?

juliendiot42 commented 3 years ago

For the example above, I have:

nLoci
##  [1] 160 163 141 202  ...

The first chromosome have 8 * 20 = 160 loci, and it doesn't work. Indeed:

160 %/% 8 = 20,
160 %% 8 = 0
So DIM1 = 20, however,

dim(geno[[1]])
## [1] 160   2 198

So DIM1 == dim(geno[[1]])[1] is FALSE

Even if I add some loci at the end it will not fix the problem.

A potential workaround would be to set the parameter nLoci to 8 times the actual number of loci in the genetic data-set:

But to do that, I need to increase the size of genMap, because it should be a list of vectors of length nLoci (see in R/Class-Pop.R, lines 145-150 ):

  for(i in 1:object@nChr){
    if(object@nLoci[i]!=length(object@genMap[[i]])){
      errors = c(errors,
                 paste0("nLoci[",i,"]!=length(genMap[[",i,"]]"))
    }
  }

However, it raises a question about how to generate the genMap object. (which elements of genMap correspond to which elements of geno ?)

Similarly, we can also remove some elements of geno to get dim(geno[[i]])[i] = nLoci/8 but in this case, we lose a lot of information which doesn't seem great.

gregorgorjanc commented 3 years ago

AFAIK GenMap has the same orientation as genotypes. so repeating the last value will give you a non recombining haplotype at the end of chromosomes

juliendiot42 commented 3 years ago

OK, thank you, I will try that