gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/
Other
40 stars 17 forks source link

potential bug with writeRecords #154

Closed poca87 closed 9 months ago

poca87 commented 10 months ago

I think there might be a bug in writeRecords function (AlphaSimR version 1.4.2), resulting in following:

Error in data.frame(id = pop@id, mother = pop@mother, father = pop@father,  : 
  no slot of name "reps" for this object of class "Pop"  

so the issue is coming from reps = pop@reps, which does not exist.

Example:

#Create founder haplotypes
founderPop = quickHaplo(nInd=10, nChr=1, segSites=20)

#Set simulation parameters
SP = SimParam$new(founderPop)
SP$addTraitA(5)
SP$setVarE(h2=0.5)
SP$addSnpChip(nSnpPerChr = 10)

#Create population
pop = newPop(founderPop, simParam=SP)

writeRecords(
  pop,
  dir = getwd(),
  snpChip = 1,
  useQtl = FALSE,
  includeHaplo = FALSE,
  append = TRUE,
  simParam = NULL
)
gregorgorjanc commented 10 months ago

Is it because you don’t have phenotypes?

What fix do you propose?

Message ID: @.***>

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.

poca87 commented 10 months ago

No, I have phenotypes, they are defined by SP$setVarE(h2=0.5).

I also tested with setPheno just to be sure, and it seems that the issue persists, I even tried adding option reps = 1 (which is default anyways).

So for some reason, it seems that pop@reps slot is not created even when I use setPheno (see below). Thus I think it might come from definition of Pop-class, not the actual functions.

> is(pop, "Pop")
[1] TRUE
> str(pop)
Formal class 'Pop' [package "AlphaSimR"] with 17 slots
  ..@ id     : chr [1:10] "1" "2" "3" "4" ...
  ..@ iid    : int [1:10] 1 2 3 4 5 6 7 8 9 10
  ..@ mother : chr [1:10] "0" "0" "0" "0" ...
  ..@ father : chr [1:10] "0" "0" "0" "0" ...
  ..@ sex    : chr [1:10] "H" "H" "H" "H" ...
  ..@ nTraits: int 1
  ..@ gv     : num [1:10, 1] -1.3102 1.7552 -1.0095 -0.2777 -0.0798 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr "Trait1"
  ..@ pheno  : num [1:10, 1] -0.189 2.223 -0.59 -0.328 -1.559 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr "Trait1"
  ..@ ebv    : num[1:10, 0 ] 
  ..@ gxe    :List of 1
  .. ..$ : NULL
  ..@ fixEff : int [1:10] 1 1 1 1 1 1 1 1 1 1
  ..@ misc   :List of 10
  .. ..$ : NULL
  .. ..$ : NULL
  .. ..$ : NULL
  .. ..$ : NULL
  .. ..$ : NULL
  .. ..$ : NULL
  .. ..$ : NULL
  .. ..$ : NULL
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ nInd   : int 10
  ..@ nChr   : int 1
  ..@ ploidy : int 2
  ..@ nLoci  : int 20
  ..@ geno   :List of 1
  .. ..$ : raw [1:3, 1:2, 1:10] 0f 54 ce 0b ...
gaynorr commented 10 months ago

It's a bug in writeRecords coming from the fact that the reps slot was removed and I forgot to account for this in the writeRecords function. The reps slot was removed when I removed heterogeneous residual variance from the GS functions. The heterogeneous error variance in those models was a function of reps, which performs poorly in the presence of GxE.

I guess a separate question is: should writeRecords should be supported in its current form? It's really a legacy function that dates back to a very early iteration of AlphaSimR that wrote and read all GS training data to disk. This method of running GS changed, but I left the function in place. Would it make since to replace this function with something using a more formal file structure, like writePlink?

poca87 commented 10 months ago

It is really up to you, but I think having some kind of output function that creates bunch of potentially standardised files (pedigree, genotypes, phenotypes, ...) is not a bad idea, and then each person can have a separate script to adapt those files and put them to their analysis pipeline. Of course, alternative is to remove it completely and let user to directly output needed files from R.

Regarding the "reps" slot, I figured it is gone, but help pages for some functions still include it (e.g., setPheno).

gregorgorjanc commented 10 months ago

There are soo many different file formats that one is bound to have to convert the data anyway to their favourite format:(

Plink is not a bad default though. Not really practical for repeated phenotypes and, I think, multiple traits.