Closed konopinski closed 2 years ago
Maciek,
I can duplicate your error and can see where it occurs, but I haven't been able to fully work out why it is occurring, and I'm a bit slammed with other things at the moment. Nonetheless, when I recode your block specification as below, everything works. That leads me to believe there is something about your manual creation of the fscSettingsGenetics
object that is off.
dna.blocks <- lapply(1:nloci, function(i) {
fscBlock_dna(
sequence.length = seqlength,
mut.rate = round(exp(-rnorm(1,mean = 19,sd = 1)), 11)
)
})
genetics <- do.call(fscSettingsGenetics, dna.blocks)
Cheers, Eric
Unfortunately, your solution did not work in my R. I forgot to add I'm using Ubuntu and R 4.2.1 in Rstudio. Do not worry about this as I found a working solution so this is just to let you know that there might be one more thing to improve. I would be happy to help you with it but I have to read more about defining S4 objects.
I am wondering why did you chose creating those objects in this form instead of just simple data.frame. That would make things easier sometimes ;)
Hmmm... Can you describe more clearly what didn't work? What errors do you see? There shouldn't be a difference in the code for R in Ubuntu and Mac (on which I develop). It would help to know if this is truly an error in the functions or an error in the way you constructed the object.
I use S4 explicitly because it gives me control in other functions of the form and values of these objects. It means the functions don't (shouldn't) have to check that all the pieces are there and formatted correctly because they may have been constructed by a user rather than the proper functions in the package ;)
Sorry - I wasn't clear. I ment the blocks and settings for the fsc settup. They could be plain data.frames without the special attributes. I guess this would work until the user knows exactly what does she/he wants, but you'd probably have many more conversations like this one.
Well I think I have found some cue. fsc2gtypes
works perfectly with only a single DNA block, while when there are more blocks provided it crushes with the same error (Error: the number of genes in 'sequences' is not equal to the number of loci
). I got the same error using the method suggested by you (lapply with dna_block and so on), the default definition and my 'manual' method.
The parameter file is fine as the simulation in fsc
works fine (actually, this is why I started using strataG - I found it so difficult to write parameter file without mistakes). I thought it might be p
but df2gtypes
does not touch p
. I wish I could solve it myself.
Unfortunately, it is harder for me as a developer to make them plain data.frames. There is internal checking that happens and the other functions need to know that this checking has been done and that they conform to the expected structure. You're right - people would be more tempted to make their own and I'd have a lot more debugging conversations.
Could you send me the code that generates the error you're seeing (with my 'lapply' code to define the genetic structure) and a copy of the screen output. I tested the code I sent above on my system and although I can duplicate your initial error, my code does not generate an error and works as designed. There may be a bug in package code, but this suggests that if there is, it is due to some system differences. I would really like to track it down and to do so, I want to make sure I'm running exactly what you're running.
With:
genetics <- data.frame(chromosome = rep(1,nloci),
actual.type = rep("DNA",nloci),
fsc.type = rep("DNA",nloci),
num.markers = rep(seqlength,nloci),
recomb.rate = rep(0,nloci),
mut.rate = round(exp(-rnorm(nloci,mean = 19,sd = 1)),11),
param.5 = rep(1/3,nloci),
param.6 = rep(NA,nloci))
attributes(genetics)$class <- c("fscSettingsGenetics", "fscBlock", "data.frame" )
attributes(genetics)$num.chrom <- 1
attributes(genetics)$chrom.diff <- FALSE
and
genetics <- fscSettingsGenetics(
fscBlock_dna(sequence.length = seqlength,
mut.rate = round(exp(-rnorm(1,mean = 19,sd = 1)), 11)
),
fscBlock_dna(sequence.length = seqlength,
mut.rate = round(exp(-rnorm(1,mean = 19,sd = 1)), 11)
),
fscBlock_dna(sequence.length = seqlength,
mut.rate = round(exp(-rnorm(1,mean = 19,sd = 1)), 11)
)
)
it is exactly the same. It is basically the code from the original post. But when I change nloci
to 1 it works with all the methods.
and the expected outcome just for one dna block:
<<< strataG.fsc >>>
Contents: 300 samples, 1 locus, 3 strata
Strata summary:
stratum num.ind num.missing num.haplotypes
1 Deme.1 100 0 1
2 Deme.2 100 0 1
3 Deme.3 100 0 2
Sequence summary:
locus num.seqs mean.length mean.num.ns mean.num.indels
1 C1B1_DNA 2 150 0 0
I used this code in the publication yet to be accepted and it gave me whatever I needed. This time I wanted to simulate shotgun sequencing (such as RAD on I;lumina) thus I need to make mutation rates variable and get many small unlinked fragments (actually I should put the fragments on different chromosomes to make them unlinked).
Thanks for this. I'll dig into the code to see if there is some reason the multi-block specification works as desired on my system, but not on yours and will let you know what I find...
But please, do not spend the weekend on it :) It's 1:30AM in Poland so it's time for me too. Thanks a lot for everything you do here. Cheers, Maciek
Hi, stupid thing... it works. The only thing I've changed in the genetics
definition was that I placed the sequences in separate chromosomes to unlink them completely. Like that:
dna.blocks <- lapply(1:nloci, function(i) {
fscBlock_dna(
chromosome = i,
sequence.length = seqlength,
mut.rate = round(exp(-rnorm(1,mean = 18,sd = 1)), 11)
)
})
genetics <- do.call(fscSettingsGenetics, dna.blocks)
Maybe that will shed some light on the issue?
Would you send me the .rdata file produced by the following code on your machine?
rm(list = ls())
library(strataG)
demes <- fscSettingsDemes(
fscDeme(1000, 100),
fscDeme(10000, 100),
fscDeme(100000, 100),
ploidy = 1
)
events <- fscSettingsEvents(
fscEvent(10000, 0, 2, 1, 1, 0, 0),
fscEvent(10000, 1, 2, 1, 1, 0, 0)
)
dna.blocks <- replicate(
5,
fscBlock_dna(
sequence.length = 150,
mut.rate = round(exp(-rnorm(1, mean = 19, sd = 1)), 11)
),
simplify = FALSE
)
genetics <- do.call(fscSettingsGenetics, dna.blocks)
p <- fscWrite(
demes, genetics, events = events,
label = "konopinski.test",
use.wd = TRUE
)
p <- fscRun(p, num.sims = 1000, num.cores = 6, seed = 545125)
df <- fscReadArp(p)
save.image("konopinski test.rdata")
konopinski test.zip Hope this helps. It seems that is is not only my problem - in June there was a similar issue with the same error (reported by akoontz11). I do not know how that one ended because there is no answer to your comment (https://githubhelp.com/EricArcher/strataG/issues/55).
Although that was the same error message, that users' issue was different. The code you're using should have the fix for that that I pushed.
Although I still don't see why it worked on my system and not yours previously, I think I've located the source of this issue.
It has to do with the fact that you are simulating multiple blocks of haploid sequences. In developing the code, I'd not tested the fsc2gtypes()
conversion for this scenario. I'd only dealt with a single haploid block. gtypes
objects can handle multiple loci (using the multidna object), or I can concatenate all sequences. I'll make that an option in fsc2gtypes()
.
I will make the changes and let you know when I've pushed up a new version.
I have pushed a new version that fixes this issue in fsc2gtypes()
. I've also added a new concat.dna
argument to that function. If TRUE
(default), all haploid DNA blocks are concatenated into a single locus. If FALSE
, each block is loaded as an independent locus. The behavior is demonstrated below:
rm(list = ls())
library(strataG)
demes <- fscSettingsDemes(
fscDeme(1000, 100),
fscDeme(10000, 100),
fscDeme(100000, 100),
ploidy = 1
)
events <- fscSettingsEvents(
fscEvent(10000, 0, 2, 1, 1, 0, 0),
fscEvent(10000, 1, 2, 1, 1, 0, 0)
)
dna.blocks <- replicate(
5,
fscBlock_dna(
sequence.length = 150,
mut.rate = round(exp(-rnorm(1, mean = 19, sd = 1)), 11)
),
simplify = FALSE
)
genetics <- do.call(fscSettingsGenetics, dna.blocks)
p <- fscWrite(
demes, genetics, events = events,
label = "konopinski.test",
use.wd = TRUE
)
p <- fscRun(p, num.sims = 1000, num.cores = 6, seed = 545125)
g.concat <- fsc2gtypes(p, concat.dna = TRUE)
g.multi <- fsc2gtypes(p, concat.dna = FALSE)
g.concat
g.multi
Reinstall the package and let me know if it works on your system. Feel free to close the issue if so.
The case is closed. Although, very slow for a large number of loci it works. Thanks a lot
Dear Eric, I have simulated some sequences with the following code:
I cannot convert the results to
gtypes
format. I constantly get an error:I have traced it down to the final definition of the
gtypes
object indf2gtypes
function (lines 91-101 - the functionnew
) but cannot go deeper as I'm not too familiar with S4 object.Anyway, I have to emphasize again that I LOVE THE PACKAGE ;) I managed to analyse the data using other methods but I think you might find this issue interesting. Or did I do something wrong? I needed this strange definition of
genetics
because I wanted to simulate different mutation rates at different loci. Maybe the problem is in this part? I'm using the latest version from Github (2.5.01).Cheers, Maciek