EricArcher / strataG

strataG is a toolkit for haploid sequence and multilocus genetic data summaries, and analyses of population structure.
25 stars 12 forks source link

Fastsimcoal automation #33

Closed konopinski closed 4 years ago

konopinski commented 4 years ago

Hi, I guess this is a very hard time for you after releasing such a revolutionary version of the soft. I really appreciate your work. But I had to downgrade back to 2.0.2 because of the new method of introducing data to fastsimcoal. I managed to rewrite the script to produce input files automatically (24 microsatellite loci, 6 different allele nos. limits, 4 different mutation rates, and 4 populations with different histories) make some automated simulations, but did not manage to make fastsimcoal running. I know there's some problem possibly with the system, because I was not able to run it manually, but the par file from 2.0.2 works excellent. Below you will find a piece of script I used for simulations so that you can understand what automation I'm talking about. Maybe that will inspire you ;) I managed to rewrite most of the blocks, except for the new format event block. The main obstacle is that you cannot provide multiple data as I did before. In the end I found I can build normal data.frames and process them by the new functions so they get the desired classes and attributes, but building them is much more tricky. And after all that work I was very disappointed to see that fsc hangs. I will try to attach param files from the old and new versions, but I'm waiting for my simulations to finish, and I have to recreate all the loops I wrote for building new param file. Cheers Maciek Konopiński

popSize <- 10000
# mutation rates in loci
mutRates  <- c(0.0001, 0.0002, 0.0005, 0.001)
# max no. of alleles in loci
maxRange <- c(3, 6, 9, 12, 15, 20)
numLoci <- length(mutRates) * length(maxRange)
# bottleneck sizes
botSize <- c(500, 50, 20)
#historical events
botTime <- 20 # since t0
botLength <- 20 # since the beginning of the bottleneck
splitTime <- 50 # since t0
numPops <- length(botSize) + 1

splSize <- rep(popSize, numPops)

pop.info <- strataG::fscPopInfo(
  pop.size = rep(popSize, numPops),
  sample.size = splSize,
  sample.times = c(rep(0, numPops))
)

# Setting the parameters for fastsimcoal
# Parameters of microsatellite loci
locus.params <-
  strataG::fscLocusParams(
    locus.type = "msat",
    num.loci = 1,
    mut.rate = rep(mutRates, each = length(maxRange)),
    #proportion of non-stepwise mutations
    gsm.param = 0.2,
    #maximum number of alleles at locus
    range.constraint = rep(maxRange, length(mutRates)),
    #diploid individuals
    ploidy = 2,
    #24 chromosomes (free recombination)
    chromosome = c(1:(numLoci))
  )
# Parameters of coalescent analysis
# dates (in generations) of historical events
hist.ev <- strataG::fscHistEv(
  num.gen = c(
    rep(botTime, numPops - 1),
    rep(botTime + botLength, numPops - 1),
    rep(splitTime, numPops - 1)
  ),
  source.deme = as.numeric(c(seq(1:(numPops - 1)),
                             seq(1:(numPops - 1)),
                             seq(1:(numPops - 1)))),
  sink.deme = c(seq(1:(numPops - 1)),
                seq(1:(numPops - 1)),
                rep(0, numPops - 1)),
  prop.migrants = c(rep(1, 3 * (numPops - 1))),
  #simulating bottlenecks and recovery to previous popsize
  new.sink.size = c(as.numeric(rep(botSize / popSize)),
                    as.numeric(rep(popSize / botSize)),
                    rep(1, numPops - 1))
)
simulated <- strataG::fastsimcoal(
    pop.info,
    locus.params,
    mig.rates = NULL,
    hist.ev,
    num.cores = 12,
    exec = paste0("fsc26"),
    delete.files = TRUE,
    quiet = TRUE
  )
EricArcher commented 4 years ago

If I've understood the simulation you wanted to specify correctly, here is a script that uses the new fastsimcoal function format that does what you want:

rm(list = ls())
library(strataG)

popSize <- 10000

# mutation rates in loci
mutRates  <- c(0.0001, 0.0002, 0.0005, 0.001)

# max no. of alleles in loci
maxRange <- c(3, 6, 9, 12, 15, 20)

# bottleneck sizes
botSize <- c(500, 50, 20)

#historical events
botTime <- 20 # since t0
botLength <- 20 # since the beginning of the bottleneck
splitTime <- 50 # since t0

numPops <- length(botSize) 

# Setting the parameters for fastsimcoal

# set demes
demes <- do.call(
  fscSettingsDemes,
  lapply(1:numPops, function(i) fscDeme(popSize, popSize))
)

# Parameters of microsatellite loci
msat.params <- expand.grid(mu = mutRates, range = maxRange)
msats <- lapply(1:nrow(msat.params), function(i) {
  fscBlock_microsat(
    num.loci = 1,
    mut.rate = msat.params$mu[i],
    range.constraint = msat.params$range[i],
    chromosome = i
  )
})
genetics <- do.call(fscSettingsGenetics, msats)

# Parameters of coalescent analysis

source.sink <- combn(numPops, 2) - 1

after.bot <- lapply(1:ncol(source.sink), function(i) {
  source.pop <- source.sink[2, i] + 1
  fscEvent(
    event.time = botTime,
    source = source.sink[1, i],
    sink = source.sink[2, i],
    new.size = botSize[source.pop] / popSize
  )
})

during.bot <- lapply(1:ncol(source.sink), function(i) {
  source.pop <- source.sink[2, i] + 1
  fscEvent(
    event.time = botTime + botLength,
    source = source.sink[1, i],
    sink = source.sink[2, i],
    new.size = popSize / botSize[source.pop]
  )
})

before.bot <- lapply(1:ncol(source.sink), function(i) {
  fscEvent(
    event.time = botTime + botLength,
    source = source.sink[1, i],
    sink = source.sink[2, i]
  )
})

events <- do.call(fscSettingsEvents, c(after.bot, during.bot, before.bot))

p <- fscWrite(
  demes = demes,
  genetics = genetics,
  events = events
)
p <- fscRun(p, num.cores = 5)
konopinski commented 4 years ago

Thanks for your prompt answer. I will certainly try the code, but I won't until I'll finish current work. I've added the working par file. I wonder what is the difference between this one and the one I produced with the new strataG. fsc.runningFile.par.txt

EricArcher commented 4 years ago

This par file looks like it was produced by skeleSim. That package is based on the old version of strataG. I'm still trying to find some chunk of time to update that package to the new version.