Closed trilisser closed 11 months ago
Dear @trilisser,
seqHKY_unOpt <- simSeq(tree, l = 1000, Q = c(1,2,1,1,2,1), rate = 0.1)
does simulate sequences with only one rate. Here rate is not a parameter from the gamma distribution. Setting rate here is the same as multiplying the edge length by 0.1.
When optimising the shape parameter gives you a very high value (1000 is the upper limit for the optimisation) all four rates converge towards one. So this is expected if you simulated with only one rate. See plot_gamma_plus_inv(1000)
. Here one rate would be enough.
There is a small example simulate under a (discrete) gamma distribution in the help:
# Example to simulate discrete Gamma rate variation
rates <- discrete.gamma(1,4)
data1 <- simSeq(tree, l = 250, rate=rates[1])
data2 <- simSeq(tree, l = 250, rate=rates[2])
data3 <- simSeq(tree, l = 250, rate=rates[3])
data4 <- simSeq(tree, l = 250, rate=rates[4])
data <- c(data1,data2, data3, data4)
or you can do it like this:
tree <- read.tree("tree.treefile")
seq_1000 <- simSeq(tree, l = 1000) # we just want a sequence of length 1000
fit1 <- pml(tree, seq_1000, k=4, shape = 1, Q = c(1,2,1,1,2,1), rate=.1) # define the model
seqHKY_unOpt <- simSeq(fit) # simSeq extracts the model parameters
fit <- pml(tree, seqHKY_unOpt, k=4, shape = 1, Q = c(1,2,1,1,2,1))
fit <- optim.pml(fit, model="HKY", optRate = TRUE, optEdge = F, optQ = T, optGamma = T, optBF = T)
fit$rate
fit
write.phyDat(seqHKY_opt, "seqHKY_shape1_newnamed.fasta", format = "fasta")
Dear @KlausVigo Thank you so much! Can you please comment the 5th line of your second script
fit <- pml(tree, seqHKY_unOpt, k=4, shape = 1, Q = c(1,2,1,1,2,1))
We define the model the second time, do we? If so, why do we do that?
Here I just changed to the simulated data set. The optimised parameters (line 6, based on the pml object created in line 5) should be similar to the parameters supplied to simSeq
(shape = 1, Q = c(1,2,1,1,2,1), rate=.1
) in line 3.
Thank you very much @KlausVigo ! It helps a lot!
Hi there! I try to simulate sequences using a tree with the script below.
But after optimisation
shape
became ~999 and I think it is about connection of a substitution rate and alpha parameter of gamma. How do they connect?