dwbapst / paleotree

R library for analyzing, time-scaling and simulating phylogenies of extinct/fossil lineages. Also plots diversity curves for stratigraphic range data and phylogenies, including combinations of these two data types.
Creative Commons Zero v1.0 Universal
21 stars 9 forks source link

createMrBayesTipDatingNexus - clarification requested on defining ingroup/outgroup exclusivity #35

Open dwbapst opened 2 years ago

dwbapst commented 2 years ago

From an email sent by Armin Elser, sent in December 2020:

I made an observation on the createMrBayesTipDatingNexus() function of paleotree used to create the needed files to run the FBD tip-dating with MrBayes. After discussion with some colleagues it appears that people are not aware of this "observation/issue". I used a workaround to get around the problem but I wanted to confirm with you, whether this is the only way to solve the problem. If it is, it might be worthwhile adding a short note to the createMrBayesTipDatingNexus() help section.

For createMrBayesTipDatingNexus() we can specify an input tree to provide treeConstraints. Unless specified otherwise by the user these constraints will be hard constraints as defined by the MrBayes manual (e.g., Ronquist et al., 2020, p. 145). If the root of this input tree is fully resolved (i. e. without any polytomies; see "Input_tree.pdf") then the constraint for one of the two descendant clades from the root will be ignored by MrBayes and the following warning message will be printed in the MrBayes ".out." file ("Test_FBD_dating.nex.out"):

"WARNING: Constraint 'node3' is a duplicate of another constraint and will be ignored"

(the exact node number depends on the input tree).

Since this constraint is ignored, taxa whose clade membership should be forced by the user constraints will obviously not abide to the user constraint but can be found in clades to which they should not belong to.

A simple workaround I used to circumvent the problem was to add two outgroups in a polytomy to the root of the tree (see "Input_tree_with_outgroups.pdf"). In that case, no constraint will be ignored ("Out.Test_FBD_dating.nex.out"). Is this a necessary step? Or is there another way to get around the problem? In the help section of the createMrBayesTipDatingNexus() function the following is stated: "If the outgroup-ingroup split is not present on the supplied treeConstraints, add that split to treeConstraints manually."

And the NEXUS file, that is produced by paleotree also contains the following comment: "prset topologypr = constraints(ingroup,node1,node2); [need to include ingroup]]"

Are these comments meant to address the above-mentioned problem? If so, it might be worthwhile adding a short note in the help section of createMrBayesTipDatingNexus() of what exactly needs to be done (since the MrBayes manual is also not really helpful in these regards). I have spoken to a couple of other colleagues and they were not even aware of the issue.

I have attached an R script that allows to replicate the problem ("MrBayes_Constraint_minExample.R"). It generates a NEXUS file that will produce the "ignore constraint" warning in MrBayes ("Test_FBD_dating.nex") and a second NEXUS file ("Out.Test_FBD_dating.nex") which does not generate the warning, since two outgroups are added in a polytomy.

dwbapst commented 2 years ago

This sounds like it is just an issue with clarifying the help page for the respective functions -- but not even be worth it as MrBayes increasingly is outmoded by other software approaches. Still, I regret letting this issue go without dealing with it during the Pandemic.

dwbapst commented 2 years ago

I don't think I'd encountered this particular issue, but from what I recall, constraining the outgroup for MrBayes is very difficult to do in a programmatic approach. Seems likely the example data sets I used to test my functions on just didn't have an outgroup that was composed of enough taxa, or were similar enough to the in-group to have this ingroup/outgroup mixing happen.

dwbapst commented 2 months ago

The files Armin sent; here is the input tree and the input tree with outgroups.

Input_tree.pdf Input_tree_with_outgroups.pdf

dwbapst commented 2 months ago

Here's the code supplied

library(phytools)
library(paleotree)

#Simulate some fossil ranges with simFossilRecord
set.seed(444)
record <- simFossilRecord(p=0.1, q=0.1, nruns=1,
                          nTotalTaxa=c(20,30), nExtant=0)
taxa <- fossilRecord2fossilTaxa(record)
taxa.ages <- taxa[,c("orig.time","ext.time")]
colnames(taxa.ages) <- c("FAD", "LAD")
# #simulate a fossil record with imperfect sampling with sampleRanges
# rangesCont <- sampleRanges(taxa,r=0.5)
#let's use taxa2cladogram to get the 'ideal' cladogram of the taxa
cladogram <- taxa2cladogram(taxa,plot=TRUE)

plot(cladogram)
# #Fully resolve relationships - we need to do this, to trigger the warning in MrBayes
# #What is important is that the relationships at the root are fully resolved
cladogram.res <- multi2di(cladogram)
plot(cladogram.res)

# #Create MrBayes command file
mrbayes.cmd <- createMrBayesTipDatingNexus(
  tipTimes = taxa.ages, # #Alternatively we could use "mytimeList.myclade"
  outgroupTaxa = NULL,
  treeConstraints = cladogram.res,
  ageCalibrationType = "uniformRange",
  whichAppearance = "first",
  anchorTaxon = "t7", # #Allows to position the timescaled trees in absolute time. The age of this taxon is fixed.
  treeAgeOffset = 5,    # #example uses 10
  origNexusFile = NULL,
  createEmptyMorphMat = TRUE,
  runName = paste("Test_FBD_dating.nex", sep=""),
  ngen = "2000000", # #Default: 100000000
  doNotRun = FALSE,
  autoCloseMrB = TRUE
)

# #Set uniform prior on root
mrbayes.cmd <- gsub("prset treeagepr = offsetexp","prset treeagepr = uniform", mrbayes.cmd)
# #Change MrBayes MCMC settings
mrbayes.cmd <- gsub("printfreq = 1000 samplefreq = 1000 nchains = 4 nruns = 2",
                    paste("printfreq = 40",
                          " samplefreq = 40",
                          " nchains = 4",
                          " nruns = 2", sep=""), mrbayes.cmd)
# #Note, that if you want to continue a MrBayes run from a checkpoint, you need to use the command
# #"mcmc append=yes;", i. e. you just add "append=yes" to your command file 

# #Write the MrBayes command file
write(mrbayes.cmd, file=paste("Test_FBD_dating.nex", sep=""))

# #If we use MrBayes approach for trees which are fully resolved at the root, we have to add outgroup taxas
# #otherwise MrBayes will ignore some of the topological constraints we set for the clade

# #Unfortunately, bind.tip cannot be used in this case, since we want to add the outgroup to the root
# #So let's use bind.tree instead and create a 1 taxon-tree (see http://blog.phytools.org/2012/11/adding-single-tip-to-tree.html)
outgroup.tip <- list(edge = matrix(c(2,1),1,2),
                     tip.label = "t1_outgroup",
                     #edge.length = 1.0,
                     Nnode = 1)
class(outgroup.tip) <- "phylo"
# #Attach the outgroup to the root (or rather below the current "root" node)
cladogram.res.out <- bind.tree(cladogram.res, outgroup.tip, where = "root",position = 1)
cladogram.res.out$root.edge <- NULL # #Let's remove the root edge element (otherwise we cannot add the second outgroup taxon with bind.tip)
plot(cladogram.res.out) # #Check whether the taxon has been added properly
# #Let's add another taxon in a polytomy at the root as outgroup (otherwise it still does not seem to work)
cladogram.res.out <- bind.tip(cladogram.res.out, "t2_outgroup", 
                         where = length(cladogram.res.out$tip.label)+1, position = 0)
plot(cladogram.res.out) # #Check whether the taxon has been added properly

# #We also have to add the outgroup taxa to myclade.ages
outgroup.tip.ages <- data.frame(FAD = c(1003, 1007), LAD = c(1000, 1005), row.names = c("t1_outgroup", "t2_outgroup"))
taxa.ages.out <- rbind(taxa.ages, outgroup.tip.ages)

# #Note, that the treeAgeOffset might have to be changed as well, since you are now time-scaling a different tree which
# #includes the two outgroups

# #Create MrBayes command file
mrbayes.out.cmd <- createMrBayesTipDatingNexus(
  tipTimes = taxa.ages.out, # #Alternatively we could use "mytimeList.myclade"
  outgroupTaxa = NULL,
  treeConstraints = cladogram.res.out,
  ageCalibrationType = "uniformRange",
  whichAppearance = "first",
  anchorTaxon = "t7", # #Allows to position the timescaled trees in absolute time. The age of this taxon is fixed.
  treeAgeOffset = 5,    # #example uses 10
  origNexusFile = NULL,
  createEmptyMorphMat = TRUE,
  runName = paste("Out.Test_FBD_dating.nex", sep=""),
  ngen = "2000000", # #Default: 100000000
  doNotRun = FALSE,
  autoCloseMrB = TRUE
)

# #Set uniform prior on root
mrbayes.out.cmd <- gsub("prset treeagepr = offsetexp","prset treeagepr = uniform", mrbayes.out.cmd)
# #Change MrBayes MCMC settings
mrbayes.out.cmd <- gsub("printfreq = 1000 samplefreq = 1000 nchains = 4 nruns = 2",
                    paste("printfreq = 40",
                          " samplefreq = 40",
                          " nchains = 4",
                          " nruns = 2", sep=""), mrbayes.out.cmd)
# #Note, that if you want to continue a MrBayes run from a checkpoint, you need to use the command
# #"mcmc append=yes;", i. e. you just add "append=yes" to your command file 

# #Write the MrBayes command file
write(mrbayes.out.cmd, file=paste("Out.Test_FBD_dating.nex", sep=""))

# #If we want to, we can also import the FBD trees generated by MrBayes
# #Import FBD trees generated by MrBayes
mrbayes_trees <- obtainDatedPosteriorTreesMrB(
  runFile = paste("Test_FBD_dating.nex.run1.t", sep=""),
  nRuns = 2, # #Default: 2
  burnin = 0.2,
  outputTrees = 100, # #Sample a number of trees from the posterior distribution; Default: 100
  labelPostProb = FALSE,
  getFixedTimes = TRUE,
  getRootAges = TRUE,
  #originalNexusFile = NULL,
  file = NULL
)

# #Import FBD trees generated by MrBayes
out.mrbayes_trees <- obtainDatedPosteriorTreesMrB(
  runFile = paste("Out.Test_FBD_dating.nex.run1.t", sep=""),
  nRuns = 2, # #Default: 2
  burnin = 0.2,
  outputTrees = 100, # #Sample a number of trees from the posterior distribution; Default: 100
  labelPostProb = FALSE,
  getFixedTimes = TRUE,
  getRootAges = TRUE,
  #originalNexusFile = NULL,
  file = NULL
)