Simulating trees/sequences for testing

RouxCil commented 2 years ago

These simulations will be used to determine the integration time of sequences recovered from latent viral reservoir.

[x] Using twt simulate within-host dynamics that:

Sample sequences from the active compartment at intervals prior to sampling from the LR Should a we simulate ART initiation (i.e no more active replication) after some time?
[x] Generate sequences from the simulated dynamics
[x] Reconstruct trees from sequences
[x] Use bayroot and root-to-tip regression to recover integration dates
[x] Are there differences in the integration date estimates?

ArtPoon commented 2 years ago

Current script and YAML yield population trajectories where number of actively infected cells is sustained at high levels for too long under ART (14 -> 0):

set.seed(1234)
#path <- '~/1_projects/bayroot/Simulated_integration.yaml'
path <- '~/git/bayroot/Simulated_integration.yaml'
model <- Model$new(yaml.load_file(path))

# run an outer tree simulation
outer <- sim.outer.tree(model)
# display the population trajectories
plot(outer, type='s')

ArtPoon commented 2 years ago

increased size of actively infected cell population from 100 to 1000 - if this size is too small, then increasing the death rate makes it difficult for the simulation model to sample 4 lineages from this compartment at time 10
increased death rate of actively infected cells from 0.004 to 0.2

ArtPoon commented 2 years ago

Adapted @ewong347's script with Replenish compartment to try to stabilize population dynamics Confirmed that uninfected (susceptible) cells from Replenish remain susceptible on transition to Active class:

> head(outer$get.counts())
      time S.Active S.Latent S.Replenish S.Death I.Active I.Latent I.Replenish I.Death
1 19.99826      100       10          99       0        1        0           0       0
2 19.97343      101       10          98       0        1        0           0       0
3 19.96447      102       10          97       0        1        0           0       0

ArtPoon commented 2 years ago

I think another problem is that transition rates for a given CompartmentType are the same whether a member of that group is infected or not. This doesn't make sense for infected cells. I think this is an unintended consequence of focusing our development of twt on standard compartmental models like SIR, where individuals always pass through an infected state before removal.

ArtPoon commented 2 years ago

I'm going to create a new branch in twt called cells to implement different transition rates for susceptible and infected members of a compartment. I think this can be implemented here: https://github.com/PoonLab/twt/blob/6b3f4e38a8cc80c2df5aea5fc241e23545f33360/R/simOuterTree.R#L421-L422

where we currently assume the same transition rates to all members whether they are infected or not. (Note this is where we are calculating the total rates of each type of outer event, i.e., transmission, migration, transition.)

So we need to do the following:

[x] extend current YAML so user can (optionally) specify state (susceptible/infected)-dependent transition rates
[x] use these different rates to set the total population-level transition rate
[x] when assigning events, determine whether a given transition affects a susceptible or infected member of the compartment

ArtPoon commented 2 years ago

We already letting the user specify epochs by adding another level to rate specifications in the YAML:

    branching.rates: 
      20: (Active=0.01, Latent=0.0, Replenish= 0.0, Death= 0.0)
      10: (Active=0.0, Latent=0.0, Replenish=0.0, Death=0.0)

I don't want to add complexity by allowing for another level, so instead let's try to give the user the option of replacing Active=0.01 with Active=(S=0.01, I=0.02) and detecting which syntax is being used when parsing the YAML.

ArtPoon commented 2 years ago

Actually we only support epochs for branching (transmission) rates right now, so let's just re-use the existing code for transitions.

ArtPoon commented 2 years ago

diff --git a/R/Model.R b/R/Model.R
index 733b984..0e84d24 100644
--- a/R/Model.R
+++ b/R/Model.R
@@ -173,6 +173,7 @@ Model <- R6Class("Model",
                "declared in CoalescentType", x)
         }

+        # handle epochs (rate changes at specified points in time)
         rate.changes <- lapply(params$branching.rates, function(x) {
           eval(parse(text=paste('list', x))) 
           })
@@ -180,10 +181,14 @@ Model <- R6Class("Model",
           rate.changes <- rate.changes[order(as.numeric(names(rate.changes)), decreasing=T)]
         }

+        rate.changes2 <- lapply(params$transition.rates, function(x) {
+          eval(parse(text=paste('list', x))) 
+        })
+        
         CompartmentType$new(
           name = x,
           branching.rates = rate.changes,
-          transition.rates = eval(parse(text=paste('list', params$transition.rates))),
+          transition.rates = rate.changes2,
           migration.rates = eval(parse(text=paste('list', params$migration.rates))),

           bottleneck.size = params$bottleneck.size,

> model <- Model$new(yaml.load_file(path))
> model$get.types()$Active$get.transition.rates()
$susceptible
$susceptible$Active
[1] 0

$susceptible$Latent
[1] 0.001

$susceptible$Replenish
[1] 0

$susceptible$Death
[1] 0.01

$infected
$infected$Active
[1] 0

$infected$Latent
[1] 0.002

$infected$Replenish
[1] 0

$infected$Death
[1] 0.02

Pushed to branch cells

ArtPoon commented 2 years ago

10 replicate simulations with latent.yaml in commit 4e69f0c99e6edb4ccec2945d5d46f511690f68a9

ArtPoon commented 2 years ago

Tracking the rest of simulating cellular trees in https://github.com/PoonLab/twt/issues/137

ArtPoon commented 2 years ago

As of commit b67957e2 in branch cells, twt will now generate trees with branches annotated with compartment types so that we can collapse branches that correspond to latently-infected cells:

require(twt)
path <- '~/git/bayroot/latent.yaml'
model <- Model$new(yaml.load_file(path))
outer <- sim.outer.tree(model)
phy <- as.phylo(outer)
phy$edge.length[phy$from.type=='Latent'] <- 0
plot(tree.layout(phy))

ArtPoon commented 2 years ago

I just expanded the number of Active compartments to sample at different points in time, prior to ART initiation at time 10. I noticed that some of the Latent samples are not being recognized as such:

path <- '~/git/bayroot/latent.yaml'
model <- Model$new(yaml.load_file(path))
outer <- sim.outer.tree(model)
phy <- as.phylo(outer)
L <- tree.layout(phy)
pch <- list('transmission'=17, 'transition'=1, 'tip'=19)
plot(L, type='n') #, label='b', srt=30, cex=0.5)
lines(L, col=ifelse(L$edges$from.type=='Active', 'red', 'blue'))
points(L$edges$x1, L$edges$y1, cex=1,
       pch=as.integer(pch[L$edges$event.type]))

It looks like this is happening because of how I'm retrieving CompartmentType info for sampled Compartments in Run.R:

sampled.types <- sapply(run$get.compartments(), function(x) x$get.type()$get.name())

This seems to be returning the current CompartmentType after the respective lineages have gone through the transition events as we follow resolved the outer tree down to the root.

ArtPoon commented 2 years ago

Fixed with following:

diff --git a/R/Run.R b/R/Run.R
index 90ec68b..28b1c07 100644
--- a/R/Run.R
+++ b/R/Run.R
@@ -329,7 +329,12 @@ as.phylo.Run <- function(run) {
   # append tips
   fixed.sampl <- eventlog$get.fixed.samplings()
   names(fixed.sampl) <- gsub("__.+$", "", names(fixed.sampl))
-  sampled.types <- sapply(run$get.compartments(), function(x) x$get.type()$get.name())
+  
+  # This is returning current (earliest) state, see bayroot#14
+  #sampled.types <- sapply(run$get.compartments(), function(x) x$get.type()$get.name())
+  idx <- sapply(names(fixed.sampl), function(x) min(which(events$compartment1==x)))
+  sampled.types <- events$type1[idx]
+  
   tips <- data.frame(
     event.type='tip',
     time=as.numeric(fixed.sampl),

Note open circles represent transition events

ArtPoon commented 2 years ago

Something funny is still going on:

There are two latent sampled lineages at the bottom that descend from the same Active compartment, but this shouldn't be possible.

> L$edges[8,]
  parent child    length isTip to.type from.type   event.type       x0
8     48    49 0.8152995 FALSE  Active    Active transmission 4.210649
        x1  y0  y1
8 5.025948 1.5 1.5
> L$edges[9,]
  parent child    length isTip to.type from.type   event.type       x0
9     49    50 0.3693563 FALSE  Latent    Active transmission 5.025948
        x1  y0  y1
9 5.395305 1.5 1.5
> L$edges[11,]
   parent child   length isTip to.type from.type   event.type       x0
11     50    51 11.12391 FALSE  Latent    Latent transmission 5.395305
         x1 y0 y1
11 16.51922  2  2

ArtPoon commented 2 years ago

Looks like the conversion in as.phylo.R still isn't quite right:

> run <- outer
> eventlog <- run$get.eventlog()
> events <- eventlog$get.all.events()
> events
      event.type      time lineage1 lineage2   compartment1   compartment2  type1  type2
1   transmission  3.035092       NA       NA   Latentcomp_9    US_Latent_1 Latent Latent
2   transmission  3.438683       NA       NA   Latentcomp_8   Latentcomp_2 Latent Latent
3   transmission  3.883692       NA       NA   Latentcomp_7    US_Latent_2 Latent Latent
4   transmission  3.891747       NA       NA   Latentcomp_4    US_Latent_3 Latent Latent
5   transmission  4.877821       NA       NA    US_Latent_3    US_Latent_4 Latent Latent
6     transition  4.973071       NA       NA    US_Latent_1    US_Latent_1 Latent Active
7   transmission  5.215954       NA       NA   Latentcomp_3    US_Latent_5 Latent Latent
8   transmission  7.309417       NA       NA    US_Latent_2    US_Latent_6 Latent Latent
9   transmission 10.816673       NA       NA   Latentcomp_1    US_Latent_7 Latent Latent
10  transmission 11.293723       NA       NA  Latentcomp_10    US_Active_8 Latent Active
11  transmission 11.807718       NA       NA  Activecomp3_5    US_Active_9 Active Active
12  transmission 12.698918       NA       NA   Latentcomp_6    US_Latent_7 Latent Latent
13  transmission 13.870533       NA       NA    US_Latent_1   US_Active_10 Active Active
14  transmission 14.562596       NA       NA   Latentcomp_2   US_Active_11 Latent Active
15  transmission 14.840485       NA       NA    US_Latent_4   US_Latent_12 Latent Latent
16  transmission 14.931953       NA       NA   US_Active_11   US_Active_13 Active Active
17  transmission 14.998390       NA       NA  Activecomp3_6   US_Active_14 Active Active
18  transmission 15.048617       NA       NA    US_Latent_5   US_Active_15 Latent Active
19  transmission 15.091045       NA       NA   US_Latent_12   US_Active_16 Latent Active
20  transmission 15.407585       NA       NA    US_Latent_7   US_Active_17 Latent Active
21  transmission 15.581387       NA       NA  Activecomp3_4   US_Active_18 Active Active
22  transmission 15.642237       NA       NA  Activecomp3_8   US_Active_19 Active Active
23  transmission 15.747252       NA       NA   US_Active_13   US_Active_20 Active Active
24  transmission 15.756097       NA       NA   Latentcomp_5   US_Active_21 Latent Active
25  transmission 15.771058       NA       NA  Activecomp3_3   US_Active_22 Active Active
26  transmission 15.828416       NA       NA    US_Active_8   US_Active_23 Active Active
27  transmission 15.956173       NA       NA Activecomp3_10   US_Active_24 Active Active

ArtPoon commented 2 years ago

I think the problem is in assigning parents (further into as.phylo.Run):

> events[9:12,]
      event.type      time lineage1 lineage2 compartment1 compartment2  type1  type2
149 transmission 14.562596       NA       NA Latentcomp_2 US_Active_11 Latent Active
32           tip  0.000000       NA       NA         <NA>         <NA> Latent Latent
210 transmission  3.438683       NA       NA Latentcomp_8 Latentcomp_2 Latent Latent
38           tip  0.000000       NA       NA         <NA>         <NA> Latent Latent
                     node.name                     parent
149 US_Active_11__Latentcomp_2 US_Active_13__US_Active_11
32                Latentcomp_2 US_Active_11__Latentcomp_2
210 Latentcomp_2__Latentcomp_8 US_Active_11__Latentcomp_2
38                Latentcomp_8 Latentcomp_2__Latentcomp_8

Latentcomp_2 should have Latentcomp_2__Latentcomp_8 as a parent, not US_Active_11__Latentcomp_2

ArtPoon commented 2 years ago

diff --git a/R/Run.R b/R/Run.R
index 28b1c07..3ae94e3 100644
--- a/R/Run.R
+++ b/R/Run.R
@@ -340,14 +340,14 @@ as.phylo.Run <- function(run) {
     time=as.numeric(fixed.sampl),
     lineage1=NA,
     lineage2=NA,
-    compartment1=NA,
+    compartment1=names(fixed.sampl),
     compartment2=NA,
     type1=sampled.types,
     type2=sampled.types,
     node.name=names(fixed.sampl),
     parent=events$node.name[
       sapply(names(fixed.sampl), function(comp) {
-        min(which(events$compartment1==comp))
+        min(which(events$compartment1==comp | events$compartment2==comp))
         })]
   )
   events <- rbind(tips[order(tips$time), ], events)
@@ -364,8 +364,10 @@ as.phylo.Run <- function(run) {
   }

   # create phylo object
-  tip.label <- events$node.name[is.na(events$compartment1)]
-  node.label <- c('ROOT', events$node.name[!is.na(events$compartment1)])
+  #tip.label <- events$node.name[is.na(events$compartment1)]
+  tip.label <- events$node.name[events$event.type=='tip']
+  #node.label <- c('ROOT', events$node.name[!is.na(events$compartment1)])
+  node.label <- c('ROOT', unique(events$node.name[events$event.type!='tip']))
   nodes <- c(tip.label, node.label)
   edges <- matrix(NA, nrow=length(nodes)-1, ncol=2)
   edges[,1] <- match(events$parent, nodes)

This seems to fix the problem:

ArtPoon commented 2 years ago

Recursion error when simulating trees:

> i <- 10  # set.seed(1), 10th iteration
>   outer <- sim.outer.tree(model)
>   phy <- as.phylo(outer)
Error: C stack usage  7970128 is too close to the limit
In addition: Warning messages:
1: In min(which(events$compartment1 == x)) :
  no non-missing arguments to min; returning Inf
2: In min(which(events$compartment1 == comp)) :
  no non-missing arguments to min; returning Inf

ArtPoon commented 2 years ago

Stepping through the as.phylo.Run code to find the error:

>   idx <- sapply(names(fixed.sampl), function(x) min(which(events$compartment1==x)))
Warning message:
In min(which(events$compartment1 == x)) :
  no non-missing arguments to min; returning Inf
> sapply(names(fixed.sampl), function(x) which(events$compartment1==x))
$Activecomp1_1
[1] 146

$Activecomp1_2
[1] 148

$Activecomp1_3
[1] 145

$Activecomp1_4
[1] 141

$Activecomp1_5
integer(0)

$Activecomp1_6
[1] 155

ArtPoon commented 2 years ago

> events[events$compartment1=='Activecomp1_5',]
 [1] event.type   time         lineage1     lineage2     compartment1 compartment2 type1       
 [8] type2        node.name    parent      
<0 rows> (or 0-length row.names)
> events[events$compartment2=='Activecomp1_5',]
      event.type     time lineage1 lineage2  compartment1  compartment2  type1  type2
70  transmission 15.55935       NA       NA  US_Active_59 Activecomp1_5 Active Active
158 transmission 18.90771       NA       NA Activecomp1_7 Activecomp1_5 Active Active
159 transmission 18.93844       NA       NA US_Active_116 Activecomp1_5 Active Active
                       node.name                       parent
70   Activecomp1_5__US_Active_59 Activecomp1_5__Activecomp1_7
158 Activecomp1_5__Activecomp1_7 Activecomp1_5__US_Active_116
159 Activecomp1_5__US_Active_116                         ROOT

ArtPoon commented 2 years ago

This seems to be resolved with:

@@ -332,7 +332,9 @@ as.phylo.Run <- function(run) {

   # This is returning current (earliest) state, see bayroot#14
   #sampled.types <- sapply(run$get.compartments(), function(x) x$get.type()$get.name())
-  idx <- sapply(names(fixed.sampl), function(x) min(which(events$compartment1==x)))
+  idx <- sapply(names(fixed.sampl), function(x) {
+    min(which(events$compartment1==x | events$compartment2==x)) 
+    })
   sampled.types <- events$type1[idx]

ArtPoon commented 2 years ago

yet another problem, I just noticed that trees have some negative branch lengths, e.g., Activecomp1_3:

ArtPoon commented 2 years ago

Looks like this is caused by discordance between transmission time and sampling time

ArtPoon commented 2 years ago

> events[events$compartment1=='Activecomp3_5',]
     event.type     time lineage1 lineage2  compartment1 compartment2  type1  type2
11 transmission 11.80772       NA       NA Activecomp3_5  US_Active_9 Active Active
> events[events$compartment1=='US_Active_9',]
     event.type     time lineage1 lineage2 compartment1  compartment2  type1  type2
33 transmission 16.15517       NA       NA  US_Active_9 Activecomp1_3 Active Active
> events[events$compartment1=='Activecomp1_3',]
     event.type     time lineage1 lineage2  compartment1 compartment2  type1  type2
80 transmission 17.14614       NA       NA Activecomp1_3 US_Active_73 Active Active
> head(fixed.sampl)
Activecomp1_1 Activecomp1_2 Activecomp1_3 Activecomp1_4 Activecomp1_5 Activecomp1_6 
           17            17            17            17            17            17

Okay so Activecomp_1_3 transmits to US_Active_9 after it is sampled. This is acceptable, but its parent should be Activecomp1_3__US_Active_73, not Activecomp_1_3__US_Active_9

ArtPoon commented 2 years ago

Potential fix:

     parent=events$node.name[
-      sapply(names(fixed.sampl), function(comp) {
-        min(which(events$compartment1==comp))
+      sapply(1:length(fixed.sampl), function(i) {
+        comp <- names(fixed.sampl)[i]
+        rows <- which((events$compartment1==comp | events$compartment2==comp) & 
+                        events$time > fixed.sampl[i])
+        return(min(rows))  # most recent node is parent
         })]

ArtPoon commented 2 years ago

Having problems getting INDELible to recognize the Newick tree string, going to give AliSim a try

ArtPoon commented 2 years ago

art@Wernstrom bayroot % iqtree2 --alisim data/latent1.cens.1.alisim.fa -t data/latent1.cens.1.nwk -m TN+F{0.4/0.2/0.2/0.2} --root-seq AY772699.txt,AY772699 -af fasta --seqtype DNA
IQ-TREE multicore version 2.2.0.4 COVID-edition for Mac OS X 64-bit built May 31 2022
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams, Ly Trong Nhan.

Host:    Wernstrom.local (AVX2, FMA3, 16 GB RAM)
Command: iqtree2 --alisim data/latent1.cens.1.alisim.fa -t data/latent1.cens.1.nwk -m TN+F{0.4/0.2/0.2/0.2} --root-seq AY772699.txt,AY772699 -af fasta --seqtype DNA
Seed:    613729 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Tue May 31 15:35:01 2022
Kernel:  AVX+FMA - 1 threads (16 CPU cores detected)

[Alignment Simulator] Executing
WARNING: Without Inference Mode, we strongly recommend users to specify model parameters for more accuracy simulations. Users could use <Model_Name>{<param_0>/.../<param_n>} to specify the model parameters. For the model TN, users should provide 2 params (see User Manuals).
 - Tree filepath: data/latent1.cens.1.nwk
 - Length of output sequences: 1000
 - Model: TN+F{0.4/0.2/0.2/0.2}
 - Number of output datasets: 1
 - Ancestral sequence position: 1
Fasta format detected
Reading fasta file: done in 4.29153e-05 secs using 72.24% CPU
WARNING: Sequence length is now set equally to the length of ancestral sequence.
WARNING: Using an ancestral sequence with base frequencies that are not compatible with the specification of the model may lead to unexpected results.
An alignment has just been exported to data/latent1.cens.1.alisim.fa.fa
SUBSTITUTION PROCESS
--------------------

Model of substitution: TN+FU

Rate parameter R:

  A-C: 1.000
  A-G: 1.000
  A-T: 1.000
  C-G: 1.000
  C-T: 1.000
  G-T: 1.000

State frequencies: (user-defined)

  pi(A) = 0.4
  pi(C) = 0.2
  pi(G) = 0.2
  pi(T) = 0.2

Rate matrix Q:

  A    -0.833     0.278     0.278     0.278
  C     0.556     -1.11     0.278     0.278
  G     0.556     0.278     -1.11     0.278
  T     0.556     0.278     0.278     -1.11

Model of rate heterogeneity: Uniform

[Alignment Simulator] Done
Simulation time: 0.0293s
Date and Time: Tue May 31 15:35:01 2022

Resulting alignment is crazy because branch lengths are scaled in time units (too long). Attempting to rescale branch lengths causes this program to crash:

art@Wernstrom bayroot % iqtree2 --alisim data/latent1.cens.1.alisim.fa -t data/latent1.cens.1.nwk -m TN+F{0.4/0.2/0.2/0.2} --root-seq AY772699.txt,AY772699 -af fasta --seqtype DNA --branch-scale 1
IQ-TREE multicore version 2.2.0.4 COVID-edition for Mac OS X 64-bit built May 31 2022
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams, Ly Trong Nhan.

Host:    Wernstrom.local (AVX2, FMA3, 16 GB RAM)
Command: iqtree2 --alisim data/latent1.cens.1.alisim.fa -t data/latent1.cens.1.nwk -m TN+F{0.4/0.2/0.2/0.2} --root-seq AY772699.txt,AY772699 -af fasta --seqtype DNA --branch-scale 1
Seed:    521331 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Tue May 31 15:37:55 2022
Kernel:  AVX+FMA - 1 threads (16 CPU cores detected)

[Alignment Simulator] Executing
WARNING: Without Inference Mode, we strongly recommend users to specify model parameters for more accuracy simulations. Users could use <Model_Name>{<param_0>/.../<param_n>} to specify the model parameters. For the model TN, users should provide 2 params (see User Manuals).
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
ERROR: STACK TRACE FOR DEBUGGING:
ERROR: 2   _sigtramp()
ERROR: 6   demangling_terminate_handler()
ERROR: 7   _objc_terminate()
ERROR: 8   std::__terminate(void (*)())
ERROR: 9   __cxa_get_exception_ptr()
ERROR: 10   __cxxabiv1::exception_cleanup_func(_Unwind_Reason_Code, _Unwind_Exception*)
ERROR: 11   operator new(unsigned long, std::nothrow_t const&)
ERROR: 12   Alignment::orderPatternByNumChars(int)
ERROR: 13   IQTree::initializeModel(Params&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, ModelsBlock*)
ERROR: 14   AliSimulator::initializeModel(IQTree*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)
ERROR: 15   AliSimulator::initializeIQTreeFromTreeFile()
ERROR: 16   AliSimulator::AliSimulator(Params*, int, double)
ERROR: 17   executeSimulation(Params, IQTree*&)
ERROR: 18   runAliSim(Params&, Checkpoint*)
ERROR: 
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: ***    Log file: data/latent1.cens.1.nwk.log
ERROR: ***    Alignment files (if possible)
zsh: abort      iqtree2 --alisim data/latent1.cens.1.alisim.fa -t data/latent1.cens.1.nwk -m

ArtPoon commented 2 years ago

I ended up just doing this with INDELible, revised the Python script:

art@Wernstrom bayroot % python3 indelible.py data/latent1.1.cens.nwk
set control file to /var/folders/56/3y9v479n0g55nm_b_51dvv_m0000gn/T/tmpgd4exb9q

 INDELible V1.03 by Will Fletcher: Simulation began at Thu Jun  2 11:29:04 2022

  * Block 1 completed.   Time taken: 0.020661 seconds.

 *** SIMULATION COMPLETED - PLEASE CONSULT OUTPUT FILES ***                                                                     

 INDELible V1.03 Simulations completed at: Thu Jun  2 11:29:04 2022

art@Wernstrom bayroot % fasttree2 -nt -gtr < data/latent1.1.cens.nwk_TRUE.fas > data/latent1.1.ft2.nwk
FastTree Version 2.1.11 Double precision (No SSE3)
Alignment: standard input
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
Initial topology in 0.01 seconds
Refining topology: 21 rounds ME-NNIs, 2 rounds ME-SPRs, 10 rounds ML-NNIs
Total branch-length 1.000 after 0.15 sec1, 1 of 36 splits   
ML-NNI round 1: LogLk = -8648.805 NNIs 5 max delta 6.25 Time 0.22
GTR Frequencies: 0.3796 0.1778 0.2219 0.2207ep 8 of 12   
GTR rates(ac ag at cg ct gt) 0.7861 6.1829 0.8946 0.8183 3.4474 1.0000
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 0.776 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
ML-NNI round 2: LogLk = -7720.290 NNIs 2 max delta 0.30 Time 0.50
ML-NNI round 3: LogLk = -7719.933 NNIs 1 max delta 0.24 Time 0.53
ML-NNI round 4: LogLk = -7717.692 NNIs 4 max delta 2.19 Time 0.56
ML-NNI round 5: LogLk = -7717.687 NNIs 1 max delta 0.00 Time 0.57
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 6: LogLk = -7717.680 NNIs 0 max delta 0.00 Time 0.65 (final)
Optimize all lengths: LogLk = -7717.680 Time 0.66
Total time: 0.78 seconds Unique: 38/40 Bad splits: 0/35

ArtPoon commented 2 years ago

RMSE values:

> results
                               filename      rtt   bayroot
1   data/latent1.1.cens.nwk.fas.ft2.nwk 8.810878 1.2405863
2  data/latent1.10.cens.nwk.fas.ft2.nwk 7.729554 2.0536460
3   data/latent1.2.cens.nwk.fas.ft2.nwk 8.846972 1.5444688
4   data/latent1.3.cens.nwk.fas.ft2.nwk 8.669658 1.3696750
5   data/latent1.4.cens.nwk.fas.ft2.nwk 8.734025 0.7982607
6   data/latent1.5.cens.nwk.fas.ft2.nwk 9.043352 2.1381707
7   data/latent1.6.cens.nwk.fas.ft2.nwk 8.750281 1.8437518
8   data/latent1.7.cens.nwk.fas.ft2.nwk 9.234201 1.8388470
9   data/latent1.8.cens.nwk.fas.ft2.nwk 7.711286 2.1555422
10  data/latent1.9.cens.nwk.fas.ft2.nwk 9.536932 2.1050072

ArtPoon commented 2 years ago

Something funny happened with tip labels in simulation script.

ArtPoon commented 2 years ago

There were problems with configuring the prior and proposal settings, roll into a new issue

PoonLab / bayroot

Simulating trees/sequences for testing #14