beast-dev / beast-mcmc

Bayesian Evolutionary Analysis Sampling Trees
http://beast.community
GNU Lesser General Public License v2.1

TreeAnnotator v1.10.5 very slow in setting node heights when generating MCC tree #1156

Closed: ViralVerity closed this issue 5 months ago

ViralVerity commented 1 year ago

In v1.10.5, compared with v1.10.4, the "setting node heights" step of generating the MCC tree is very slow.

On a dataset of ~1,000 sequences with 10,500 posterior trees, v1.10.4 completed the whole process in about five minutes. By comparison, after seven minutes of setting node heights, v1.10.5 was only about a third of the way through that step.

For now I mostly just use the older TreeAnnotator, but I thought it was worth flagging!

liamxg commented 1 year ago

@ViralVerity Agreed.

rambaut commented 1 year ago

The biggest difference I see comes from using the -heights ca option vs. the default mean heights. Testing with 1.10.4 suggests the same time differential. Could the use of -heights ca differ between the two runs?

ViralVerity commented 1 year ago

I just ran it again, this time measuring the time properly and using a different dataset that I can share: it's now 18,000 trees of 522 sequences.

I haven't changed the -heights option, only the jar files (I've attached both: "old_beast.jar" is 1.10.4 and "beast.jar" is 1.10.5), and the timing goes from:

v1.10.4: user 615s, system 9.54s, total 6:43.10

to:

v1.10.5: user 21898.79s, system 66.06s, total 1:01:43.56

Attachments: old_beast.jar.zip, beast.jar.zip

Command for both runs, with only the paths/jar files changed:

java -Xms64m -Xmx4096m -Djava.library.path="$HOME/Desktop/old_beast/lib" -cp $HOME/Desktop/old_beast/lib/old_beast.jar dr.app.tools.TreeAnnotator ~/GLab\ Dropbox/GLab_team/Projects/2022_EEEV/results/final_beast_runs/DTA/combined_DTA_all_states.trees ~/test_old_beast.mcc

I can't upload the .trees file (even compressed) because it's too big, but I can email it if it's helpful!
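To put the two reported totals in proportion, here is a small shell sketch (not part of the original report) that converts the h:mm:ss.ss / mm:ss.ss totals printed by time into seconds and computes the v1.10.5 to v1.10.4 ratios. The helper name to_seconds is hypothetical; only the timing figures come from the thread.

```shell
#!/bin/sh
# Sketch: compare the timings reported for the two TreeAnnotator runs.
# to_seconds is a hypothetical helper; it accepts "mm:ss.ss" or "h:mm:ss.ss"
# and folds each colon-separated field into a running total of seconds.
to_seconds() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

old_total=$(to_seconds "6:43.10")      # v1.10.4 total (wall-clock)
new_total=$(to_seconds "1:01:43.56")   # v1.10.5 total (wall-clock)

# Ratio of v1.10.5 to v1.10.4 wall-clock time.
awk -v o="$old_total" -v n="$new_total" 'BEGIN { printf "wall-clock slowdown: %.1fx\n", n / o }'
# User times were already reported in seconds, so divide them directly.
awk 'BEGIN { printf "user-time slowdown: %.1fx\n", 21898.79 / 615 }'
```

On these numbers it prints "wall-clock slowdown: 9.2x" and "user-time slowdown: 35.6x".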

rambaut commented 5 months ago

The default for node heights has changed from mean to CA (common ancestor heights), so I think the previous version was not doing the 'setting node heights' stage.

rambaut commented 5 months ago

Closing this as I believe the difference in speed is due to -heights ca becoming the default behaviour.
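If that explanation is right, asking v1.10.5 for mean heights explicitly should avoid the slow CA step and make runs comparable across versions. Below is a minimal shell sketch under that assumption (that the v1.10.5 jar accepts "mean" as a -heights value, as the comments above imply). The jar and file paths are hypothetical placeholders, not the paths from the report; the script prints the full command and only runs it if the jar exists.

```shell
#!/bin/sh
# Hypothetical placeholder paths: point these at your own jar and tree files.
BEAST_JAR="$HOME/beast/lib/beast.jar"
IN_TREES="combined_DTA_all_states.trees"
OUT_MCC="combined_DTA_all_states_mean.mcc"

# Same JVM settings as the report, but request mean node heights explicitly
# rather than relying on the version-dependent default.
set -- java -Xms64m -Xmx4096m -cp "$BEAST_JAR" dr.app.tools.TreeAnnotator \
  -heights mean "$IN_TREES" "$OUT_MCC"

# Log the exact command so the heights option in effect is recorded.
echo "$@"

# Run it only if the jar is actually present.
if [ -f "$BEAST_JAR" ]; then
  "$@"
fi
```

Setting the option explicitly, rather than depending on the default, keeps results and timings comparable when switching jar versions.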

liamxg commented 5 months ago

Is CA better than mean?