citiususc / veryfasttree

Efficient phylogenetic tree inference for massive taxonomic datasets

Some species missing in the final tree #19

Closed chtsai0105 closed 1 year ago

chtsai0105 commented 1 year ago

Hi - I'm using the latest version of VeryFastTree (v4.0.3) to build a tree from a concatenated FASTA alignment comprising 5 species. Since these are DNA coding sequences, I ran it with -nt; the full command can be found in the log below.

Although the second-to-last line of the log reports that 5 unique species were processed, only 3 species appear in the final Newick tree:

Command: VeryFastTree -nt -gamma -threads 10 concat_alignments.mfa
VeryFastTree Version 4.0.3 (OpenMP, SSE) with SSE3 using threads(10) level 3 deterministic
Alignment: concat_alignments.mfa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 0.37 seconds
Refining topology: 9 rounds ME-NNIs, 2 rounds ME-SPRs, 5 rounds ML-NNIs
Total branch-length 0.849 after 0.39 sec 3 splits
ML-NNI round 1: LogLk = -1345542.878 NNIs 0 max delta 0.00 Time 0.99
Switched to using 20 rate categories (CAT approximation)20 of 20
Rate categories were divided by 0.676 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
ML-NNI round 2: LogLk = -1233085.017 NNIs 0 max delta 0.00 Time 3.13
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 3: LogLk = -1233085.017 NNIs 0 max delta 0.00 Time 3.18 (final)
Optimize all lengths: LogLk = -1232969.453 Time 3.79
Gamma(20) LogLk = -1343596.116 alpha = 1.565 rescaling lengths by 1.305
Total time: 89.76 seconds Unique: 5/5 Bad splits: 0/0
(Actinomucor_elegans_CBS_100.09:0.31546,Pilobolus_umbonatus_NRRL_6349:0.37103,Rhizopus_homothallicus_CBS_336.62:0.42758);
TreeCompleted

I also ran the original FastTree with the equivalent command, FastTree -nt -gamma concat_alignments.mfa, and it reported all 5 species:

FastTree Version 2.1.11 SSE3
Alignment: concat_alignments.mfa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 0.36 seconds
Refining topology: 9 rounds ME-NNIs, 2 rounds ME-SPRs, 5 rounds ML-NNIs
Total branch-length 1.259 after 3.34 sec 1 of 3 splits
ML-NNI round 1: LogLk = -5314636.951 NNIs 0 max delta 0.00 Time 9.34
Switched to using 20 rate categories (CAT approximation)20 of 20
Rate categories were divided by 0.704 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
ML-NNI round 2: LogLk = -4989993.675 NNIs 0 max delta 0.00 Time 12.36
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 3: LogLk = -4989815.150 NNIs 0 max delta 0.00 Time 16.33 (final)
Optimize all lengths: LogLk = -4989804.848 Time 18.29
Gamma(20) LogLk = -5272812.729 alpha = 0.676 rescaling lengths by 1.715
Total time: 123.20 seconds Unique: 5/5 Bad splits: 0/2
(Pilobolus_umbonatus_NRRL_6349:0.55510,(Rhizopus_homothallicus_CBS_336.62:0.40002,Rhizopus_rhizopodiformis_NRRL_2570:0.32793)1.000:0.21658,(Actinomucor_elegans_CBS_100.09:0.37684,Zygorhynchus_heterogamous_NRRL_1489:0.42264)1.000:0.12535);
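To see exactly which taxa were dropped, the leaf labels of the two Newick strings above can be compared. The following is a minimal Python sketch, not part of either tool: the helper name newick_leaves is hypothetical, and it assumes unquoted labels that contain no parentheses, commas, colons, or semicolons (true for the trees in this report).

```python
import re


def newick_leaves(newick: str) -> set:
    """Return the leaf labels of an unquoted Newick string (assumption: no quoted labels)."""
    # A leaf label directly follows "(" or ","; internal labels and
    # support values follow ")" instead, so they are not captured.
    pattern = r"[(,]\s*([^(),:;\s][^(),:;]*)"
    return {name.strip() for name in re.findall(pattern, newick)}


# Trees copied from the two logs in this report.
veryfasttree_tree = (
    "(Actinomucor_elegans_CBS_100.09:0.31546,"
    "Pilobolus_umbonatus_NRRL_6349:0.37103,"
    "Rhizopus_homothallicus_CBS_336.62:0.42758);"
)
fasttree_tree = (
    "(Pilobolus_umbonatus_NRRL_6349:0.55510,"
    "(Rhizopus_homothallicus_CBS_336.62:0.40002,"
    "Rhizopus_rhizopodiformis_NRRL_2570:0.32793)1.000:0.21658,"
    "(Actinomucor_elegans_CBS_100.09:0.37684,"
    "Zygorhynchus_heterogamous_NRRL_1489:0.42264)1.000:0.12535);"
)

# Taxa present in the FastTree result but absent from the VeryFastTree result.
missing = newick_leaves(fasttree_tree) - newick_leaves(veryfasttree_tree)
for name in sorted(missing):
    print(name)
```

For these two trees the script prints Rhizopus_rhizopodiformis_NRRL_2570 and Zygorhynchus_heterogamous_NRRL_1489, the two species absent from the VeryFastTree 4.0.3 output.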

Here is the zip file of the concatenated fasta. concat_alignments.zip

cesarpomar commented 12 months ago

This issue has already been resolved in the master branch. If the problem persists, please reopen the issue.

chtsai0105 commented 11 months ago

Hi - sorry for the late reply, and thanks for the fix. I compiled from source and was able to recover the missing species. Do you plan to publish a new release with this fix? I'm working on a tool and want to manage all my dependencies through pip/conda.

cesarpomar commented 10 months ago

Yes, this fix will be included in the 4.0.4 release. Thanks for using VeryFastTree.