Closed eernst closed 4 years ago
Mine is also getting stuck when specifying multiple threads. I tried the following simple alignment:
When I run VeryFastTree single-threaded as follows, it runs just fine:
$ cat test.aln.txt | VeryFastTree -gamma -nt -gtr -double-precision -fastexp 2
Command: VeryFastTree -gamma -nt -gtr -double-precision -fastexp 2
VeryFastTree Version 3.0 (OpenMP, AVX2)
Alignment: standard input
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
Initial topology in 0.00 seconds
Refining topology: 11 rounds ME-NNIs, 2 rounds ME-SPRs, 6 rounds ML-NNIs
Total branch-length 0.140 after 0.00 sec
ML-NNI round 1: LogLk = -542.341 NNIs 0 max delta 0.00 Time 0.00
GTR Frequencies: 0.2480 0.2417 0.2058 0.3044
GTR rates(ac ag at cg ct gt) 1.0260 1.8459 0.8277 0.6470 2.9109 1.0000
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 0.654 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
ML-NNI round 2: LogLk = -523.940 NNIs 0 max delta 0.00 Time 0.03
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 3: LogLk = -523.940 NNIs 0 max delta 0.00 Time 0.04 (final)
Optimize all lengths: LogLk = -523.940 Time 0.04
Gamma(20) LogLk = -534.502 alpha = 1.284 rescaling lengths by 1.042
Total time: 0.05 seconds Unique: 7/8 Bad splits: 0/4
((BetaCoV_Wuhan_IVDC_HB_01_2019_EPI_ISL_402119:0.0,BetaCoV_Wuhan_IVDC_HB_04_2020_EPI_ISL_402120:0.0):0.004282957,BetaCoV_Wuhan_IPBCAMS_WH_01_2019_EPI_ISL_402123:0.000000006,(BetaCoV_Wuhan_IVDC_HB_05_2019_EPI_ISL_402121:0.000000006,(BetaCoV_Wuhan_WIV04_2019_EPI_ISL_402124:0.000000006,(MT072688_1:0.000000006,(BetaCoV_bat_Yunnan_RaTG13_2013_EPI_ISL_402131:0.049182114,BetaCoV_pangolin_Guangxi_P4L_2017_EPI_ISL_410538:0.091150889)0.924:0.000000006)0.993:0.000000006)0.858:0.004276744)0.852:0.004269224);
TreeCompleted
However, if I try to multithread, it hangs with only 1 thread used, maxed out (the only difference between the following command and the previous one is the -threads 2
at the end):
$ cat test.aln.txt | VeryFastTree -gamma -nt -gtr -double-precision -fastexp 2 -threads 2
Command: VeryFastTree -gamma -nt -gtr -double-precision -fastexp 2 -threads 2
VeryFastTree Version 3.0 (OpenMP, AVX2)
Alignment: standard input
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
Initial topology in 0.00 seconds
Refining topology: 11 rounds ME-NNIs, 2 rounds ME-SPRs, 6 rounds ML-NNIs
Total branch-length 0.008 after 0.00 sec
I tried it with multiple other values for -threads
and got the same results. My cmake
command when compiling was the following:
cmake -USE_NATIVE=ON -USE_SEE2=ON -USE_SEE4=ON -USE_AVX=ON -USE_AVX2=ON -USE_AVX512=ON .
Hello
I found two errors in the tool and made a commit, the argument -thread-subtrees was uninitialized, so the maximum number of subtrees per thread was not fixes, if this is a problem for you, you can set a large number in this parameter. Moreover, keep in mind that for small datasets rarely the tree can split into smaller trees, you can check the log, and the other parallel regions inherited from fastree only scale for 3-4 threads. The flag that it uses for this calculation was being ignored and the algorithm was stuck looking for a partitioning of the tree.
Check if the problem is solved and I will create a new release.
Thanks for using the tool and for your feedback
Thanks for looking into it! I tried using the same dataset I mentioned earlier (and I compiled and ran VeryFastTree the same way as before, just using the cloned repo rather than the release tarball), and now I get the following:
Command: VeryFastTree -gamma -nt -gtr -double-precision -fastexp 2 -threads 2
VeryFastTree Version 3.0 (OpenMP, AVX2)
Alignment: standard input
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
Initial topology in 0.00 seconds
Refining topology: 11 rounds ME-NNIs, 2 rounds ME-SPRs, 6 rounds ML-NNIs
Total branch-length 0.245 after 0.00 sec
The tree has 1 nodes and it was divided into 0 subtrees:
thread 0(0.00%):roots[], nodes 0
thread 1(0.00%):roots[], nodes 0
skipped (100.00%): nodes 1
total (100.00%), nodes 1, theoretical speedup 1.00 of 2 Time 0.01
ML-NNI round 1: LogLk = -261.997 NNIs 0 max delta 0.00 Time 0.01
The tree has 1 nodes and it was divided into 3 subtrees:
thread 0(0.00%):roots[0, 1, 2], nodes 0
thread 1(0.00%):roots[], nodes 0
skipped (100.00%): nodes 1
total (100.00%), nodes 1, theoretical speedup 1.00 of 2 Time 0.02
The tree has 1 nodes and it was divided into 3 subtrees:
thread 0(0.00%):roots[0, 1, 2], nodes 0
thread 1(0.00%):roots[], nodes 0
skipped (100.00%): nodes 1
total (100.00%), nodes 1, theoretical speedup 1.00 of 2 Time 0.03
...
[ it just keeps repeating like this seemingly infinitely ]
I was unable to reproduce the error, the log shows how in a small tree if we use more than one thread, all the nodes go to thread 1. I attach my log, check if I used the same params and if you can give me more details of the error. log.txt
@cesarpomar Apologies, it seems like it was just that my terminal console was getting flooded with the progress messages. I directed standard error to /dev/null
, and it completed quickly as expected
Mine seems to be fully-functioning. Thank you! :-)
With the latest commit, I'm seeing coredumps with some thread counts on the attached test alignment:
$ parallel -j 1 './VeryFastTree -threads {} -double-precision -ext AVX2 OG0008635.fa >/dev/null 2>&1; echo -e "Trying with -threads: {}\t"' ::: {1..16}
Trying with -threads: 1
Trying with -threads: 2
Trying with -threads: 3
Trying with -threads: 4
Trying with -threads: 5
Trying with -threads: 6
Trying with -threads: 7
Trying with -threads: 8
/usr/bin/bash: line 1: 2248788 Aborted (core dumped) ./VeryFastTree -threads 8 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 9
/usr/bin/bash: line 1: 2248807 Aborted (core dumped) ./VeryFastTree -threads 9 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 10
/usr/bin/bash: line 1: 2248824 Aborted (core dumped) ./VeryFastTree -threads 10 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 11
/usr/bin/bash: line 1: 2248844 Aborted (core dumped) ./VeryFastTree -threads 11 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 12
/usr/bin/bash: line 1: 2248865 Aborted (core dumped) ./VeryFastTree -threads 12 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 13
Trying with -threads: 14
Trying with -threads: 15
/usr/bin/bash: line 1: 2248917 Aborted (core dumped) ./VeryFastTree -threads 15 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 16
Maybe a race condition? -threads 9 fails 4/10 times, for example:
$ parallel -j 1 './VeryFastTree -threads 9 -double-precision -ext AVX2 OG0008635.fa >/dev/null 2>&1; echo -e "Trying with -threads: 9\t"' ::: {1..10}
Trying with -threads: 9 1
Trying with -threads: 9 2
/usr/bin/bash: line 1: 2249201 Aborted (core dumped) ./VeryFastTree -threads 9 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 9 3
Trying with -threads: 9 4
/usr/bin/bash: line 1: 2249229 Aborted (core dumped) ./VeryFastTree -threads 9 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 9 5
/usr/bin/bash: line 1: 2249246 Aborted (core dumped) ./VeryFastTree -threads 9 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 9 6
/usr/bin/bash: line 1: 2249264 Aborted (core dumped) ./VeryFastTree -threads 9 -double-precision -ext AVX2 OG0008635.fa > /dev/null 2>&1
Trying with -threads: 9 7
Trying with -threads: 9 8
Trying with -threads: 9 9
Trying with -threads: 9 10
Compiled on an Intel Xeon 8280 with GCC 5.5.0, configured with: cmake -DUSE_NATIVE=OFF -DUSE_SEE4=ON -DUSE_AVX=ON -DUSE_AVX2=ON -DCMAKE_INSTALL_PREFIX=$PWD .
I think I found the error, you can check if the last commit has solved it.
Thanks for your feedback
Looks good to me now, tested with 1-32 threads and 112 threads for good measure.
Thanks for the quick support!
Thanks for the hard work on this software! It seems VeryFastTree is working like a charm on most input, including very large alignments with over 1B characters. However, on some input it seems to be hanging with only one thread in use (maxed out) when calling with more than 15 threads (example alignment attached):
OG0008635.fa.zip