Cibiv / IQ-TREE

Efficient phylogenomic software by maximum likelihood
http://www.iqtree.org
GNU General Public License v2.0

Segmentation Fault for large dataset (but only with IQ-TREE2, not 1.6) #183

Open julibeg opened 3 years ago

julibeg commented 3 years ago

Hi there!

IQ-TREE2 segfaulted on me for a large file.

Top of logfile:

IQ-TREE multicore version 2.0.3 for Linux 64-bit built Dec 20 2020
Developed by Bui Quang Minh, Nguyen Lam Tung, Olga Chernomor,
Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Host:    plum-g1 (AVX2, FMA3, 1003 GB RAM)
Command: iqtree -s ethambutol_c5nonmajor.bcf.gz.snps.fa -m GTR+G+ASC -nt 3
Seed:    36003 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Fri Jan  8 16:53:30 2021
Kernel:  AVX+FMA - 3 threads (96 CPU cores detected)

Reading alignment file ethambutol_c5nonmajor.bcf.gz.snps.fa ... Fasta format detected
Alignment most likely contains DNA/RNA sequences
Alignment has 16576 sequences with 151073 columns, 135949 distinct patterns
151073 parsimony-informative, 0 singleton sites, 0 constant sites

Bottom of logfile:

16576  SRR958234_SRR924709                                            0.01%    passed     99.88%
****  TOTAL                                                          0.38%  8 sequences failed composition chi2 test (p-value<5%; df=3)
NOTE: minimal branch length is reduced to 0.000000661932 for long alignment
NOTE: SRR2469374 is identical to ERR046833 but kept for subsequent analysis
NOTE: ERR400544 is identical to ERR2509677 but kept for subsequent analysis
NOTE: SRR5837708 is identical to SRR5709887 but kept for subsequent analysis
ERROR: STACK TRACE FOR DEBUGGING:
ERROR: 1   funcAbort()
ERROR: 2   ()
ERROR: 3   pllLoadAlignment()
ERROR: 4   IQTree::initializePLL(Params&)
ERROR: 5   startTreeReconstruction(Params&, IQTree*&, ModelCheckpoint&)
ERROR: 6   runPhyloAnalysis(Params&, Checkpoint*)
ERROR: 7   main()
ERROR: 8   __libc_start_main()
ERROR: 9   ()
ERROR: 
ERROR: *** IQ-TREE CRASHES WITH SIGNAL SEGMENTATION FAULT
ERROR: *** For bug report please send to developers:
ERROR: ***    Log file: ethambutol_c5nonmajor.bcf.gz.snps.fa.log
ERROR: ***    Alignment files (if possible)

I have also started a run on the same file with v1.6.12, and it has gone past this stage without error (although it has not finished yet). I can provide the alignment file if necessary.

bqminh commented 3 years ago

Hi there,

This is a known issue with the phylogenetic likelihood library (PLL) when dealing with large datasets like this.

Please use the -t PARS option, which works around this issue. It switches to IQ-TREE's own kernel for the parsimony computation instead of using PLL.
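Applied to the command shown in the log above, this would presumably look like the sketch below. It simply appends -t PARS to the original arguments and leaves everything else unchanged. The `cmd` variable is only an illustrative helper so the command can be inspected before launching; it is not part of IQ-TREE.

```shell
# Original command from the log, with -t PARS appended so the starting
# parsimony tree is built by IQ-TREE's own kernel rather than PLL.
# "cmd" is a hypothetical helper variable, used here only for inspection.
cmd="iqtree -s ethambutol_c5nonmajor.bcf.gz.snps.fa -m GTR+G+ASC -nt 3 -t PARS"

# Print the command to check it before running.
echo "$cmd"

# Uncomment to launch the analysis (requires iqtree on PATH and the alignment file).
# $cmd
```

If the crash no longer occurs, the log should get past the alignment summary and continue with the tree search instead of stopping in pllLoadAlignment().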

Thanks,
Minh

julibeg commented 3 years ago

Thank you! Are there performance implications to be expected?

bqminh commented 3 years ago

In terms of speed, the IQ-TREE kernel is faster, but only because it does not implement the SPR (subtree pruning and regrafting) search, whereas PLL does. So in terms of parsimony score, PLL is better. However, we don't intend to benchmark this, because the parsimony tree only serves as a starting point for the more thorough ML search. The difference is not critical as long as the starting trees are reasonable.

Minh
