GiantSpaceRobot / FindFungi

A pipeline for the identification of fungi in public metagenomics datasets

Multiple errors/core dumps with FindFungi #6

Closed by wolfgangrumpf 5 years ago

wolfgangrumpf commented 5 years ago

While running the SLURM version of FindFungi, I continually see this type of error - and core dumps - in my results. There are really three things I'm worried about here:

  1. The segmentation fault in megablast
  2. The "Not BLASTing" lines
  3. The "Cannot calculate Pearson Coefficient" message

Any suggestions?

Not BLASTing 98765, too few reads
Not BLASTing 994086, too few reads
Ending : Blasting Kraken : Fri Mar 1 01:23:59 EST 2019
Starting : Gather fasta : Fri Mar 1 01:23:59 EST 2019
Ending : Gather fasta : Fri Mar 1 01:29:47 EST 2019
Starting : BLAST against the genome : Fri Mar 1 01:29:47 EST 2019
/opt/FindFungi/0.23/FindFungi-v0.23.3/FindFungi-0.23.3.sh: line 164: 4597 Segmentation fault (core dumped) blastn -task megablast -query $Dir/Processing/ReadNames.$Taxid.fsa -db $BLAST_DB_Dir/Taxid-$Taxid -out $Dir/Results/BLAST_Processing/BLAST.$Taxid -evalue 1E-20 -num_threads 2 0 -outfmt 6
/opt/FindFungi/0.23/FindFungi-v0.23.3/FindFungi-0.23.3.sh: line 171: 6872 Segmentation fault (core dumped) blastn -task megablast -query $Dir/Processing/ReadNames.$Taxid.fsa -db $BLAST_DB_Dir/Taxid-$Taxid -out $Dir/Results/BLAST_Processing/BLAST.$Taxid -evalue 1E-20 -num_threads 2 0 -outfmt 6
Ending : BLAST against the genome : Fri Mar 1 01:34:14 EST 2019
Starting : Calculate Pearson coefficient : Fri Mar 1 01:34:14 EST 2019
The file OUTPUT/FindFungi/Results/BLAST_Processing/Skewness.BLAST.1126212 is empty. Cannot calculate Pearson Coefficient of Skewness
The file OUTPUT/FindFungi/Results/BLAST_Processing/Skewness.BLAST.1081104 is empty. Cannot calculate Pearson Coefficient of Skewness
The file OUTPUT/FindFungi/Results/BLAST_Processing/Skewness.BLAST.105984 is empty. Cannot calculate Pearson Coefficient of Skewness

GiantSpaceRobot commented 5 years ago

Hi there,

Points 2 and 3 are not errors; that is normal behaviour for the pipeline. Unfortunately, I do not know much about SLURM or how it operates. Can you try running megablast with fewer cores, or a single core, to see how it responds? Apologies for the delay in responding.

Paul
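Editorial sketch of the single-core check suggested above. The helper name `describe_status` and the file names in the usage comment are hypothetical; the only assumption about the shell is the usual convention that a process killed by signal N is reported with exit status 128 + N, so a segmentation fault (signal 11 on Linux) shows up as 139.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn an exit status into a short description.
describe_status() {
  local status=$1
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -gt 128 ]; then
    # Shells report death-by-signal as 128 + signal number.
    echo "killed by signal $((status - 128))"
  else
    echo "failed with exit status $status"
  fi
}

# Usage (same flags as the failing command, but with one thread;
# the query, database, and output paths are placeholders):
#   blastn -task megablast -query ReadNames.$Taxid.fsa -db Taxid-$Taxid \
#          -out BLAST.$Taxid -evalue 1E-20 -num_threads 1 -outfmt 6
#   describe_status $?
```

If the single-thread run still ends with "killed by signal 11", threading is probably not the cause.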

astrophys commented 5 years ago

Hello,

Working with Wolfgang, I found that some of his core dumps seem to be triggered by sequences such as:

>K00194:184:H35CTBBXY:8:1128:19867:18933_2:N:0:CCTTGATC+GATGGAGT
GTCCTTCATCGAGGGACCGTCTACGGTCTTCTGCGTTGCGGTGGTAGCGTGTACGGTGGTCATGAGACCCTCGACGATACCGAAGTTGTCGTTGATGACCTTAGCCAGCGGAGCGAGGCAGTTCGTGGTGCACGATGCGTTCGAAACGATC

Using the command:

blastn -task megablast -query x${SLURM_ARRAY_TASK_ID} -db /opt/FindFungi/0.23/db/FungalGenomeDatabases_EqualContigs//Taxid-573508 -out out${SLURM_ARRAY_TASK_ID} -evalue 1E-20 -num_threads 20 -outfmt 6

This failed with blast-2.2.31, but worked with blast-2.5.0. Perhaps a more recent version of blast is better to use? Thoughts?
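Editorial sketch of one way to isolate a crashing read like the one above: split the query FASTA into one file per record, then BLAST each file separately (for example, one per SLURM array task). The thread does not say how the `x${SLURM_ARRAY_TASK_ID}` files were actually produced, so the function name `split_fasta` and the numbering scheme here are assumptions.

```shell
#!/usr/bin/env bash

# Hypothetical helper: write each FASTA record of $1 to its own file,
# named <prefix>1, <prefix>2, ... in input order.
split_fasta() {
  local input=$1 prefix=$2
  awk -v prefix="$prefix" '
    /^>/ {                      # a header line starts a new record
      if (out) close(out)       # avoid running out of open files
      n++
      out = prefix n
    }
    n > 0 { print > out }       # skip anything before the first header
  ' "$input"
}

# Usage (file names are placeholders):
#   split_fasta ReadNames.573508.fsa x
#   # then, inside a SLURM array job:
#   # blastn -task megablast -query x${SLURM_ARRAY_TASK_ID} ... -out out${SLURM_ARRAY_TASK_ID}
```

Any task whose blastn run dies with a segmentation fault then points directly at a single read.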

GiantSpaceRobot commented 5 years ago

Hi there,

Thank you for the message. That is very strange behaviour. Perhaps blast-2.2.31 and SLURM don't get along that well. I'm glad you seem to have figured the problem out. Is the pipeline functioning as it should now?

Best, Paul

astrophys commented 5 years ago

The crashes are unrelated to SLURM. I think this is likely a bug in blast itself. We are still working on optimizing the rest of the workflow.

wolfgangrumpf commented 5 years ago

We (meaning astrophys) figured it out - turns out there was an issue with BLAST versions prior to 2.8. No more segmentation faults now!
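Editorial sketch of a guard that refuses to start the pipeline with an older BLAST. It is not part of FindFungi; the function names are hypothetical, the 2.8.0 threshold is taken from the comment above, and the comparison assumes a `sort` that supports `-V` (GNU coreutils). It relies on `blastn -version` printing a first line of the form `blastn: 2.8.1+`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `blastn -version` output on stdin and
# print just the numeric version, e.g. "2.2.31".
parse_blast_version() {
  sed -n '1s/^blastn: \([0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed if version $1 is at least version $2.
version_at_least() {
  local have=$1 want=$2
  # With a version sort, the wanted minimum must come out first (or tie).
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Hypothetical guard: fail early if blastn is missing or too old.
require_blast() {
  local min=${1:-2.8.0} have
  have=$(blastn -version 2>/dev/null | parse_blast_version)
  if [ -z "$have" ]; then
    echo "blastn not found, or its version could not be parsed" >&2
    return 1
  fi
  if ! version_at_least "$have" "$min"; then
    echo "blastn $have is older than $min" >&2
    return 1
  fi
}

# Usage, before launching the pipeline:
#   require_blast 2.8.0 || exit 1
```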