bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

Diamond runtime error #650

Open ko519 opened 1 year ago

ko519 commented 1 year ago

I am running a local instance of diamond blastx against some metagenome sequences and encountered a runtime error I've never seen before. I checked the database with diamond dbinfo and everything looks fine. There is also plenty of disk space, so that shouldn't be the issue...

Error:

    Building reference seed array... [2.17s]
    Building query seed array... [1.545s]
    Computing hash join... [0.907s]
    Masking low complexity seeds... [0.351s]
    Searching alignments... [6.311s]
    Deallocating buffers... [0.45s]
    Clearing query masking... [0.583s]
    Opening temporary output file... [0s]
    Computing alignments... terminate called after throwing an instance of 'std::runtime_error'
      what(): Mismatching hit count / possibly corrupted temporary file: diamond-tmp-ZNcBiA

Any ideas?

bbuchfink commented 1 year ago

Have you tried repeating the computation? Does the error happen frequently? Can you show me your command line and diamond version?

ko519 commented 1 year ago

Hi!

Yes, I've run it a few times now without success. It does, however, work on subsamples, so I'm thinking it could be a RAM issue...

Command line:

    for cat in /data/*.fastq.gz; do
        basefile=$(basename "$cat" .fastq.gz)
        daafile="$basefile.daa"
        diamond blastx -d diamond_nr2.dmnd -q "$cat" -o "$daafile" -f 100 &> "$basefile.output.log"
    done &

Diamond version: 2.0.15

I can send you an example file if you need one.

bbuchfink commented 1 year ago

An example file that produces the error would be helpful. My email is buchfink@gmail.com.

bbuchfink commented 1 year ago

I've been aligning your file against the nr database. It's still running, but there has been no error so far, so the problem may not be reproducible on my system. One thing you could try is to split your file into multiple chunks. It is possible to combine the resulting DAA files as described here: https://megan.cs.uni-tuebingen.de/t/working-with-very-large-files/2005
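
A minimal sketch of the chunking suggestion, written in bash to match the command line above. It assumes GNU coreutils `split` (for `-d`, `-a` and `--additional-suffix`) and plain, unwrapped 4-line FASTQ records. `split_fastq_gz` and `run_chunks` are hypothetical helper names, not part of diamond, and merging the per-chunk DAA files afterwards is left to the procedure in the linked MEGAN post.

```shell
#!/usr/bin/env bash
# Sketch only: split a gzipped FASTQ into fixed-size chunks and align each
# chunk with a separate diamond run. Helper names are hypothetical.
# Assumes GNU coreutils split and unwrapped 4-line FASTQ records.

# split_fastq_gz INPUT.fastq.gz READS_PER_CHUNK OUT_PREFIX
# Writes OUT_PREFIX0000.fastq, OUT_PREFIX0001.fastq, ... (uncompressed).
split_fastq_gz() {
    local input=$1 reads=$2 prefix=$3
    # One FASTQ record = 4 lines, so cut on multiples of 4 lines to avoid
    # splitting a read across two chunks.
    gzip -dc -- "$input" |
        split -l $((reads * 4)) -d -a 4 --additional-suffix=.fastq - "$prefix"
}

# run_chunks DB.dmnd OUT_PREFIX
# Aligns every chunk with the same options as the original command and writes
# one DAA file (and one log) per chunk.
run_chunks() {
    local db=$1 prefix=$2 chunk
    for chunk in "$prefix"*.fastq; do
        diamond blastx -d "$db" -q "$chunk" -o "${chunk%.fastq}.daa" -f 100 \
            &> "${chunk%.fastq}.output.log"
    done
}
```

For example, `split_fastq_gz sample.fastq.gz 10000000 sample_chunk_` followed by `run_chunks diamond_nr2.dmnd sample_chunk_` would produce `sample_chunk_0000.daa`, `sample_chunk_0001.daa`, and so on. The chunk size of 10 million reads is an arbitrary illustration; smaller chunks mean less work per run at the cost of more files to merge.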

ko519 commented 1 year ago

Hello,

I had no idea I could do that. Thanks a lot, I will give it a try.

-- Katie O'Brien

bbuchfink commented 1 year ago

FYI, the job finished for me after 5 days, producing 140 GB of DAA output.

ko519 commented 1 year ago

Hi Benjamin,

Thanks for letting me know; it must be a memory issue on my end! I appreciate the time you spent helping me.

ko519 commented 1 year ago

Hi Benjamin,

Me again! I'm applying for some HPC funding to get this running, and they have asked about the specs of the workstation you used to run a sample in ~4 days. If you have a rough idea of what you have in terms of RAM etc., I would really appreciate it.

-- Katie O'Brien

bbuchfink commented 1 year ago

Hi Katie, I ran it using 64 cores on an AMD EPYC system with 2 TB of RAM. That much RAM is not needed, however; diamond runs fine with 128 GB (less if need be). For better performance, I recommend using -c1 and a larger block size, such as -b6.
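
A rough sketch of how the -b6 / -c1 suggestion could be applied to the original loop, plus a back-of-the-envelope memory figure for an HPC request. The estimate relies on the rule of thumb in the DIAMOND manual's description of `--block-size` (roughly 6 GB of memory per unit of block size); the manual also notes that `-c1` improves speed at the cost of more memory, so the number is best read as a lower bound. `BLOCK_SIZE`, `INDEX_CHUNKS`, `estimate_mem_gb` and `align_all` are hypothetical names, not diamond settings.

```shell
#!/usr/bin/env bash
# Sketch only: the original per-file loop with the suggested -b6 / -c1 tuning.
# BLOCK_SIZE, INDEX_CHUNKS, estimate_mem_gb and align_all are hypothetical names.
BLOCK_SIZE=${BLOCK_SIZE:-6}      # --block-size (-b): billions of letters per block
INDEX_CHUNKS=${INDEX_CHUNKS:-1}  # --index-chunks (-c): 1 = faster, more memory

# Rule of thumb from the DIAMOND manual: roughly 6 GB of memory per unit of -b.
# -c1 raises memory use above this, so treat the result as a lower bound.
# awk is used so fractional block sizes such as 0.5 also work.
estimate_mem_gb() {
    awk -v b="$1" 'BEGIN { print b * 6 }'
}

# align_all DB.dmnd FILE.fastq.gz...
# Same options as the original command, one DAA file and one log per input.
align_all() {
    local db=$1 fq base
    shift
    for fq in "$@"; do
        base=$(basename "$fq" .fastq.gz)
        diamond blastx -d "$db" -q "$fq" -o "$base.daa" -f 100 \
            --block-size "$BLOCK_SIZE" --index-chunks "$INDEX_CHUNKS" \
            &> "$base.output.log"
    done
}
```

For example, `echo "At least $(estimate_mem_gb "$BLOCK_SIZE") GB"` prints `At least 36 GB` for -b6, and `align_all diamond_nr2.dmnd /data/*.fastq.gz` runs the tuned loop. This is only a starting point for a resource request; the actual peak also depends on the database and on the -c setting.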