bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

Need help with Diamond - Megan #619

Open SowmyaPulapet opened 2 years ago

SowmyaPulapet commented 2 years ago

Hi, I am using diamond version 2.0.15 in the generic pipeline of the Megan6 tool for metagenome analysis. The pipeline uses diamond to run the blast search and produce .daa files. I have 2 samples and have already generated a .megan file for one of them. However, the second sample has been stuck on the .daa file generation step for a while now. The command used is:

~/Tools/diamond/bin/diamond blastx --query ../00fastq/sample.fq.gz --db nr --daa ./sample.daa

The sample file size is ~500 MB. The log file shows the following messages:

Masking low complexity seeds... [0.305s]
Searching alignments... [75.809s]
Deallocating buffers... [0.564s]
Clearing query masking... [7.9s]
Opening temporary output file... [0s]
Computing alignments... [2472.54s]
Deallocating reference... [0.333s]
Loading reference sequences... [16.142s]
Masking reference...

I understand this means the process is still running, but the slow progress concerns me: it has been more than 2 weeks, whereas the other sample of the same size took only a day. I can't tell whether it has stopped or not. A .daa file with the sample name has been created, but it is empty. The nr database was created using the same version of diamond. I hope you can help me with this.

Thank you.
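
To tell a stalled run from a slow one, it can help to look at the output file and at the process itself. The snippet below is a minimal sketch and not part of the Megan pipeline: `output_state` is a hypothetical helper, the file name is the one from the command above, and it assumes a POSIX shell with `pgrep` available.

```shell
# Report whether an output file exists and contains any bytes.
# Prints "missing", "empty", or "non-empty".
output_state() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ -s "$1" ]; then
    echo "non-empty"
  else
    echo "empty"
  fi
}

# File name taken from the command in the report above.
echo "sample.daa: $(output_state ./sample.daa)"

# List running processes whose name matches "diamond" (PID and name).
# If pgrep finds nothing (or is not installed), print a note instead.
pgrep -l diamond 2>/dev/null || echo "no diamond process found"
```

If a diamond process is listed but shows no CPU activity over repeated checks (for example in `top`), that may point to a hang and not just a slow run.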

bbuchfink commented 2 years ago

This clearly looks like an error has occurred here. Have you tried to rerun the sample? You can also try using --masking 0. If the error persists, it would be very helpful if you could send me your query file to reproduce it.
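
As a sketch of the suggestion above, the original command can be rerun with `--masking 0` appended. The paths are copied from the report; `build_rerun_cmd` and the `DIAMOND` variable are hypothetical names used here for illustration. The function only prints the command so it can be checked before running it.

```shell
# Path to the diamond binary, as given in the original report.
# Override by setting DIAMOND in the environment.
DIAMOND="${DIAMOND:-$HOME/Tools/diamond/bin/diamond}"

# Print the blastx command with masking disabled.
# Arguments: query file, database, output .daa file.
build_rerun_cmd() {
  printf '%s blastx --query %s --db %s --daa %s --masking 0\n' \
    "$DIAMOND" "$1" "$2" "$3"
}

# Dry run: show the command for the sample from the report.
build_rerun_cmd ../00fastq/sample.fq.gz nr ./sample.daa
```

Copy the printed line into the shell to start the run once it looks right.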

SowmyaPulapet commented 2 years ago

Hi @bbuchfink, yes, I tried rerunning, but it still gets stuck. I will try with --masking 0 and update you.

LimesKey commented 1 year ago

I'm having a similar problem and --masking 0 worked for me. I'm using GitHub Codespaces' 32 GB RAM plan. The memory appeared to be close to being maxed out:

@LimesKey ➜ /workspaces/diamond (master ✗) $ ./diamond blastx -q GRCh38_latest_genomic.fna -d eyecolorprotein_Database.dmnd -o out.tsv --threads 16
diamond v2.0.14.152 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: 
#Target sequences to report alignments for: 25
Opening the database...  [0s]
Database: eyecolorprotein_Database.dmnd (type: Diamond database, sequences: 29, letters: 23128)
Block size = 2000000000
Opening the input file...  [0.027s]
Opening the output file...  [0s]
Loading query sequences...  [17.925s]
Masking queries... Killed

bbuchfink commented 1 year ago

OK, thanks for reporting this. I need to look into the memory use there.
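
Besides `--masking 0`, which worked in the run above, lowering the block size may be worth trying when a run ends in `Killed`. This is an assumption, not something suggested in this thread. The sketch below relies on the rule of thumb from the DIAMOND documentation that memory use is roughly six times the `--block-size` value in GB; the value 0.5 and the helper `estimate_mem_gb` are illustrative only. The `Block size = 2000000000` line in the log corresponds to the default of 2.0.

```shell
# Rough memory estimate in GB for a given --block-size value
# (in billions of letters), using the approximate factor of 6.
estimate_mem_gb() {
  LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.1f\n", b * 6 }'
}

echo "default block size 2.0: ~$(estimate_mem_gb 2.0) GB"
echo "reduced block size 0.5: ~$(estimate_mem_gb 0.5) GB"

# Dry run: print the command from the log with a smaller block size.
# The value 0.5 is an illustrative choice, not a recommendation.
echo "./diamond blastx -q GRCh38_latest_genomic.fna -d eyecolorprotein_Database.dmnd -o out.tsv --threads 16 --block-size 0.5"
```

The run above was killed on a 32 GB machine at the default block size, so the estimate is clearly not an upper bound and should be read only as a starting point.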