bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

About the running times #762

Open KJ-Ma opened 7 months ago

KJ-Ma commented 7 months ago

Hello

I have a large protein sequence file, summarized below. Its total length (sum_len) is 10,885,629,915 amino acid residues.

>  file                          format  type       num_seqs         sum_len  min_len  avg_len  max_len
> non_redundancy_protein.fasta  FASTA   Protein  56,324,313  10,885,629,915       34    193.3   14,951

I am running diamond blastp against the NCBI NR database with the following command:

nohup diamond blastp -d nr_20230728.dmnd -q ../07rm_redundancy/07partial_cdhit2/non_redundancy_protein.fasta --outfmt 6 --max-target-seqs 5 -e 1e-10 --query-cover 80 --id 50 --threads 140 -c 1 -b 16 -o diamond_annotation_nr.tsv > diamond_log.txt 2>&1 &

It seems diamond needs too much time to finish. I'd like to know how many query blocks this command will run. The log so far is included below.

I would appreciate your help with this question.

nohup: ignoring input
diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 140
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: 
#Target sequences to report alignments for: 5
Opening the database...  [0.074s]
Database: /home/adm/database/NCBI/NCBI_NR/nr_20230728.dmnd (type: Diamond database, sequences: 595907626, letters: 234169316349)
Block size = 16000000000
Opening the input file...  [0.034s]
Opening the output file...  [0s]
Loading query sequences...  [56.861s]
Masking queries...  [10.58s]
Algorithm: Double-indexed
Building query histograms...  [7.472s]
Seeking in database...  [0s]
Loading reference sequences...  [30.694s]
Masking reference...  [17.357s]
Initializing dictionary...  [0.075s]
Initializing temporary storage...  [0s]
Building reference histograms...  [10.244s]
Allocating buffers...  [0.001s]
Processing query block 1, reference block 1/15, shape 1/2.
Building reference seed array...  [6.012s]
Building query seed array...  [5.681s]
Computing hash join...  [20.388s]
Masking low complexity seeds...  [3.321s]
Searching alignments...  [1395.4s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/15, shape 2/2.
Building reference seed array...  [4.54s]
Building query seed array...  [6.068s]
Computing hash join...  [31.181s]
Masking low complexity seeds...  [2.292s]
Searching alignments...  [1199.63s]
Deallocating memory...  [0s]
Deallocating buffers...  [9.142s]
Clearing query masking...  [3.581s]
Opening temporary output file...  [0s]
Computing alignments... Loading trace points...  [353.293s]
Sorting trace points...  [98.201s]
Computing alignments...  [1444.16s]
Deallocating buffers...  [20.527s]
Loading trace points...  [0.014s]
Sorting trace points...  [108.536s]
Computing alignments...  [1457.22s]
Deallocating buffers...  [31.078s]
Loading trace points...  [0.036s]
Sorting trace points...  [83.7s]
Computing alignments...  [1138.63s]
Deallocating buffers...  [11.271s]
Loading trace points...  [0.036s]
Sorting trace points...  [101.461s]
Computing alignments...  [1432.87s]
Deallocating buffers...  [21.436s]
Loading trace points...  [0.047s]
Sorting trace points...  [103.237s]
Computing alignments...  [1348.66s]
Deallocating buffers...  [21.272s]
Loading trace points...  [0.007s]
Sorting trace points...  [127.472s]
Computing alignments...  [1707.93s]
Deallocating buffers...  [24.559s]
Loading trace points...  [0.034s]
Sorting trace points...  [117.072s]
Computing alignments...  [1555.41s]
Deallocating buffers...  [19.418s]
Loading trace points...  [0.049s]
Sorting trace points...  [122.935s]
Computing alignments...  [1554.81s]
Deallocating buffers...  [23.619s]
Loading trace points...  [0.023s]
Sorting trace points...  [109.928s]
Computing alignments...  [1468.24s]
Deallocating buffers...  [19.654s]
Loading trace points...  [0.032s]
Sorting trace points...  [106.685s]
Computing alignments...  [1403.99s]
Deallocating buffers...  [22.997s]
Loading trace points...  [0.049s]
Sorting trace points...  [105.344s]
Computing alignments...  [1378.31s]
Deallocating buffers...  [17.975s]
Loading trace points...  [0.041s]
Sorting trace points...  [99.973s]
Computing alignments...  [1339.73s]
Deallocating buffers...  [15.575s]
Loading trace points...  [0.006s]
Sorting trace points...  [110.233s]
Computing alignments...  [1421.66s]
Deallocating buffers...  [25.309s]
Loading trace points...  [0.03s]
Sorting trace points...  [99.521s]
Computing alignments...  [1433.63s]
Deallocating buffers...  [17.191s]
Loading trace points...  [0.01s]
Sorting trace points...  [87.972s]
Computing alignments...  [1277.04s]
Deallocating buffers...  [12.884s]
Loading trace points...  [0s]
Sorting trace points...  [120.664s]
Computing alignments...  [1293.89s]
Deallocating buffers...  [22.838s]
Loading trace points...  [0s]
 [25040.5s]
Deallocating reference...  [0.069s]
Loading reference sequences...  [33.603s]
Masking reference...  [16.284s]
Initializing dictionary...  [0.077s]
Initializing temporary storage...  [0.01s]
Building reference histograms...  [10.543s]
Allocating buffers...  [0.001s]
Processing query block 1, reference block 2/15, shape 1/2.
Building reference seed array...  [6.667s]
Building query seed array...  [6.038s]
Computing hash join...  [22.306s]
Masking low complexity seeds...  [2.497s]
Searching alignments...  [1417.19s]
Deallocating memory...  [0s]
Processing query block 1, reference block 2/15, shape 2/2.
Building reference seed array...  [4.602s]
Building query seed array...  [3.34s]
Computing hash join...  [66.474s]
Masking low complexity seeds...  [2.595s]
Searching alignments...  [1208.88s]
Deallocating memory...  [0s]
Deallocating buffers...  [1.733s]
Clearing query masking...  [3.206s]
Opening temporary output file...  [0s]
Computing alignments... Loading trace points...  [360.445s]
Sorting trace points...  [122.79s]
Computing alignments...  [1581.5s]
Deallocating buffers...  [22.937s]
Loading trace points...  [0.038s]
Sorting trace points...  [131.016s]
Computing alignments...  [1712.84s]

bbuchfink commented 7 months ago

Your block size is 16 billion letters (-b 16, shown in the log as "Block size = 16000000000") and the query file is ~11 billion letters, so it will be one query block.
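
The arithmetic behind this answer can be sketched in a few lines of Python. This is only a rough estimate, assuming that the -b value is in billions of sequence letters (consistent with the "Block size = 16000000000" line in the log) and that the block count is simply the total letter count divided by the block size, rounded up. The helper name block_count is hypothetical and is not part of diamond; the actual partitioning inside diamond may differ in detail.

```python
def block_count(total_letters: int, block_size_billions: int) -> int:
    """Estimate the number of blocks by ceiling division (an assumption, not diamond's code)."""
    block_letters = block_size_billions * 1_000_000_000
    # Integer ceiling division avoids floating-point rounding on large counts.
    return max(1, -(-total_letters // block_letters))

query_letters = 10_885_629_915        # sum_len from the query file summary
reference_letters = 234_169_316_349   # "letters" from the Database line in the log

query_blocks = block_count(query_letters, 16)
reference_blocks = block_count(reference_letters, 16)
print(query_blocks, reference_blocks)
```

Under these assumptions the estimate gives 1 query block and 15 reference blocks, which agrees with the "query block 1, reference block 1/15" lines in the log.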

KJ-Ma commented 7 months ago

Thanks!
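
To see where the running time goes, the per-step timings in a log like the one above can be summed by step name. The following is a minimal sketch, assuming that each timed line has the form "Step name...  [12.3s]" as in the pasted log; this layout is inferred from the excerpt and is not a documented, stable format. The function name sum_step_times is hypothetical. Lines without a step name, such as " [25040.5s]", are skipped.

```python
import re
from collections import defaultdict

# Matches lines such as "Searching alignments...  [1395.4s]".
STEP_RE = re.compile(r"^(?P<step>.*?)\.\.\.\s+\[(?P<secs>[0-9.]+)s\]\s*$")

def sum_step_times(log_text: str) -> dict:
    """Return total seconds per step name found in the log text."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            totals[match.group("step")] += float(match.group("secs"))
    return dict(totals)

# A few lines copied from the log in this thread (reference block 1, both shapes).
sample = """\
Building reference seed array...  [6.012s]
Building query seed array...  [5.681s]
Computing hash join...  [20.388s]
Masking low complexity seeds...  [3.321s]
Searching alignments...  [1395.4s]
Building reference seed array...  [4.54s]
Building query seed array...  [6.068s]
Computing hash join...  [31.181s]
Masking low complexity seeds...  [2.292s]
Searching alignments...  [1199.63s]
"""

totals = sum_step_times(sample)
for step, seconds in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{step}: {seconds:.1f} s")
```

To process a full log, read the file contents (for example diamond_log.txt from the command above) and pass them to sum_step_times instead of the sample string.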