KJ-Ma opened this issue 7 months ago
Hello
I have a large protein sequence file; its statistics are below (sum_len is 10,885,629,915 residues):
| file | format | type | num_seqs | sum_len | min_len | avg_len | max_len |
|---|---|---|---|---|---|---|---|
| non_redundancy_protein.fasta | FASTA | Protein | 56,324,313 | 10,885,629,915 | 34 | 193.3 | 14,951 |
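These columns appear to come from `seqkit stats`. If seqkit isn't at hand, the `sum_len` figure (the number that matters when choosing DIAMOND's `-b` block size) can be reproduced with plain awk. This is a minimal sketch assuming an uncompressed FASTA file; `count_letters` is a hypothetical helper name, not part of DIAMOND or seqkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DIAMOND or seqkit): count the residues in an
# uncompressed FASTA file, i.e. the "sum_len" column in the table above.
# Header lines start with ">" and are skipped; spaces, tabs and carriage
# returns inside sequence lines are not counted.
count_letters() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
       END   { printf "%.0f\n", total }' "$1"
}

# Example, using the query file from the diamond command in this issue:
# count_letters ../07rm_redundancy/07partial_cdhit2/non_redundancy_protein.fasta
```

`%.0f` is used instead of `%d` because some awk implementations clamp `%d` to 32-bit integers, which would break on a total of roughly 10.9 billion.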
I am running DIAMOND blastp against the NCBI NR database with the following command:
nohup diamond blastp -d nr_20230728.dmnd -q ../07rm_redundancy/07partial_cdhit2/non_redundancy_protein.fasta --outfmt 6 --max-target-seqs 5 -e 1e-10 --query-cover 80 --id 50 --threads 140 -c 1 -b 16 -o diamond_annotation_nr.tsv > diamond_log.txt 2>&1 &
DIAMOND seems to need a very long time to finish. How many query blocks will this command run?
I would appreciate your help with this question.
nohup: ignoring input
diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 140
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
#Target sequences to report alignments for: 5
Opening the database... [0.074s]
Database: /home/adm/database/NCBI/NCBI_NR/nr_20230728.dmnd (type: Diamond database, sequences: 595907626, letters: 234169316349)
Block size = 16000000000
Opening the input file... [0.034s]
Opening the output file... [0s]
Loading query sequences... [56.861s]
Masking queries... [10.58s]
Algorithm: Double-indexed
Building query histograms... [7.472s]
Seeking in database... [0s]
Loading reference sequences... [30.694s]
Masking reference... [17.357s]
Initializing dictionary... [0.075s]
Initializing temporary storage... [0s]
Building reference histograms... [10.244s]
Allocating buffers... [0.001s]
Processing query block 1, reference block 1/15, shape 1/2.
Building reference seed array... [6.012s]
Building query seed array... [5.681s]
Computing hash join... [20.388s]
Masking low complexity seeds... [3.321s]
Searching alignments... [1395.4s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/15, shape 2/2.
Building reference seed array... [4.54s]
Building query seed array... [6.068s]
Computing hash join... [31.181s]
Masking low complexity seeds... [2.292s]
Searching alignments... [1199.63s]
Deallocating memory... [0s]
Deallocating buffers... [9.142s]
Clearing query masking... [3.581s]
Opening temporary output file... [0s]
Computing alignments... Loading trace points... [353.293s]
Sorting trace points... [98.201s]
Computing alignments... [1444.16s]
Deallocating buffers... [20.527s]
Loading trace points... [0.014s]
Sorting trace points... [108.536s]
Computing alignments... [1457.22s]
Deallocating buffers... [31.078s]
Loading trace points... [0.036s]
Sorting trace points... [83.7s]
Computing alignments... [1138.63s]
Deallocating buffers... [11.271s]
Loading trace points... [0.036s]
Sorting trace points... [101.461s]
Computing alignments... [1432.87s]
Deallocating buffers... [21.436s]
Loading trace points... [0.047s]
Sorting trace points... [103.237s]
Computing alignments... [1348.66s]
Deallocating buffers... [21.272s]
Loading trace points... [0.007s]
Sorting trace points... [127.472s]
Computing alignments... [1707.93s]
Deallocating buffers... [24.559s]
Loading trace points... [0.034s]
Sorting trace points... [117.072s]
Computing alignments... [1555.41s]
Deallocating buffers... [19.418s]
Loading trace points... [0.049s]
Sorting trace points... [122.935s]
Computing alignments... [1554.81s]
Deallocating buffers... [23.619s]
Loading trace points... [0.023s]
Sorting trace points... [109.928s]
Computing alignments... [1468.24s]
Deallocating buffers... [19.654s]
Loading trace points... [0.032s]
Sorting trace points... [106.685s]
Computing alignments... [1403.99s]
Deallocating buffers... [22.997s]
Loading trace points... [0.049s]
Sorting trace points... [105.344s]
Computing alignments... [1378.31s]
Deallocating buffers... [17.975s]
Loading trace points... [0.041s]
Sorting trace points... [99.973s]
Computing alignments... [1339.73s]
Deallocating buffers... [15.575s]
Loading trace points... [0.006s]
Sorting trace points... [110.233s]
Computing alignments... [1421.66s]
Deallocating buffers... [25.309s]
Loading trace points... [0.03s]
Sorting trace points... [99.521s]
Computing alignments... [1433.63s]
Deallocating buffers... [17.191s]
Loading trace points... [0.01s]
Sorting trace points... [87.972s]
Computing alignments... [1277.04s]
Deallocating buffers... [12.884s]
Loading trace points... [0s]
Sorting trace points... [120.664s]
Computing alignments... [1293.89s]
Deallocating buffers... [22.838s]
Loading trace points... [0s]
[25040.5s]
Deallocating reference... [0.069s]
Loading reference sequences... [33.603s]
Masking reference... [16.284s]
Initializing dictionary... [0.077s]
Initializing temporary storage... [0.01s]
Building reference histograms... [10.543s]
Allocating buffers... [0.001s]
Processing query block 1, reference block 2/15, shape 1/2.
Building reference seed array... [6.667s]
Building query seed array... [6.038s]
Computing hash join... [22.306s]
Masking low complexity seeds... [2.497s]
Searching alignments... [1417.19s]
Deallocating memory... [0s]
Processing query block 1, reference block 2/15, shape 2/2.
Building reference seed array... [4.602s]
Building query seed array... [3.34s]
Computing hash join... [66.474s]
Masking low complexity seeds... [2.595s]
Searching alignments... [1208.88s]
Deallocating memory... [0s]
Deallocating buffers... [1.733s]
Clearing query masking... [3.206s]
Opening temporary output file... [0s]
Computing alignments... Loading trace points... [360.445s]
Sorting trace points... [122.79s]
Computing alignments... [1581.5s]
Deallocating buffers... [22.937s]
Loading trace points... [0.038s]
Sorting trace points... [131.016s]
Computing alignments... [1712.84s]
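To check progress while the job runs, the `Processing query block ..., reference block .../15` lines in the log are the useful markers. In the log above, reference block 1 took roughly 27,800 s (about 7.7 hours) from loading the reference to the end of alignment computation, most of it (25,040.5 s) in the final "Computing alignments" stage. If the other 14 reference blocks behave similarly, a rough extrapolation puts the whole run at about 115 hours (close to five days). The function `latest_block` below is a hypothetical helper; this is a minimal sketch assuming the log lines keep the format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last "query block N, reference block M/T,
# shape S/K" marker that DIAMOND wrote to its log, as a quick progress check.
latest_block() {
  grep -o 'query block [0-9][0-9]*, reference block [0-9][0-9]*/[0-9][0-9]*, shape [0-9][0-9]*/[0-9][0-9]*' "$1" |
    tail -n 1
}

# diamond_log.txt is the log file name used in the nohup command in this issue.
if [ -f diamond_log.txt ]; then
  latest_block diamond_log.txt
fi
```

`grep -o` prints each match on its own line, so this works whether the log is one entry per line or has been flattened as in a pasted copy.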
Your block size is 16 (`-b 16`, i.e. 16 billion sequence letters) and the query file has about 10.9 billion letters, so it will be processed as a single query block.
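The same arithmetic applies to the reference side: the database has 234,169,316,349 letters, which is why the log shows 15 reference blocks. A rough sketch, assuming DIAMOND splits a sequence set into ceil(letters / (block size × 10^9)) blocks; `blocks` is a hypothetical helper name, and it only handles integer `-b` values because bash arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Rough sketch, assuming DIAMOND splits a sequence set into
# ceil(letters / (block_size * 10^9)) blocks, where block_size is the -b value
# in billions of letters (integer values only). Blocks end on sequence
# boundaries, so a total very close to a multiple of the block size could
# come out one block different.
blocks() {
  local letters=$1 block_size_billions=$2
  local per_block=$(( block_size_billions * 1000000000 ))
  echo $(( (letters + per_block - 1) / per_block ))   # integer ceiling division
}

echo "query blocks:     $(blocks 10885629915 16)"    # sum_len of the query file
echo "reference blocks: $(blocks 234169316349 16)"   # letters in nr_20230728.dmnd
```

With `-b 16` this prints 1 query block and 15 reference blocks, which agrees with the `query block 1, reference block 1/15` lines in the log.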
Thanks!