ZiliaMR opened this issue 2 years ago
To compute the number of query blocks, take the number of DNA letters in the input file * 2, divided by the block size (2000000000 in your case).
OMG, that means that in my case my analysis has 7.69 query blocks.
My library has 51,307,345 sequences x 150 bp = 7,696,101,750 letters (bp), so 7,696,101,750 x 2 / 2,000,000,000 = 7.69 query blocks.
If my calculations are correct, my analysis will take about 6.9 days to process, right?
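The arithmetic above can be sketched in a few lines of Python. The numbers are the ones from this thread, and the ~1 day per block is the observed rate, not a guarantee:

```python
# Estimate the number of DIAMOND query blocks from the formula above:
# letters * 2 / block size. A block size of 2,000,000,000 letters
# corresponds to the default -b2 (block size in billions of letters).

def query_blocks(num_seqs, read_len_bp, block_size_letters):
    letters = num_seqs * read_len_bp
    return letters * 2 / block_size_letters

blocks = query_blocks(51_307_345, 150, 2_000_000_000)
print(round(blocks, 2))           # -> 7.7 query blocks
print(round(blocks * 1.0, 1))     # -> 7.7 days at ~1 day per block
```

At roughly one day per block this gives the ~7-day estimate discussed above.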
I will have to adjust some parameters to speed up the analysis process.
Yes, that seems correct. The easiest way to reduce runtime would be to use a smaller database if that works for you, e.g. UniRef50 or AnnoTree; see here: https://journals.asm.org/doi/full/10.1128/msystems.01408-21
To make DIAMOND faster you can use -c1 and increase the block size, e.g. -b4 or -b6, but this will also increase memory use. Using global ranking can also help, e.g. -g300 combined with -f 6 qseqid sseqid evalue to save time on the traceback.
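Putting those suggestions together, a full invocation might look like the sketch below. The input, database, and output file names are placeholders, not files from this thread:

```shell
# Hypothetical DIAMOND run combining the options discussed above:
#   -c1        fewer index chunks (faster, but more memory)
#   -b6        block size in billions of letters (faster, but more memory)
#   -g300      global ranking, keeping 300 targets per query
#   -f 6 ...   tabular output limited to three fields, cheaper traceback
diamond blastx \
    --query reads.fastq \
    --db uniref50.dmnd \
    --out matches.tsv \
    -c1 -b6 -g300 \
    -f 6 qseqid sseqid evalue
```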
Thanks for your help.
I am running my analysis using -c4 and -b8, and it seems to have reduced the time to 1.8 days per library :). I will try your suggestions too.
(I tried -c1, but I do not have sufficient memory, and I cannot apply for more memory this month at my institution :( ).
Hello,
I am running diamond with these parameters:
and in the terminal I see this:
... Processing query block 1 took about one day, and I see that it has just started on query block 2. My question is: how many query blocks will there be in total? How can I know this?
I would appreciate your help with this question.
Thanks in advance.