bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

--global-ranking, impact and when to set lower? #814

Open fpusan opened 5 months ago

fpusan commented 5 months ago

Hi and thanks for the continued development and maintenance of DIAMOND.

I'm interested in speeding up searches against NCBI nr. Ideally I would like to get only the top 10-20 hits, even if they are not very good.

I've seen here that I can use the --fast mode, which afaik is optimized for hits with >90% identity. I have also seen you mention that -g100 can speed things up if you only need the best hits for each query. I didn't fully understand what it does exactly, or whether it comes with lower sensitivity for low-identity hits.

Could you elaborate a bit on when to set -g lower than the default and the expected consequences of doing so?

Thanks in advance

bbuchfink commented 1 month ago

-g N enables global ranking, which means that only the best N targets are extended for each query, ranked by their ungapped extension score. This can be much faster than the default behaviour for large databases, but it is also greedy, and relevant hits can easily be missed. For your application it certainly seems to make sense to use it.
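
To make the trade-off described above concrete, here is a minimal toy sketch in Python of the greedy selection idea: rank candidate targets by ungapped score, keep the top N, and extend only those. It is not DIAMOND's implementation. The function name `global_ranking`, the target names, and all scores are hypothetical and chosen only to show how a target with a weak ungapped score but a strong gapped alignment could be dropped before extension.

```python
from typing import Dict, List


def global_ranking(ungapped_scores: Dict[str, int], n: int) -> List[str]:
    """Return the n target ids with the highest ungapped score (toy model)."""
    ranked = sorted(ungapped_scores, key=lambda t: ungapped_scores[t], reverse=True)
    return ranked[:n]


# Hypothetical scores for one query against three targets.
ungapped = {"targetA": 80, "targetB": 75, "targetC": 40}
# Hypothetical scores each target would reach if it were fully extended.
gapped = {"targetA": 95, "targetB": 90, "targetC": 120}

# Only the top-2 targets by ungapped score are passed on to extension.
kept = global_ranking(ungapped, n=2)

# The target with the best gapped score overall.
best_overall = max(gapped, key=lambda t: gapped[t])

print("extended targets:", kept)
print("best gapped target was extended:", best_overall in kept)
```

In this toy case `targetC` would have produced the best gapped alignment, but it is never extended because its ungapped score ranks below the cutoff. A larger N lowers the chance of this at the cost of more extension work.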