bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

mmap() database? #216

Open sjaenick opened 6 years ago

sjaenick commented 6 years ago

Hi there,

any chance of having the database loaded into memory via mmap() so multiple diamond instances on a host could share the database?
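To illustrate the idea behind this request, here is a minimal sketch in Python (not DIAMOND's C++ code, and not an existing DIAMOND feature) of a read-only `mmap()`. A file mapped this way is backed by the operating system's page cache, so several processes mapping the same file can share one copy of its pages in physical memory. The helper name `open_shared_readonly` and the toy file contents are hypothetical and chosen only for the example.

```python
import mmap
import os
import tempfile


def open_shared_readonly(path):
    """Map a whole file read-only and return the mmap object.

    The mapping is backed by the OS page cache, so processes that map
    the same file can share its pages rather than each holding a copy.
    """
    with open(path, "rb") as f:
        # Length 0 maps the entire file; ACCESS_READ makes it read-only.
        # mmap keeps its own duplicate of the descriptor, so the file
        # object can be closed once the mapping exists.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    # Hypothetical stand-in for a sequence database file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"MKVLAAGIVG\nMSTNPKPQRK\n")
        db_path = tmp.name

    db = open_shared_readonly(db_path)
    try:
        # Slicing reads bytes straight from the mapping.
        first_record = db[:10]
        print(first_record.decode("ascii"))
    finally:
        db.close()
        os.unlink(db_path)
```

Each job on a node would call the helper on the same path. Whether this actually saves memory depends on how much of a program's working set is the file itself and how much is built in memory at run time.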

bbuchfink commented 6 years ago

Hi,

it would be possible, but I'm not sure it would be that useful. I usually don't recommend running multiple instances of diamond on the same host, because the program is more efficient if you assign more resources to a single task. Also, since the database files only contain the sequences but not the index, you wouldn't be able to save a lot of memory that way.

sjaenick commented 6 years ago

Our in-house pipeline submits diamond jobs to a DRMAA-based compute cluster, with input sequences split into chunks and one thread per job. Multiple jobs scheduled on the very same execution node would benefit; the same applies to subsequent runs, as well as to jobs submitted by different users but targeting the same database, since the database might already be cached. So this would affect scalability rather than efficiency.

Just a wishlist item with low priority ;-)

Keep up the good work, very much appreciated!

josemduarte commented 3 weeks ago

Coming back to this old thread: I'm wondering whether the situation is still the same as of 2024. Are there any new features that make an mmap database possible?

My use-case is for a web service where users submit single sequence queries. I'd like those to be as fast as possible. Essentially something similar to what the mmseqs team did at https://github.com/soedinglab/MMseqs2-app

BTW thank you so much for such a wonderful piece of software and for making it open!