epruesse / SINA

SINA - Reference based multiple sequence alignment
https://sina.readthedocs.io
GNU General Public License v3.0

Test 1.4.0-rc parallel SINA pre-release #38

Closed (epruesse closed this 5 years ago)

epruesse commented 5 years ago

Could any of you give the pre-release a spin before I push out the 1.4.0?

@mdehollander @rec3141 @Lagkouvardos @larusnz @a1an77 @ZarulSaurus @v-kisand @marcomeola

Adding multi-cpu support was a bigger change, so while I'm doing my best to avoid big regressions, I have a gut feeling that there will be some.

mdehollander commented 5 years ago

I did a quick test and it works fine for me. Thanks for making these improvements. What I do not immediately understand is how to interpret the --threads and --num-pts flags. If I only specify --threads 16, it still uses 1 thread. Only when I increase the number of PT servers does it run in parallel.

epruesse commented 5 years ago

@mdehollander That should depend on the size of your reference database actually. Unless you had --search on. Did you?

Background:

SINA currently uses the PT server to do its k-mer searches. Those are needed to select the reference sequences for the alignment, and to find candidates for the final homology search. The PT server is purely single-threaded and occupies a significant amount of memory.

In the end, select the --num-pts parameter according to the memory you've got. Run SINA with it set to 1, run top and press g1 (or G1 in some versions) to view the memory occupied by arb_pt_server. Then decide how many you can run. Leave enough for SINA though, it likes memory, too.

I'm guessing that, given enough memory, setting --num-pts to half the number of cores should be enough, except for really huge databases.
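The sizing rule above can be sketched as a small calculation. This is only an illustration, not part of SINA: the function name `suggest_num_pts` and all of its parameters are hypothetical, and the memory figures are whatever you read off `top` for one arb_pt_server plus a reserve you pick for SINA itself.

```python
def suggest_num_pts(total_mem_gb, pt_server_mem_gb, sina_reserve_gb, n_cores):
    """Suggest a value for --num-pts (hypothetical helper, not part of SINA).

    total_mem_gb     -- RAM available on the machine
    pt_server_mem_gb -- memory of one arb_pt_server, as observed in top
    sina_reserve_gb  -- memory deliberately left free for SINA itself
    n_cores          -- number of CPU cores
    """
    if pt_server_mem_gb <= 0:
        raise ValueError("pt_server_mem_gb must be positive")

    # How many PT servers fit into memory once SINA's share is set aside.
    by_memory = int((total_mem_gb - sina_reserve_gb) // pt_server_mem_gb)

    # Rule of thumb from the comment above: half the number of cores.
    by_cores = n_cores // 2

    # Memory is the hard limit; always run at least one PT server.
    return max(1, min(by_memory, by_cores))


if __name__ == "__main__":
    # Example numbers are made up for illustration.
    print(suggest_num_pts(total_mem_gb=64, pt_server_mem_gb=6,
                          sina_reserve_gb=16, n_cores=16))
```

The result would then be passed as --num-pts; for very large databases the memory limit, not the core count, will usually be the binding constraint.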

It's painful to invest so much time into this, as I've already got --fs-engine internal mostly working. That will allow SINA to work without the PT server and have its own internal, multi-threaded search, for which I won't have to allocate memory more than once.

epruesse commented 5 years ago

thanks @mdehollander