Closed epruesse closed 5 years ago
I did a quick test and it works fine for me. Thanks for making these improvements.
What I do not understand directly is how to interpret the `--threads` and `--num-pts` flags. If I only specify `--threads 16` it still uses 1 thread. Only when I increase the number of PT servers does it run in parallel.
@mdehollander That should actually depend on the size of your reference database. Unless you had `--search` on. Did you?
The `--threads` default is dynamic, allowing TBB to profile the CPU load and decide based on what else is running on the computer. Intel's engineers say to only use a hard thread configuration for scalability testing. I tend to trust them on that, and it's been working nicely for me so far.
The `--num-pts` flag configures how many PT server instances will be launched. If you set it too high, you run out of memory. If you set it too low, SINA won't saturate all cores.
SINA currently uses the PT server to do its kmer searches. Those are needed to select the reference sequences for the alignment, and to find candidates for the final homology search. The PT server is strictly single-threaded and occupies a significant amount of memory.
In the end, select the `--num-pts` parameter according to the memory you've got. Run SINA with it set to 1, run `top` and press `g1` (or `G1` in some versions) to view the memory occupied by `arb_pt_server`. Then decide how many you can run. Leave enough for SINA itself, though; it likes memory, too.
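That sizing step can be sketched in shell. The numbers below are placeholders, not measurements; substitute the per-instance `arb_pt_server` footprint you actually read from `top` and the memory of your own machine:

```shell
# Read the per-instance footprint from top (or, non-interactively:
#   ps -o rss= -C arb_pt_server
# while the single-server test run is active). Placeholder values in GiB:
pt_mem_gib=8          # hypothetical arb_pt_server resident memory
total_gib=64          # hypothetical total RAM on the machine
sina_headroom_gib=16  # hypothetical memory kept free for SINA itself

# Number of PT server instances that fit in the remaining memory:
num_pts=$(( (total_gib - sina_headroom_gib) / pt_mem_gib ))
echo "num_pts=$num_pts"   # → num_pts=6 with these placeholder numbers
```

With these made-up numbers you would then start SINA with `--num-pts 6`.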
I'm guessing that, given enough memory, setting `--num-pts` to half the number of cores should be enough except for really huge databases.
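The half-the-cores guess is easy to compute at launch time; a minimal sketch, assuming `nproc` (GNU coreutils) is available:

```shell
# Use half the available cores, but always at least one PT server:
cores=$(nproc)
num_pts=$(( cores / 2 ))
if [ "$num_pts" -lt 1 ]; then num_pts=1; fi
echo "num_pts=$num_pts"
```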
With `--search`, you'll be limited to just one PT server for the search stage. That ticket is still open; I'll see what I can do. It's painful to invest much time into this, as I've already got `--fs-engine internal` mostly working, which will allow SINA to run without the PT server and use its own internal, multi-threaded search, for which I won't have to allocate memory more than once.
thanks @mdehollander
Could any of you give the pre-release a spin before I push out the 1.4.0?
@mdehollander @rec3141 @Lagkouvardos @larusnz @a1an77 @ZarulSaurus @v-kisand @marcomeola
Adding multi-CPU support was a big change, so while I'm doing my best to avoid big regressions, I have a gut feeling that there will be some.