dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Please add a command line option to specify the number of threads GLnexus is using #133

Closed samfux84 closed 6 years ago

samfux84 commented 6 years ago

Some users run your software in a shared environment, such as an HPC cluster. In that setting, using std::thread::hardware_concurrency() to set the number of threads might not be the best idea.

I just observed a case where a user on our HPC cluster submitted three single-core GLnexus jobs, all of which were dispatched to the same large compute node. On that node, std::thread::hardware_concurrency() reports 72:

[sfux@lo-a6-001-11 threads]$ cat test.cpp 
#include <iostream>
#include <thread>

int main() {
    unsigned int n = std::thread::hardware_concurrency();
    std::cout << n << " concurrent threads are supported.\n";
}
[sfux@lo-a6-001-11 threads]$ ./test 
72 concurrent threads are supported.

These jobs can only use the 3 cores that the batch system reserved for them (the other 33 cores are reserved for other jobs). Together, the 3 jobs started more than 800 threads. Only 3*72=216 of these can be attributed to the thread-count setting, so something else may also be going wrong. All of these threads compete for the 3 available cores: we use cgroups to limit the cores a job can use, so every thread started by these processes is bound to the cores requested from the batch system. This slows down all 3 jobs and makes them very inefficient.
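To illustrate the mismatch described above, here is a minimal sketch of how a program could count only the CPUs it is actually allowed to run on, rather than all hardware threads on the node. It assumes Linux with glibc and assumes the batch system restricts the job's CPU affinity (as a cpuset-style cgroup limit does). The helper name `usable_cpu_count` is made up for this example and is not part of GLnexus.

```cpp
#ifdef __linux__
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT
#endif
#include <thread>

// Hypothetical helper (not part of GLnexus): number of CPUs the calling
// process may run on. Falls back to hardware_concurrency(), then to 1.
unsigned usable_cpu_count() {
#ifdef __linux__
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // pid 0 means "the calling thread"; the mask reflects affinity limits
    // imposed from outside, e.g. by a batch system.
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        int n = CPU_COUNT(&mask);
        if (n > 0) {
            return static_cast<unsigned>(n);
        }
    }
#endif
    // hardware_concurrency() may return 0 if the value is not computable.
    unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? hw : 1;
}
```

On a node like the one above, a job bound to a single core would be expected to get 1 from this helper instead of 72. Note that this only covers affinity-based limits; a limit expressed as a CPU time quota would not show up in the affinity mask.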

Therefore, I would like to kindly ask you to implement a command-line option that lets users specify the number of threads when running the glnexus_cli command.
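As a rough sketch of the kind of option requested here, the function below reads a thread count from the command line and falls back to std::thread::hardware_concurrency() only when none is given. The flag name `--threads`, the function name `thread_budget`, and the upper sanity bound are all assumptions made for illustration; they are not taken from glnexus_cli.

```cpp
#include <cstdlib>
#include <string>
#include <thread>

// Hypothetical helper (not GLnexus code): return the thread count given via
// an assumed "--threads N" option, or the detected hardware thread count
// when the option is absent or its value is invalid.
unsigned thread_budget(int argc, char** argv) {
    unsigned hw = std::thread::hardware_concurrency();
    unsigned fallback = hw > 0 ? hw : 1;  // hardware_concurrency() may be 0

    for (int i = 1; i + 1 < argc; ++i) {
        if (std::string(argv[i]) == "--threads") {
            char* end = nullptr;
            long n = std::strtol(argv[i + 1], &end, 10);
            // Accept only a fully parsed positive integer; the upper bound
            // is an arbitrary sanity limit chosen for this sketch.
            bool parsed = (end != argv[i + 1]) && (*end == '\0');
            if (parsed && n > 0 && n <= 65535) {
                return static_cast<unsigned>(n);
            }
        }
    }
    return fallback;
}
```

A single-core batch job would then pass the option with the value 1, so the process sizes its thread pool to match the core it was granted instead of the whole node.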

mlin commented 6 years ago

Dear @samfux84, thank you for writing this. Indeed, this wasn't a high priority, as we tend to deploy GLnexus intending for it to occupy entire cloud VMs, but I've added command-line options for memory and thread budgets to glnexus_cli, on the master branch and in the upcoming tagged release. Please reopen this if you have a chance to try it and it doesn't work as expected.

Regarding the higher-than-expected thread count, I suspect it comes from an internal background thread pool that RocksDB allocates; in practice, most of those threads are idle.

Otherwise, I hope it works well for your user! If you run into other performance issues, there are a number of tips here: https://github.com/dnanexus-rnd/GLnexus/wiki/Performance