Closed thinkgenome closed 4 years ago
AFAIK hmmscan is able to use all cores by default:
--cpu <n>
Set the number of parallel worker threads to <n>. By default, HMMER sets this to
the number of CPU cores it detects in your machine
I also ran a quick test, and adding --cpu 8 has no effect on performance.
Keep the issues coming, we are very happy to get feedback 👍
Thanks for the prompt reply, @prihoda. What changes to the deepbgc pipeline command line do you suggest, given that with the default command line hmmscan only uses one core at a time for each sequence?
I am guessing this is due to limits in the hmmscan implementation; for me it also uses around 1-2 cores.
As per the hmmscan docs, it looks like you can control the CPU limit using an env var:
You can also control this number by setting an environment variable, HMMER_NCPU.
So you can try setting that, but I think it unfortunately won't make a difference.
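For anyone who wants to try both knobs mentioned in this thread, here is a minimal sketch of how an hmmscan call could be assembled with either the `--cpu` flag or the `HMMER_NCPU` environment variable. The helper names and the file names (`Pfam-A.hmm`, `proteins.faa`, `out.domtbl`) are hypothetical, and this is not necessarily how deepbgc invokes hmmscan internally.

```python
import os
import subprocess


def build_hmmscan_command(hmm_db, fasta, domtbl_out, cpus=None):
    """Return an hmmscan argv list; --cpu is only added when cpus is given."""
    cmd = ["hmmscan"]
    if cpus is not None:
        cmd += ["--cpu", str(cpus)]
    # Options come first, then the HMM database, then the sequence file.
    cmd += ["--domtblout", domtbl_out, hmm_db, fasta]
    return cmd


def build_env(ncpu=None, base_env=None):
    """Return a copy of the environment, optionally with HMMER_NCPU set."""
    env = dict(os.environ if base_env is None else base_env)
    if ncpu is not None:
        env["HMMER_NCPU"] = str(ncpu)
    return env


def run_hmmscan(hmm_db, fasta, domtbl_out, cpus=None, ncpu_env=None):
    """Run hmmscan (must be on PATH); raises CalledProcessError on failure."""
    cmd = build_hmmscan_command(hmm_db, fasta, domtbl_out, cpus=cpus)
    subprocess.run(cmd, env=build_env(ncpu_env), check=True)
```

Calling `run_hmmscan("Pfam-A.hmm", "proteins.faa", "out.domtbl", cpus=8)` would pass `--cpu 8`, while `ncpu_env=8` would set `HMMER_NCPU=8` instead. Going by the observations above, neither is expected to push hmmscan much beyond 1-2 busy cores.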
Can hmmscan be replaced with hmmsearch here? This is the compute bottleneck, as far as I can tell, and prevents scaling up my deepBGC usage.
This is old, but I think it still applies to HMMER 3.3: https://cryptogenomicon.org/2011/05/27/hmmscan-vs-hmmsearch-speed-the-numerology/
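One practical detail if hmmsearch were swapped in: in the `--domtblout` table the "target" and "query" columns change roles. With hmmscan the target is the HMM and the query is the sequence; with hmmsearch it is the other way round. The sketch below normalises one table line from either program into the same fields. The function name and the returned keys are hypothetical, and it assumes the standard 22-column-plus-description domtblout layout; whether deepbgc reads this format is not confirmed here.

```python
def parse_domtbl_line(line, program):
    """Parse one non-comment --domtblout line into program-independent fields.

    program must be "hmmscan" or "hmmsearch".
    """
    # 22 whitespace-separated columns, then a free-text description.
    fields = line.split(None, 22)
    if len(fields) < 22:
        raise ValueError("not a complete domtblout line")
    target, query = fields[0], fields[3]
    if program == "hmmscan":
        hmm_name, seq_name = target, query
    elif program == "hmmsearch":
        hmm_name, seq_name = query, target
    else:
        raise ValueError("program must be 'hmmscan' or 'hmmsearch'")
    return {
        "seq_name": seq_name,
        "hmm_name": hmm_name,
        "i_evalue": float(fields[12]),  # independent domain E-value
        "ali_from": int(fields[17]),    # alignment start on the sequence
        "ali_to": int(fields[18]),      # alignment end on the sequence
    }
```

Note that E-values are not automatically comparable between the two programs, because the search space size differs (number of models versus number of sequences). Both programs accept `-Z` to set it explicitly, so any swap would need its thresholds re-checked.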
Can the CPU utilisation of the hmmscan step be improved, to accelerate the analysis of metagenome data?
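A possible workaround, outside of deepbgc itself, is to split the protein FASTA into several chunks and run one hmmscan process per chunk, so that several cores are busy even if each process only uses 1-2. The following is a minimal sketch of the splitting step only; `split_fasta` is a hypothetical helper, not part of deepbgc or HMMER, and feeding the merged results back into deepbgc is not covered.

```python
def split_fasta(fasta_text, n_chunks):
    """Distribute FASTA records round-robin into at most n_chunks strings."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])          # header starts a new record
        elif records and line.strip():
            records[-1].append(line)        # sequence line of current record
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append("\n".join(record))
    # Drop empty chunks when there are fewer records than n_chunks.
    return ["\n".join(chunk) + "\n" for chunk in chunks if chunk]
```

Each chunk could then be written to its own file and scanned in a separate hmmscan process, with the per-chunk output tables concatenated afterwards. As far as I know, hmmscan E-values depend on the number of models in the database rather than the number of query sequences, so splitting the queries should not change them, but that is worth verifying on a small test set.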