ebi-pf-team / interproscan

Genome-scale protein function classification
Apache License 2.0

hmmscan step only used 1 cpu #324

Closed zhan4429 closed 1 year ago

zhan4429 commented 1 year ago

I built the latest version of InterProScan into a Singularity container and ran it on our HPC cluster in local mode. The whole node has 128 cores. Below is my command:

singularity exec interproscan_5.61.93.0.sif interproscan.sh --disable-precalc -cpu 126 -t n -i nt.fasta -d output

I noticed that the log stayed at 95% completed for a long time:

29/04/2023 08:40:42:948 90% completed
29/04/2023 09:40:42:949 95% completed
29/04/2023 10:40:43:054 95% completed
29/04/2023 11:40:43:096 95% completed
29/04/2023 12:40:43:170 95% completed
29/04/2023 13:40:43:265 95% completed
29/04/2023 14:40:43:291 95% completed
29/04/2023 15:40:43:325 95% completed
29/04/2023 16:40:43:347 95% completed
29/04/2023 17:40:43:395 95% completed
29/04/2023 18:40:43:419 95% completed
29/04/2023 19:40:43:530 95% completed
29/04/2023 20:40:43:545 95% completed

When I checked the CPU usage, I found that even though I used -cpu 126, only 1 core was running hmmscan. The documentation does not mention how to increase the number of threads that hmmscan can use. Can anyone give me some advice?

Best, Yucheng

matthiasblum commented 1 year ago

The -cpu option sets the number of workers, i.e. the maximum number of steps/tasks that can run concurrently. The number of CPUs used by each hmmsearch or hmmscan step can be changed by editing the interproscan.properties file.

For instance, in order to make hmmsearch/hmmscan use 8 CPUs when searching sequences against Pfam, you would add the following line:

hmmer3.hmmsearch.cpu.switch.pfama=--cpu 8

And if you want to use 8 CPUs for all HMMER analyses, you would also add:

hmmer3.hmmsearch.cpu.switch.antifam=--cpu 8
hmmer3.hmmsearch.cpu.switch.gene3d=--cpu 8
hmmer3.hmmsearch.cpu.switch.funfam=--cpu 8
hmmer3.hmmsearch.cpu.switch.panther=--cpu 8
hmmer3.hmmsearch.cpu.switch.pirsf=--cpu 8
hmmer3.hmmsearch.cpu.switch.pirsr=--cpu 8
hmmer3.hmmsearch.cpu.switch.sfld=--cpu 8
hmmer2.hmmpfam.cpu.switch.smart=--cpu 8
hmmer3.hmmsearch.cpu.switch.superfamily=--cpu 8
hmmer3.hmmsearch.cpu.switch.tigrfam=--cpu 8
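
If you would rather not edit each line by hand, the change can be scripted. The following is only a minimal sketch, not part of InterProScan: the function name set_hmmer_cpus and the demo file are hypothetical, and the pattern assumes every relevant key contains ".cpu.switch." as in the lines listed above. It runs against a throwaway excerpt here; pointing it at a real interproscan.properties is left to you, and sed keeps a .bak copy of whatever file it edits.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Rewrite every "<prefix>.cpu.switch.<analysis>=..." line to "--cpu N".
# The original file is preserved next to it with a .bak suffix.
# (Hypothetical helper, written for illustration only.)
set_hmmer_cpus() {
    local props="$1" threads="$2"
    sed -i.bak -E \
        "s/^([A-Za-z0-9.]+\.cpu\.switch\.[A-Za-z0-9_]+)=.*/\1=--cpu ${threads}/" \
        "$props"
}

# Demo on a temporary file holding a short, made-up excerpt of the
# properties file, so nothing in a real installation is touched.
demo=$(mktemp)
cat > "$demo" <<'EOF'
hmmer3.hmmsearch.cpu.switch.pfama=--cpu 1
hmmer2.hmmpfam.cpu.switch.smart=--cpu 1
some.other.property=unchanged
EOF

set_hmmer_cpus "$demo" 8
cat "$demo"
```

Lines that do not match the ".cpu.switch." pattern are left untouched, so it is worth diffing the edited file against the .bak copy before starting a long run.
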

zhan4429 commented 1 year ago

Thanks for the instructions. My InterProScan run is much faster now!