Closed jpjarnoux closed 2 years ago
Hi @jpjarnoux
pyhmmer releases the GIL where applicable, so you don't have to use processes to get it to work; threads will work efficiently as well. Try using multiprocessing.pool.ThreadPool instead of multiprocessing.pool.Pool; this should already give you some decent performance (or use pyhmmer.hmmsearch, which does it for you). Otherwise, I'll try adding pickle support to TopHits when I have some time.
Okay, thanks, that's what I was reading. However, I have 16 CPUs available and it looks like they are not fully used. Maybe it's possible to tell the GIL to use them? I will try your advice tomorrow and keep you posted. Thanks
Then it really depends on what you are trying to achieve; I cannot really guess without seeing your use case. Perhaps you don't have enough target sequences to make complete use of all your CPUs.
In my benchmark, I also noticed that HMMER was having a hard time using more than the number of physical CPUs because it's using too many SIMD registers to benefit from hyperthreading. It could be that you're on a machine with 8 physical / 16 logical cores; in that case, you'll see no improvement using 16 jobs instead of 8.
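As a rough illustration of the point above, the job count can be capped at the number of physical cores. This is only a sketch: the standard library reports logical cores only, so it assumes the third-party psutil package for the physical count and falls back to the logical count when psutil is missing. The helper names are hypothetical.

```python
import os


def physical_core_count():
    """Return the number of physical cores, or None if it cannot be determined.

    Assumption: psutil (third-party) is used when installed, because the
    standard library only reports logical cores.
    """
    try:
        import psutil
    except ImportError:
        return None
    return psutil.cpu_count(logical=False)


def pick_job_count(logical, physical=None):
    """Cap the job count at the physical core count when it is known."""
    if physical is None or physical < 1:
        return max(1, logical)
    return max(1, min(logical, physical))


if __name__ == "__main__":
    logical = os.cpu_count() or 1
    print(pick_job_count(logical, physical_core_count()))
```

On a machine with 8 physical / 16 logical cores, this would pick 8 jobs when the physical count is available.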
Sorry, I should explain more clearly what I'm doing. I'm trying to annotate proteins with 4000 thousand HMMs. I have one file per HMM. Previously, I created one DB with all my HMMs. Now, to be more efficient, I'm trying to split them into multiple DBs and concatenate the results. I'll keep you posted, thank you.
Hi, I finally used concurrent.futures.ThreadPoolExecutor and everything works very efficiently. Thanks for your help.
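For readers looking for the general shape of this approach, here is a minimal sketch of splitting the HMM files into batches, searching each batch in a thread, and concatenating the results. It is not the code used in this thread: `annotate`, `chunked`, and `search_batch` are hypothetical names, and `search_batch` stands for whatever function wraps the actual pyhmmer calls (not shown). Threads only help here because the search releases the GIL, as described earlier in the thread.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def annotate(hmm_paths, search_batch, batch_size=100, max_workers=8):
    """Run `search_batch` on each batch in a thread and concatenate the hits.

    `search_batch` is a hypothetical callable (list of paths -> list of hits);
    in real use it would wrap the pyhmmer calls, which are not shown here.
    """
    batches = chunked(list(hmm_paths), batch_size)
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the batches.
        for hits in executor.map(search_batch, batches):
            results.extend(hits)
    return results


if __name__ == "__main__":
    # Dummy search function for illustration only: one fake "hit" per file.
    def fake_search(paths):
        return [f"hit:{p}" for p in paths]

    paths = [f"hmm_{i}.hmm" for i in range(5)]
    print(annotate(paths, fake_search, batch_size=2, max_workers=2))
```

Because the results never leave the process, nothing has to be pickled, so this works even when the hit objects do not support pickle.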
Happy to hear this!
Hi, I would like to work with multiple CPUs, but I don't understand how to give more than one CPU to pyhmmer. So I tried to use the multiprocessing package, but pyhmmer objects cannot be pickled because of a non-trivial __cinit__. Example:
multiprocessing.pool.MaybeEncodingError: Error sending result: '<pyhmmer.plan7.TopHits object at 0x561959114ad0>'. Reason: 'TypeError('no default __reduce__ due to non-trivial __cinit__')'
Could you give me an example of using pyhmmer with more than one CPU, if possible? Thanks
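The error above comes from pickling: a process pool has to serialize each result to send it back to the parent process, while a thread pool shares memory and serializes nothing. The following stdlib-only sketch illustrates that difference. `FakeHits` is a hypothetical stand-in for an unpicklable extension object such as TopHits, not a pyhmmer class, and `search` is a placeholder rather than a real search.

```python
import pickle
import threading
from multiprocessing.pool import ThreadPool


class FakeHits:
    """Hypothetical stand-in for an object that cannot be pickled."""

    def __init__(self, query):
        self.query = query
        # A lock cannot be pickled, which makes the whole object unpicklable.
        self._lock = threading.Lock()


def search(query):
    # Placeholder for a real search; it only wraps the query name.
    return FakeHits(query)


def can_pickle(obj):
    """Return True if obj survives pickling, which a process pool requires."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


def run_in_threads(queries, workers=4):
    # Threads share memory, so results are returned without being pickled.
    with ThreadPool(workers) as pool:
        return pool.map(search, queries)


if __name__ == "__main__":
    hits = run_in_threads(["hmm_a", "hmm_b", "hmm_c"])
    print([h.query for h in hits])
    print(can_pickle(hits[0]))
```

Returning the same objects from multiprocessing.pool.Pool would fail at the pickling step, which is the situation reported in the traceback.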