althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0

Pyrodigal in parallel #30

Closed rhysnewell closed 1 year ago

rhysnewell commented 1 year ago

Hi Devs,

Cool project, thanks for carrying the torch for Prodigal. Something I've noticed with Pyrodigal is that it doesn't seem to cope well when you run multiple commands in parallel through different Python processes. The original Prodigal doesn't have this issue: you can run multiple prodigal commands via parallel, or via parallel calls from another programming language, and each command runs independently. Doing so vastly improves the overall time. Pyrodigal, by contrast, takes the same time whether run in serial or in parallel. Are separate calls to Pyrodigal competing for resources? I wouldn't expect the Python GIL to be causing this, since these are separate processes.

If you have any info on how to improve parallel calls then please let me know, it would be very helpful.

Cheers, Rhys

rhysnewell commented 1 year ago

Just to clarify, I'm talking about using Pyrodigal from within a Python script so that I can access the classes it provides. I'm calling Pyrodigal from Rust using PyO3 and returning all the stuff that I want.

The pyrodigal command-line interface can be run through parallel very easily, and it seems way faster than the Python library.

rhysnewell commented 1 year ago

Never mind, I believe this issue is the result of PyO3 running up against the GIL. Closing.

althonos commented 1 year ago

Well, Pyrodigal is supposed to release the GIL when running in parallel, and this works well when using a Python ThreadPool, but I have never checked how this works with PyO3...
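For reference, here is a minimal sketch of the thread-pool pattern mentioned above. It is not taken from Pyrodigal itself: `map_in_threads` and `stand_in_worker` are hypothetical names, and `zlib.compress` is used only as a stand-in for a native call that releases the GIL while it works, so the example runs with the standard library alone.

```python
import zlib
from multiprocessing.pool import ThreadPool


def map_in_threads(worker, items, threads=4):
    """Apply `worker` to every item using a pool of threads.

    Results come back in the same order as `items`. Threads only give a
    real speed-up when `worker` releases the GIL during its heavy work.
    """
    with ThreadPool(threads) as pool:
        return pool.map(worker, items)


def stand_in_worker(sequence: bytes) -> int:
    # Hypothetical stand-in for a gene-finding call: zlib.compress does its
    # work in C with the GIL released, so several threads can run at once.
    return len(zlib.compress(sequence))


if __name__ == "__main__":
    # Fake "sequences", just to have something to process.
    sequences = [b"ATGC" * 250_000 for _ in range(8)]
    print(map_in_threads(stand_in_worker, sequences))
```

With Pyrodigal, the worker would presumably be the gene finder's own gene-finding method (an assumption here, so check the documentation for your version). The pattern only helps when all threads live in one interpreter and the native code drops the GIL.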

rhysnewell commented 1 year ago

I think it doesn't work with PyO3 because it runs Python as a subprocess of the main Rust process, and all Python calls go to that same subprocess, which is unfortunate. They appear to be looking into a potential fix for this with Python 3.11, but it will be a while off. I could be wrong, but this seems to be the cause of the issue.