althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0

rust prodigal #8

Closed jianshu93 closed 2 years ago

jianshu93 commented 2 years ago

Hello Martin,

Thanks for the SIMD implementation. Two questions: NEON is also supported, right? And would it be possible to have Rust bindings to Prodigal, with multi-threading support added? We all know Prodigal is quite slow.

Thanks,

Jianshu

althonos commented 2 years ago

Hi @jianshu93,

NEON is indeed supported; it should be automatically detected when building Pyrodigal on a platform that supports it. Regarding the Rust bindings, I want to say it's possible, but that's a whole new project in itself. Contrary to PyHMMER, where the Cython code is mostly "thin" bindings on top of a C library, here I had to manually reimplement a lot of the code (~50% I'd say) to make it work. This means that to provide Rust bindings I'd also have to reimplement these parts in Rust, which would be a lot of work, and since I have no use right now for a Rusticized Prodigal, this is not going to happen from me in the near future.

Regarding multi-threading, Pyrodigal currently supports it indirectly: the find_genes method is thread-safe, so if you have several sequences you can share a single OrfFinder instance between several threads, or use a multiprocessing.pool.ThreadPool as shown in the README.md.
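As a minimal sketch of the ThreadPool approach mentioned above: the helper below maps a shared finder's find_genes method over several sequences. The name find_genes_parallel and the threads parameter are hypothetical, made up for illustration; the only assumption about the finder is that it exposes a thread-safe find_genes method, as described above for OrfFinder.

```python
from multiprocessing.pool import ThreadPool


def find_genes_parallel(finder, sequences, threads=4):
    """Call finder.find_genes on every sequence using a pool of threads.

    `finder` is any object with a thread-safe `find_genes` method
    (assumption: an OrfFinder instance, shared between all threads).
    Results are returned in the same order as `sequences`.
    """
    with ThreadPool(threads) as pool:
        return pool.map(finder.find_genes, sequences)
```

With Pyrodigal installed, finder would presumably be an OrfFinder created once up front, and sequences the contigs read from a FASTA file. Threads only help here if find_genes spends its time outside the interpreter lock, which is what "thread-safe" in a Cython extension usually suggests but is not guaranteed by this sketch.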

The other thing that could be parallelized is the meta mode, where the same scoring operation is done several times with different training data. I toyed around with that but didn't find any satisfactory solution, so this is on hold. I figure the SIMD code is already making this step faster.
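To make the idea concrete, here is a rough conceptual sketch of what parallelizing that step could look like: score one sequence against several sets of training data concurrently and keep the best. This is not how Pyrodigal does it; best_model, score, and models are hypothetical placeholders, and the sketch assumes the scoring function can actually run in parallel (for instance by releasing the GIL).

```python
from concurrent.futures import ThreadPoolExecutor


def best_model(sequence, models, score, threads=4):
    """Score `sequence` against every model concurrently.

    `models` is a list of training data sets and `score(sequence, model)`
    returns a number; both are hypothetical placeholders.
    Returns the (model, score) pair with the highest score.
    """
    with ThreadPoolExecutor(max_workers=threads) as executor:
        scores = list(executor.map(lambda m: score(sequence, m), models))
    # Pick the index of the highest score (first one wins on ties).
    best_index = max(range(len(models)), key=scores.__getitem__)
    return models[best_index], scores[best_index]
```

The hard part, which this sketch glosses over, is that each scoring pass needs its own working memory, so sharing state between passes is where a real implementation gets complicated.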

jianshu93 commented 2 years ago

Hi Martin,

Thanks for the quick response. I am wondering whether FragGeneScanRs (https://github.com/unipept/FragGeneScanRs) could be an option: modify it to use Prodigal's model. Of course, the code would need to be restructured at the very least. FragGeneScanRs behaves exactly the same as the original FragGeneScan, so it is less accurate than Prodigal for gene calling on genomes. Anyhow, many thanks for Pyrodigal, this is great; our team and I benefit from it.

Thanks,

Jianshu