althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0

[Feature request] Support for stdin #35

Closed jolespin closed 2 months ago

jolespin commented 1 year ago

I'm trying to replace prodigal with pyrodigal in my VEBA binning-prokaryotic.py module. I use stdin because I do some filtering on contig lengths w/ seqkit before I do gene calls and don't want to rewrite the contigs b/c that's a lot of space.

Can you add support for reading fasta input as stdin?

Preferably, if no input is provided, then assume it's stdin, which is the current behavior of prodigal.

Also, I just noticed there's no support for stdout. Can you add this functionality as well? This is the default behavior of prodigal and it's quite useful. I use it to pipe directly into this script: https://github.com/jolespin/veba/blob/devel/src/scripts/append_geneid_to_prodigal_gff.py which adds the gene id.

althonos commented 1 year ago

Hi @jolespin,

I'm always a bit reluctant to add more features to the CLI as the primary goal of Pyrodigal is to be used as a library when possible, and because I don't want to mimic the Prodigal CLI 100% (like the broken GenBank output for instance). However, I had a look at your code architecture, and while I believe it would be best not to invoke Pyrodigal as a subprocess, I understand why it's organized like that. I'd accept a PR if you want to try, otherwise I'll have a look but can't promise when.
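
To illustrate the in-process alternative to a subprocess pipeline mentioned above, here is a minimal sketch that streams FASTA records from any text handle (for example `sys.stdin`), drops short contigs, and hands the remaining sequences to a gene caller. Everything in it is an assumption made for illustration: `read_fasta`, `filter_and_call`, `min_length`, and the `call_genes` callback are hypothetical names and not part of Pyrodigal; in practice a real gene-finding call would be substituted for the callback.

```python
import io
from typing import Callable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTA text stream."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_and_call(
    handle: TextIO,
    min_length: int,
    call_genes: Callable[[str], object],
) -> Iterator[Tuple[str, object]]:
    """Apply `call_genes` (hypothetical callback) to contigs of at least `min_length` bases."""
    for name, sequence in read_fasta(handle):
        if len(sequence) >= min_length:
            yield name, call_genes(sequence)


if __name__ == "__main__":
    # Demo on an in-memory stream; `len` stands in for a real gene caller.
    demo = io.StringIO(">short\nACGT\n>long some description\nACGTACGT\nACGT\n")
    for name, result in filter_and_call(demo, min_length=8, call_genes=len):
        print(name, result)
```

Because records are yielded one at a time, the filtered contigs never need to be written back to disk, which is the space concern raised in the original request.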

jolespin commented 1 year ago

Can you point me in the direction of the executable I should edit? I thought it was the _cli.py but noticed there are some arguments that are not usable in the cli such as the number of threads.

althonos commented 1 year ago

No, you're right, it's indeed the _cli.py file that should be changed. The --jobs argument was added recently and was not available in the Pyrodigal release on PyPI, but you could use it locally. Check the CONTRIBUTING.md guide if you need help setting up a local copy of the repository for testing :smiley:

jolespin commented 1 year ago

Have you got any other requests about stdin support?

althonos commented 1 year ago

Not really, just make stdin and stdout explicit in the CLI, e.g. pyrodigal -i - to read from stdin, rather than assuming stdin when the -i flag is missing. I think it's better to keep the -i flag required.
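
The convention described above (a required `-i` flag where `-` stands for stdin) can be sketched with the standard library's `argparse`. This is only an illustration under stated assumptions, not the contents of `_cli.py`: the `build_parser` and `open_dash` helpers are hypothetical, and the optional `-o` flag defaulting to `-` for stdout is an assumption of this sketch.

```python
import argparse
import contextlib
import sys


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where `-i` is required and `-` means a standard stream."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "-i", dest="input", required=True,
        help="input FASTA file, or '-' to read from stdin",
    )
    # Assumption of this sketch: output defaults to '-' (stdout).
    parser.add_argument(
        "-o", dest="output", default="-",
        help="output file, or '-' to write to stdout",
    )
    return parser


def open_dash(path: str, mode: str = "r"):
    """Return a context manager for `path`, mapping '-' to stdin or stdout."""
    if path == "-":
        stream = sys.stdin if "r" in mode else sys.stdout
        # nullcontext leaves the standard stream open when the block exits.
        return contextlib.nullcontext(stream)
    return open(path, mode)


if __name__ == "__main__":
    args = build_parser().parse_args()
    with open_dash(args.input, "r") as src, open_dash(args.output, "w") as dst:
        for line in src:
            dst.write(line)
```

Keeping `-i` required means a missing flag is reported as a usage error instead of the program silently waiting on stdin, which matches the preference stated in the comment.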

althonos commented 2 months ago

I've added support for reading from stdin like in the original binary; this will be available in the next release (v3.5).