althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0
132 stars 5 forks source link
bioconda bioinformatics cython-wrapper gene-finding genome metagenomes orf-finder prodigal python python-interface python-library simd

πŸ”₯ Pyrodigal Stars

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!

Actions Coverage License PyPI Bioconda AUR Wheel Python Versions Python Implementations Source GitHub issues Docs Changelog Downloads Paper Citations

πŸ—ΊοΈ Overview

Pyrodigal is a Python module that provides bindings to Prodigal using Cython. It directly interacts with the Prodigal internals, which has the following advantages:

πŸ“‹ Features

The library now features everything from the original Prodigal CLI:

In addition, the new features are available:

🐏 Memory

Pyrodigal makes several changes compared to the original Prodigal binary regarding memory management:

🧢 Thread-safety

pyrodigal.GeneFinder instances are thread-safe. In addition, the find_genes method is re-entrant. This means you can train an GeneFinder instance once, and then use a pool to process sequences in parallel:

import multiprocessing.pool
import pyrodigal

gene_finder = pyrodigal.GeneFinder()
gene_finder.train(training_sequence)

with multiprocessing.pool.ThreadPool() as pool:
    predictions = pool.map(orf_finder.find_genes, sequences)

πŸ”§ Installing

Pyrodigal can be installed directly from PyPI, which hosts some pre-built wheels for the x86-64 architecture (Linux/MacOS/Windows) and the Aarch64 architecture (Linux/MacOS), as well as the code required to compile from source with Cython:

$ pip install pyrodigal

Otherwise, Pyrodigal is also available as a Bioconda package:

$ conda install -c bioconda pyrodigal

Check the install page of the documentation for other ways to install Pyrodigal on your machine.

πŸ’‘ Example

Let's load a sequence from a GenBank file, use an GeneFinder to find all the genes it contains, and print the proteins in two-line FASTA format.

πŸ”¬ Biopython

To use the GeneFinder in single mode (corresponding to prodigal -p single, the default operation mode of Prodigal), you must explicitly call the train method with the sequence you want to use for training before trying to find genes, or you will get a RuntimeError:

import Bio.SeqIO
import pyrodigal

record = Bio.SeqIO.read("sequence.gbk", "genbank")

orf_finder = pyrodigal.GeneFinder()
orf_finder.train(bytes(record.seq))
genes = orf_finder.find_genes(bytes(record.seq))

However, in meta mode (corresponding to prodigal -p meta), you can find genes directly:

import Bio.SeqIO
import pyrodigal

record = Bio.SeqIO.read("sequence.gbk", "genbank")

orf_finder = pyrodigal.GeneFinder(meta=True)
for i, pred in enumerate(orf_finder.find_genes(bytes(record.seq))):
    print(f">{record.id}_{i+1}")
    print(pred.translate())

On older versions of Biopython (before 1.79) you will need to use record.seq.encode() instead of bytes(record.seq).

πŸ§ͺ Scikit-bio

import skbio.io
import pyrodigal

seq = next(skbio.io.read("sequence.gbk", "genbank"))

orf_finder = pyrodigal.GeneFinder(meta=True)
for i, pred in enumerate(orf_finder.find_genes(seq.values.view('B'))):
    print(f">{record.id}_{i+1}")
    print(pred.translate())

We need to use the view method to get the sequence viewable by Cython as an array of unsigned char.

πŸ”– Citation

Pyrodigal is scientific software, with a published paper in the Journal of Open-Source Software. Please cite both Pyrodigal and Prodigal if you are using it in an academic work, for instance as:

Pyrodigal (Larralde, 2022), a Python library binding to Prodigal (Hyatt et al., 2010).

Detailed references are available on the Publications page of the online documentation.

πŸ’­ Feedback

⚠️ Issue Tracker

Found a bug ? Have an enhancement request ? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing in on a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

πŸ—οΈ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

πŸ“‹ Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

βš–οΈ License

This library is provided under the GNU General Public License v3.0. The Prodigal code was written by Doug Hyatt and is distributed under the terms of the GPLv3 as well. See vendor/Prodigal/LICENSE for more information.

This project is in no way not affiliated, sponsored, or otherwise endorsed by the original Prodigal authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.