althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0

Alternative translation tables #34

Closed Matt-Schmitz closed 1 year ago

Matt-Schmitz commented 1 year ago

With Prodigal in single mode without prior training, it is possible to specify a different codon table. For example, with table 4: prodigal -i input.fa -o output.out -p single -g 4

With pyrodigal, I see that you can set translation_table in training mode, but is it also possible to set a table without training?

import Bio.SeqIO
import pyrodigal
import glob

for file in glob.glob("*.fa"):
    print(file)
    record = Bio.SeqIO.read(file, "fasta")

    orf_finder = pyrodigal.OrfFinder()
    genes = orf_finder.find_genes(bytes(record.seq))
    with open(f"pyrodigal-{file[:-3]}.out", "w") as dst:
        genes.write_gff(dst, sequence_id="pyrodigal")

Is there some way to set a translation_table argument in orf_finder.find_genes?

althonos commented 1 year ago

Hi Matt!

Actually, Prodigal always trains when it isn't set to metagenomic mode (-p meta), including in the example you're showing. It just does it under the hood, and since you're not passing a training file name, it doesn't save the result anywhere.

Matt-Schmitz commented 1 year ago

Thanks for the quick reply Martin! Where would I include the translation_table in my example?

althonos commented 1 year ago

I'm almost sure that if you tried to run your example as-is you'd get a RuntimeError, as you didn't train the OrfFinder. Just call the train method with a different translation table:

import Bio.SeqIO
import pyrodigal
import glob

for file in glob.glob("*.fa"):
    print(file)
    record = Bio.SeqIO.read(file, "fasta")

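    # Single mode: train on the sequence first, choosing the codon table here.
    orf_finder = pyrodigal.OrfFinder()
    orf_finder.train(bytes(record.seq), translation_table=4)
    # Then predict genes with the freshly trained parameters.
    genes = orf_finder.find_genes(bytes(record.seq))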
    with open(f"pyrodigal-{file[:-3]}.out", "w") as dst:
        genes.write_gff(dst, sequence_id="pyrodigal")

Matt-Schmitz commented 1 year ago

Thanks Martin! It works now. I had previously run pyrodigal in meta mode and had deleted the meta argument in pyrodigal.OrfFinder to switch to single mode. I didn't know that training was required since prodigal allows single mode runs without specifying a training file. I guess it just takes the input file as the training file automatically in the background.

althonos commented 1 year ago

Exactly! Glad it works :)
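
For reference, the two modes discussed in this thread can be wrapped in one small helper. This is only a sketch under a few assumptions: `predict_genes` and its `finder_factory` parameter are hypothetical names introduced here (not part of pyrodigal), and the class lookup falls back to `GeneFinder` on the assumption that newer pyrodigal releases may expose the finder under that name instead of `OrfFinder`. The `train`, `find_genes`, `meta` and `translation_table` names are the ones used in the thread.

```python
def predict_genes(sequence, translation_table=11, meta=False, finder_factory=None):
    """Predict genes in meta mode, or in single mode with on-the-fly training.

    `finder_factory` is a hypothetical hook (not part of pyrodigal) that lets
    the caller supply the finder class; by default it is looked up in pyrodigal.
    """
    if finder_factory is None:
        import pyrodigal  # imported lazily, only when no factory is supplied

        # Assumption: the thread uses OrfFinder; fall back to GeneFinder
        # in case the installed release uses that name instead.
        finder_factory = getattr(pyrodigal, "OrfFinder", None) or getattr(
            pyrodigal, "GeneFinder"
        )

    if meta:
        # Meta mode relies on pre-trained profiles, so train() is not called
        # and translation_table is ignored by this helper.
        finder = finder_factory(meta=True)
    else:
        # Single mode: the finder must be trained before find_genes(),
        # and the codon table is chosen at training time.
        finder = finder_factory()
        finder.train(sequence, translation_table=translation_table)

    return finder.find_genes(sequence)
```

With this helper, the loop from the answer above would call `genes = predict_genes(bytes(record.seq), translation_table=4)` and then write the GFF output exactly as before.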