althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0

Is it possible to exclude amino acids predicted as unknown - 'X' #10

Closed linda5mith closed 2 years ago

linda5mith commented 2 years ago

I'm using PhageBoost, which depends on pyrodigal to predict protein sequences, but it breaks if a sequence contains an unknown amino acid ('X'). Is it possible to force Prodigal to predict an amino acid, or to just exclude the unknown amino acids?

althonos commented 2 years ago

Hi Linda!

There is an option in the Pyrodigal results to change the letter used for unknown amino acids: it uses X by default, but you can change it to any other letter (maybe G?). See pyrodigal.Prediction.translate for the details.

althonos commented 2 years ago

Another option would be to enable region masking (pyrodigal.OrfFinder(mask=True)), so that no genes can be predicted across unknown nucleotides.
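
To give a rough idea of what masking means in practice, here is a minimal standard-library sketch (not Pyrodigal's actual implementation) that locates runs of unknown nucleotides and checks whether a gene interval crosses one. The function names and the `min_length=50` threshold are assumptions made for illustration only; they do not come from this thread.

```python
import re


def find_unknown_runs(sequence, min_length=50):
    """Return half-open (start, end) coordinates of runs of 'N' nucleotides.

    min_length is a hypothetical threshold chosen for this sketch.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


def overlaps_unknown_run(gene_start, gene_end, runs):
    """Return True if the half-open interval [gene_start, gene_end) overlaps any run."""
    return any(gene_start < end and start < gene_end for start, end in runs)


# Toy sequence: a long run of N (masked) and a short one (ignored).
sequence = "ATG" + "N" * 60 + "TAA" + "NN" + "ACGT"
runs = find_unknown_runs(sequence)
```

With masking enabled, a gene spanning such a run would not be reported, so the translated proteins should no longer contain the unknown residues that those runs produce.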

linda5mith commented 2 years ago

Lovely, thanks for your swift reply!

althonos commented 2 years ago

There was a bug in previous versions that caused the translate method to always ignore the replacement character, but it was fixed in v0.7.1. Hopefully this answers your question!
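
If upgrading past the affected versions is not possible, the same effect can be approximated by post-processing the translated sequences as plain strings. The sketch below assumes the proteins are already available as Python strings; `replace_unknown` and `drop_unknown` are hypothetical helper names, not part of Pyrodigal or PhageBoost.

```python
def replace_unknown(protein, replacement="G", unknown="X"):
    """Return the protein with every unknown residue swapped for `replacement`."""
    if len(replacement) != 1:
        raise ValueError("replacement must be a single character")
    return protein.replace(unknown, replacement)


def drop_unknown(proteins, unknown="X"):
    """Return only the proteins that contain no unknown residue at all."""
    return [protein for protein in proteins if unknown not in protein]


# Toy translated sequences, made up for this example.
proteins = ["MKLV*", "MKXLX*", "MXA*"]
patched = [replace_unknown(p) for p in proteins]
clean_only = drop_unknown(proteins)
```

Replacing keeps every predicted gene but introduces a residue that was never actually observed, while dropping keeps only fully resolved proteins at the cost of losing some predictions; which one is appropriate depends on what the downstream tool does with the sequences.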