althonos / pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
https://pyrodigal.readthedocs.org
GNU General Public License v3.0

Mask sequences with trailing `N` #55

Closed by ohickl 4 months ago

ohickl commented 4 months ago

Hi, thanks for the great tool!

I noticed that pyrodigal still outputs translations for sequences with trailing `N`s, even with `mask=True`. Is this intended? My setup:

```python
...
gene_finder = pyrodigal.GeneFinder(meta=True, mask=True, min_mask=0)
...
for id, seq in zip(headers, sequences):
    genes = gene_finder.find_genes(seq)
    predictions.append((id, genes))
...
with gzip.open(output_file, "wt") as f:
    for contig_id, genes in predictions:
        genes.write_translations(f, sequence_id=contig_id, include_stop=False)
...
```
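A rough way to check this independently of the library is to locate the runs of `N` in each input sequence and compare them with the coordinates of the predicted genes. The sketch below is plain Python and is not Pyrodigal's masking code; the helper names `find_n_runs` and `genes_overlapping_runs` are made up for illustration, and coordinates are assumed to be 1-based and inclusive.

```python
def find_n_runs(sequence, min_len=1):
    """Return 1-based inclusive (begin, end) intervals covering runs of N.

    Hypothetical helper for illustration; not part of Pyrodigal.
    """
    runs = []
    start = None  # 0-based index where the current run of N began
    for i, base in enumerate(sequence.upper()):
        if base == "N":
            if start is None:
                start = i
        elif start is not None:
            # A non-N base closes the current run.
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    # A run reaching the end of the sequence is never closed by a
    # following base, so it has to be flushed explicitly.
    if start is not None and len(sequence) - start >= min_len:
        runs.append((start + 1, len(sequence)))
    return runs


def genes_overlapping_runs(gene_coords, runs):
    """Return the (begin, end) gene coordinates that overlap any N run."""
    return [
        (begin, end)
        for begin, end in gene_coords
        if any(begin <= r_end and r_begin <= end for r_begin, r_end in runs)
    ]
```

With the setup above, `gene_coords` could be built as `[(g.begin, g.end) for g in genes]`, assuming the `begin` and `end` attributes of the predicted genes are 1-based inclusive positions. An empty result from `genes_overlapping_runs` would suggest that no predicted gene extends into an `N` run.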

Tested with e.g. https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_902164675.1.

Or am I maybe running it wrong?

Best

Oskar

althonos commented 4 months ago

Hi Oskar, this was indeed a bug in the masking code. I will push a patch.

althonos commented 4 months ago

Fixed in v3.4.0 👍

ohickl commented 4 months ago

Great, thanks!