ctSkennerton / minced

Mining CRISPRs in Environmental Datasets
GNU General Public License v3.0

Incorrect GFF location with sequence that is 100% CRISPR #4

Closed: fangly closed this issue 10 years ago

fangly commented 10 years ago

Miriam Shiffman reported a problem when running Prokka. Using MinCED 0.1.5, I traced the problem back to a contig that is covered 100% by CRISPRs.

minced -gff troublesomeContig.fa results.gff

troublesomeContig.fa is:

>707_L1_merged_contig_534811
GTCGCCCCTCACGCAGGGGCGTGAGTTGAAATGGTTCCTTAGCCATCACGCACCCACCTC
CGCAACATGTCGCCCCTCACGCAGGGGCGTGGGTTGAAATTTAACTTGCGTTTCCAGCAT
CACCGGTTTCTGCGCGTCGCCCCTCACGCAGGGGCGTGAGTTGAAATGGCCTGCGGGGAG
GTGATGCCGCATGATCGTAAGCAGTCGCCCCTCACGCAGGGGCGTGAGTTGAAATTGCTC
GCGAACATGCGCCGCCTGTAAATACTCCCGGTCGCCACTCACGCAGGGGCGTGAGTTGAA
AT

results.gff is:

##gff-version 3

707_L1_merged_contig_534811 minced:0.1.5 CRISPR 1 303 5 . . ID=CRISPR1
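For reference, a GFF3 feature line has nine columns (seqid, source, type, start, end, score, strand, phase, attributes), and start/end are 1-based and inclusive. The minimal Python sketch below splits the record above into named columns. `parse_gff3_line` is a hypothetical helper, not part of MinCED or Prokka; real GFF3 is tab-delimited, and the whitespace fallback exists only because the tabs were lost in the copy pasted here.

```python
from typing import Dict

# GFF3 column names, in order
GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]

def parse_gff3_line(line: str) -> Dict[str, str]:
    """Split one GFF3 feature line into its nine named columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # fallback for copies (like the one in this issue) where tabs were lost
        fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    return dict(zip(GFF3_COLUMNS, fields))

record = parse_gff3_line(
    "707_L1_merged_contig_534811 minced:0.1.5 CRISPR 1 303 5 . . ID=CRISPR1")
start, end = int(record["start"]), int(record["end"])
# 1-based inclusive coordinates: the feature claims end - start + 1 bases
print(record["type"], start, end, end - start + 1)  # CRISPR 1 303 303
```

Read this way, the record claims a 303-base feature on a 302-base contig.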

The contig is 302 bp long, but the reported end of the CRISPR region is 303, one base beyond the end of the contig. This is what causes trouble in Prokka. I can't dig into the code at the moment, so further investigation will have to wait until next week, or until Connor fixes this :)
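Because GFF3 end coordinates are 1-based and inclusive, a valid end can be at most the sequence length (302 here). The minimal Python sketch below checks every feature's end against the lengths in the FASTA file. The helpers `fasta_lengths` and `out_of_bounds` are hypothetical, and this is not how Prokka or MinCED actually validate their input. The 302-bp sequence in the example is a placeholder of the same length as the real contig.

```python
from typing import Dict, List, Tuple

def fasta_lengths(text: str) -> Dict[str, int]:
    """Return {sequence ID: length} for a FASTA-formatted string."""
    lengths: Dict[str, int] = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id = line[1:].split()[0]  # ID is the first word after '>'
            lengths[seq_id] = 0
        elif seq_id is not None:
            lengths[seq_id] += len(line)
    return lengths

def out_of_bounds(gff_text: str,
                  lengths: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """List (seqid, end, sequence length) for features ending past their sequence."""
    bad = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives such as '##gff-version 3'
        fields = line.split("\t") if "\t" in line else line.split()
        seqid, end = fields[0], int(fields[4])
        if seqid in lengths and end > lengths[seqid]:
            bad.append((seqid, end, lengths[seqid]))
    return bad

# Placeholder 302-bp sequence standing in for the real contig
fasta = ">707_L1_merged_contig_534811\n" + "ACGT" * 75 + "AT\n"
gff = ("##gff-version 3\n"
       "707_L1_merged_contig_534811\tminced:0.1.5\tCRISPR\t1\t303\t5\t.\t.\tID=CRISPR1\n")
result = out_of_bounds(gff, fasta_lengths(fasta))
print(result)  # [('707_L1_merged_contig_534811', 303, 302)]
```

An end of exactly length + 1 is the typical signature of an off-by-one when converting from 0-based, half-open coordinates to GFF's 1-based, inclusive ones. This thread doesn't say whether that was the cause in MinCED 0.1.5.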

Florent

ctSkennerton commented 10 years ago

Fixed in version 0.1.6

tseemann commented 10 years ago

This resolves my bug too: https://github.com/Victorian-Bioinformatics-Consortium/prokka/issues/21
I've updated my version and requirements: https://github.com/Victorian-Bioinformatics-Consortium/prokka/commit/e4dfd705e7c19b052e507fb9ae59eec3022e3ecb
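Raising a minimum-version requirement comes down to comparing version strings. The minimal Python sketch below assumes plain dotted numeric versions. The helper names are hypothetical, and Prokka itself is written in Perl, so its actual check may differ. Comparing integer tuples rather than raw strings avoids "0.1.10" sorting before "0.1.6".

```python
from typing import Tuple

# Minimum MinCED version with the GFF end-coordinate fix (per this issue)
MIN_MINCED = "0.1.6"

def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string like '0.1.6' into (0, 1, 6)."""
    return tuple(int(part) for part in version.strip().split("."))

def minced_is_recent_enough(version: str) -> bool:
    # tuples compare element-wise, so (0, 1, 10) >= (0, 1, 6) as intended
    return version_tuple(version) >= version_tuple(MIN_MINCED)

for v in ["0.1.5", "0.1.6", "0.1.10"]:
    print(v, minced_is_recent_enough(v))
# 0.1.5 False
# 0.1.6 True
# 0.1.10 True
```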