deprekate / PHANOTATE

PHANOTATE: a tool to annotate phage genomes.
GNU General Public License v3.0

"parallel edges are forbidden" error in v1.6.4 but not in v1.5.1 #40

Closed by mtisza1 8 months ago

mtisza1 commented 8 months ago

Hi,

On certain sequences (perhaps because they have large gaps between ORFs), phanotate v1.6.4 throws an error. This is not observed with phanotate v1.5.1, which returns ORF predictions.

(phanotate_164) mjt % phanotate.py --version
1.6.4
(phanotate_164) mjt % phanotate.py -f tabular uParvo3481.fasta -o uParvo3481.phan_v1.6.4.tsv
Traceback (most recent call last):
  File "/Users/michaeltisza/miniconda3/envs/phanotate_164/bin/phanotate.py", line 49, in <module>
    graph = functions.get_graph(orfs)
  File "/Users/michaeltisza/miniconda3/envs/phanotate_164/lib/python3.10/site-packages/phanotate_modules/functions.py", line 429, in get_graph
    G.add_edge(Edge(left_node, right_node, score )) 
  File "/Users/michaeltisza/miniconda3/envs/phanotate_164/lib/python3.10/site-packages/phanotate_modules/graphs.py", line 74, in add_edge
    raise ValueError("parallel edges are forbidden")
ValueError: parallel edges are forbidden

The sequence below was used as the input. Notably, this contig is not predicted to belong to a phage genome, and therefore this may be the intended behavior of phanotate.

Best,

Mike

uParvo_sus_pigfeces_SRR11413591_3481 CTGATGAAGATAATAATTTATCAGATCTTGTAAAAATATCTCCAAAATATTTCAAGATTTCTACTGGTGGTGAAAC TACTATTAATCCTGCATTTAGTACTGACAGTACAAATGCTGATCCTGATTATCGTTTATGGAGTATGACAAAACAA GGACTTGATCAAAATAACTCTCCTTGTTTAAATTATAACATTTGGAAGAATGCTAGTAGATTTTATATATTCAGTT TTGCTGAGAATTTCAGCTTACTTGATAATAATAACTATATTAATTATGAATTAAGATTTTCTGACGACGATACTGT TGATATTCCTGAAAAGGTCAATATTCATAGAATTTATTTGAAGGATTATCTAACTATTTTTGAAAGTCAAACTGAA TAATTGAGAAGAATTTATTCGTTGAAACAACTTTTTTACTTTGTTTCTAACAGTGACGTAAGTGACGATAGTGACG ATACTTTATAAACTTTTAAAATAGAACAGATTTTCAGTTTTTCTTAAAATCTTAAAAATAAAAATCTTTTATTAAA TTATCGTCACTATTGTCACTGTAAATTATAAAATCAAGCAAATTAAACAAAATATAGTAAAATCATCTGTATGACA TTCCCTTAAACGTTAAAAAGTAGGGAGTGACGTTAAATATAAATTATCGTCACTAGTGTCGATAATTAACGTCACT GTTAAAGATTTCTAGAAAAATTTAGACTAACTGTTAAAATTATTATCGTCACTGTTTAAATTATTATCGTCACTGT TAAATATTAAGTGACGTTAAACAAAGTGAAACGCTGAATTGCTTATTCAAAGTAAAAGAATTAATTTAAGATTTTT AATCTTCTTCAAATTCACAAGCATCAATACGATCTTCTGGTAAAATATTATATTTTTCCTCAATAAGATCTCTTAC ATAGTGTAAATCAATTAAACGAATACCATTAGAGCGTCTATATACATTATTTCCAAATAATTTGGTCATTTCTTTA CTAATATTACGTATACTAAGATCTTTAATATCCTCATCAGCACAAATTCTAATTTGATTTACAAGTTCAGATGGAG TATATGTATCAAATAACTTATTATAATATATAACTTCCACAATCCATGGCATATTACATCTTTGCATAGTATCTTT TGAAGGAGTATATATATTTCTATGGAAATCATATGAAGAATCATACCATTCACAAGCTTTATGATAACAACTTGAT AAGAAATCAGGATTATTTTCCCATTTTGTTCTAAATTCTTTTGCAAATGATTTTTTAACAGGAACCCCTTGTAAAT TATAAAATAATCCACGACGTTCACCAGTTTCCCATTCAAGAGGTATACCTTGAGTAGTATTACTAAATAAGAATAA TGTTGATTTATTATCTTTTGTTTCAATATGACCATATTTATGATTTACTGTAACTTCAGATGCTGTAATATAATCT TTAATCTTATTAACAGTATCAGTGGATCTATCATGACATTCATTAATCATAACAACACATTTACCTGCATTTACAT TAAATTGTCCAACAACTTTATCTAGTGTTGATTTTGCAGTTTCAATAGGATCATACCACTTACTTACTATATCAAA AAGAGTATCTTTACCATATCCTTGATTTGATGCTATACAAACAGCAATTTCAGTACGATATCCAGGTTTCATAGCA AGTGATCCTAATAGTTTCTTAAGAGGTTCAATATATTGTAATTGTAATTCGACACAATTTGATTCACCATATGCAA AAGTGTTAACAAATTTTTCAAAATCATCACCAATTTCTTTATTGTAATGTCCATTAATAATTTCAGGTTTTATGCC CATAAACATACTGTAATATAAATCCCCATTTTGTGTTTTTATTAACCATTTACGAGGATTATCAATATCATCTTTT 
CTATATGCAGTTTCAATATAAGAGTATAAGTGTTCACATCCGTTTTGAGCTAAAATTGCTTTAAAATCACTAGGTT TATACATTGTCATATTATTAATAGTATATCTGCACATTACAGTATTTGTCATAATATCAATAACAACATACTGTCT AATATAATTAATTTGTCTGTCAATACGATCTCCTTGTTTTAATGACTTTAATTTTCCAAAACTGAATGGTTCATTT GGATATTCAATACATTCTGGTTTTGCTGATAAATATTCAGCATATGTGTTATATTTACATGGCTGATATTCCATAG GATCAAAATCATTTTGTTCGAAATCTTTTGATATAAATTTGATATTATCAAGAGGAATATAGAAATCAATATTATG TATATATTCATTAATAGCATCTGTTAATTGATTAGGTGTTAATTTAATTTTTTCTAGTTGATCCTTTCTTATCATG CATCCATCAAAACAATATATGCATGTACTAATATCAATACCAAGATCACTTAATTTCTTATAAACTTGACGAATTA TATTAACTTCTAAGAATTGCATATATCTACTAATACATGAATTATCAAGTTCATATTGTGATAATTTTCCTGTTGA TTTACTGTCTGCAATAAGAATATTCTTTAATTTACAACAATGTTGTTTACCATCAGGATTTTGATACTTATTATTT TTAATAGTACAGTTTGTTTTAATAGTAATTTTATTAGCATGATCAATAATAATTTTTCTTGAATTTTGCATTTCAT TATAATAATTAATAACAAATTCTGTTGGTTGTACATTTTTAGCATTATTATTATACCAAGTTCTATAATTACCACC AAATCCAATCATTAAAAATAATCTTTTTGCTAAATCACGTGAAACTTTACATGATTCCATTATTTCTTTAAGATAT TTATCTCTATTATTTATGTATTCACCCAAATACTTACCATTTGTTATTGAATACATAAAATTAGGATATGCATTAA CAATATCAATATCGATATAATTATCTTTATATAATGCAGCTCTTATTTCTCTCATAAAATGACAGGCACCACAAGC ATTACAATTATTTACAGTTGCATATTTTCTATATAATCCGAATTTGTCACCACCACCTGATGAATATAGAATATCT ATTGATTTATCTGCTGATAAAACATTTTTATTAATAGAATTTTCTAAATAACATAAATAAACTGATGGATTTACAC CTTGACTGATTTCAACATTATTTCTTAATATTTTTTCTTTTTCTATTAAAAAAGAAATACATCTTCTTAAAGTGTC TTCATTGTATACTTCATAGTAAGGTTGGCATAAAATATCAGATTCCATTTTACTTTGTATTTATTTAATAAATAAA AAAAATTTTTAAAATAGAAAATAAAAATTAGTTAGAATTAATTTAGATAAAATTTATGATTACTAAGAAAAAACTT GCAGTGACGATAATAGTTTAAACAGTTAGTCTAAATTTTTCTAGAAATCTTTAACAGTGCCGATATTTATCGTCAC TAGTGACGTTAAACACCAAATTATCGTCACTGTCAATTACTATCAATTTGTTACTATTTAA

deprekate commented 8 months ago

Ah, a week or so ago I increased the distance threshold in the ORF connection step from 300 bp to 500 bp to fix a situation where gaps between ORFs broke the path through the genome. But I forgot to also increase the distance in the code that should find gaps above the threshold and connect ORFs across them (not sure why it didn't catch the gaps, probably an off-by-one error on my part).
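To make the failure mode concrete, here is a minimal sketch of how two mismatched thresholds could produce the "parallel edges are forbidden" error. It is not PHANOTATE's actual code: the `Graph` class, the `build_graph` function, the `(start, stop)` ORF tuples, and the parameter names `connect_dist` and `gap_dist` are all hypothetical, and the sketch assumes only consecutive ORFs are connected. The assumption is that a gap between 300 bp and 500 bp is picked up by both passes, so the same edge is added twice.

```python
class Graph:
    """Tiny directed graph that rejects duplicate (parallel) edges."""

    def __init__(self):
        self.edges = set()

    def add_edge(self, source, target):
        if (source, target) in self.edges:
            raise ValueError("parallel edges are forbidden")
        self.edges.add((source, target))


def build_graph(orfs, connect_dist, gap_dist):
    """Connect consecutive ORFs in two passes (hypothetical simplification).

    orfs: list of (start, stop) tuples sorted by start.
    connect_dist: largest gap the normal connection pass will link.
    gap_dist: gaps larger than this are linked by the gap-bridging pass.
    """
    graph = Graph()
    # Pass 1: link neighbouring ORFs that are close enough.
    for left, right in zip(orfs, orfs[1:]):
        gap = right[0] - left[1]
        if 0 <= gap <= connect_dist:
            graph.add_edge(left, right)
    # Pass 2: bridge gaps that pass 1 is supposed to have skipped.
    for left, right in zip(orfs, orfs[1:]):
        gap = right[0] - left[1]
        if gap > gap_dist:
            graph.add_edge(left, right)
    return graph


if __name__ == "__main__":
    orfs = [(1, 300), (700, 1000)]  # 400 bp gap between the two ORFs
    try:
        build_graph(orfs, connect_dist=500, gap_dist=300)  # mismatched
    except ValueError as err:
        print("mismatched thresholds:", err)
    fixed = build_graph(orfs, connect_dist=500, gap_dist=500)  # matched
    print("matched thresholds:", len(fixed.edges), "edge")
```

With matching thresholds, each gap falls into exactly one of the two passes, so no pair of ORFs is linked twice.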

Just fixed both distance thresholds to match and pushed the new version to PyPI and here :)

$ phanotate.py -V
1.6.5

$ phanotate.py input.fasta | head
#id:    uParvo_sus_pigfeces_SRR11413591_3481
#START  STOP    FRAME   CONTIG  SCORE
3   383 +   uParvo_sus_pigfeces_SRR11413591_3481    -5.187465E+01
3240    835 -   uParvo_sus_pigfeces_SRR11413591_3481    -2.416964E+27

mtisza1 commented 8 months ago

Wow thanks for the quick fix! I'll close this.