deprekate / PHANOTATE

PHANOTATE: a tool to annotate phage genomes.
GNU General Public License v3.0

fastpathz cannot traverse the network: TypeError: no path to target #39

Open jmeppley opened 9 months ago

jmeppley commented 9 months ago

Running phanotate on some phage genomes gives me this error. The output below is from Hubei odonate virus 11:

(phan.env) [jmeppley@tyrosine phanotate]$ phanotate.py -o test/NC_032956.ncbi.faa -f fasta test/NC_032956.ncbi.fasta 
/mnt/data0/jmeppley/projects/nanopore_biller/viemes_by_depth/phage_clusters/phanotate/phan.env/bin/phanotate.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('phanotate==1.6.3')
Traceback (most recent call last):
  File "/mnt/data0/jmeppley/projects/nanopore_biller/viemes_by_depth/phage_clusters/phanotate/phan.env/bin/phanotate.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/mnt/data0/jmeppley/projects/nanopore_biller/viemes_by_depth/phage_clusters/phanotate/PHANOTATE/phanotate.py", line 63, in <module>
    shortest_path = fz.get_path(source=source, target=target)
TypeError: No path to target

I have seen the error in phanotate version 1.5.1 (Python 3.7 and 3.10) and in the latest version from GitHub (1.6.3) on Python 3.10.

deprekate commented 9 months ago

hm, looking at the genome, I am pretty sure it is caused by this giant 500 bp stretch of noncoding space between gp2 and gp3.

I currently have the tolerance for noncoding gaps hard-coded at 300 bp. Usually there are spurious false-positive ORFs that allow FASTPATH to jump such gaps, but it appears there is not a single possible ORF within that region.
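As a rough illustration of the explanation above, here is a minimal sketch of why a gap limit can leave no path through the genome. This is not PHANOTATE's or fastpathz's actual code: the function name `can_traverse`, the single-strand model, and all coordinates are made up for illustration, and the real tool works on a weighted graph over both strands.

```python
def can_traverse(orfs, genome_length, max_gap):
    """Return True if ORFs can be chained across the genome without
    any noncoding hop longer than max_gap (toy model, one strand)."""
    reach = 0  # furthest coordinate covered by a reachable ORF so far
    for start, stop in sorted(orfs):
        if start - reach > max_gap:
            # No candidate ORF starts close enough to bridge this stretch,
            # which corresponds to "no path to target".
            return False
        reach = max(reach, stop)
    # The last hop, from the final ORF to the end of the genome.
    return genome_length - reach <= max_gap


# Hypothetical coordinates: two genes separated by 500 bp of noncoding space.
genes = [(0, 1000), (1500, 2500)]

print(can_traverse(genes, 2500, max_gap=300))                   # gap too wide
print(can_traverse(genes, 2500, max_gap=500))                   # wider limit
print(can_traverse(genes + [(1200, 1260)], 2500, max_gap=300))  # tiny ORF bridges it
```

In this toy model, either raising the limit or adding a small ORF inside the gap restores a path, which mirrors the two fixes discussed in this thread.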

[Image: Screenshot 2024-02-14 at 3:58:47 PM]

To verify my suspicion, I fudged the genome a little to introduce a tiny ORF between gp2 and gp3 by changing the internal stop codon in the pink/yellow region to AGA, which allowed phanotate to jump the gap:

$ phanotate.py fudged.fasta 
#id:    NC_032956
#START  STOP    FRAME   CONTIG      SCORE
156     6662    +       NC_032956   -3.116770851669423876397960614E+48
6769    8730    +       NC_032956   -3856076414883.260858535901140
8885    8986    +       NC_032956   -2.694481252924878820091418773
9152    9030    -       NC_032956   -0.1392390679612810770032495566
9238    10590   +       NC_032956   -4207404799.216605188066453756
10857   10693   -       NC_032956   -53.14899699753530574312845421
10928   11035   +       NC_032956   -1.287255596423134590081672391
11017   11205   +       NC_032956   -25.86378478556807501777759741
11257   11349   +       NC_032956   -5.693152356918419510588396127
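The stop-codon edit described above could be reproduced with something like the following sketch. The helper `replace_stop_codon` and the example sequence are hypothetical; the actual position edited in NC_032956 is not given in this thread, so the index here is only a placeholder. The sketch handles the forward strand only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def replace_stop_codon(seq, index, new_codon="AGA"):
    """Return seq with the stop codon starting at 0-based index replaced.

    Raises ValueError if the three bases at index are not a stop codon,
    so an off-by-one position is caught instead of silently edited.
    """
    codon = seq[index:index + 3].upper()
    if codon not in STOP_CODONS:
        raise ValueError(f"{codon!r} at index {index} is not a stop codon")
    return seq[:index] + new_codon + seq[index + 3:]


# Made-up sequence with an internal TAG at index 6.
fudged = replace_stop_codon("ATGAAATAGCCCTAA", 6)
print(fudged)
```

The edited sequence would then be written back to a FASTA file (such as the `fudged.fasta` used above) before rerunning phanotate.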

I have the gap limit set to 500 bp in phanotate 2.1, which also runs successfully on this genome without any fudging. I haven't gotten around to pushing that version to the main branch here or to PyPI, though, because of other errors I was not able to debug: the genomes that crashed it were not publicly available to me. In the meantime I should update the 1.X version to use a larger gap limit.

jmeppley commented 9 months ago

That did the trick for me. Thanks.