Closed by dcampbdc 1 month ago
I think the issue arises at the level of the nucleotide sequence. I tried "Phanotate -f fna" and then translated the result with transeq. The special characters now all show up as stop codons (*), including inside the sequences:
10F_NODE_1_length_1907_cov_3.266739CDS[complement(66..398)]_1 [note=score:-1.931059E-01]
HPCFLILSRKTSPNQKTWVKFL*KRNS*NCIGVFVVSVCPHTTYTHTGVEY*GSPFS*ND
DP*KYGKPRHFPPVSVLLLVITLRE*RQAVKKPFDVGVFKQREKDDDD*DI
10F_NODE_1_length_1907_cov_3.266739CDS[complement(476..565)]_1 [note=score:-5.421236E-02]
NCF*NHAW**SNRVPPKQIVRQCRRLKVKT
10F_NODE_1_length_1907_cov_3.266739CDS[579..944]_1 [note=score:-9.828631E+01]
MLYTEKEKHEIERVKEVFAEHLRQSPDFELLWSDKVGYVWLTIGVNPVYVDTGIRIESAA
DLCGRCLDDVATDVLYTTGNDHALEVADPLELAEIKRRWEPYINQLPDYAYLCKDLLNGK
M*
10F_NODE_1_length_1907_cov_3.266739CDS[1004..1846]_1 [note=score:-3.306728E+04]
MKKSLTFRLWQDRKSILISCGARLAPFDIQELRDLTMYDELQLDTLGDKKTALFLIMSDT
DSTFNFLISMVYTQLFNLLCDKADDQYGGKLPVHVRCLIDECANIGQIPNLEKLVATIRS
REISACLVLQARSQLKAIYKDNADTIVGNMDSQIFLGGSEPTTLKDLSEILGKETIDAFN
TSDTRGNSPSYGTTFQKMGHELLSRDELAVLDAGKCILQLRGVRPFLSDKYDLTQHPNYK
LTSDYDPKNTFDIEKYLNRKEKIYPDDEFIVVDADSLPPA*
Yeah, it is probably an off-by-one error that somehow re-emerged. The #, + and * symbols each stand for one of the three stop codons (differentiating them is useful in some cases). I'll try to get it fixed by tomorrow.
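For anyone checking their own output in the meantime, here is a minimal sketch of what per-codon stop symbols look like and how to spot stops that sit inside a sequence. It is not PHANOTATE's code: the function names are made up for illustration, and which symbol belongs to which codon is an assumption, since the thread does not say.

```python
# Hypothetical mapping: which symbol stands for which stop codon is an
# assumption made for this example, not taken from PHANOTATE or genbank.
STOP_SYMBOLS = {"TAA": "#", "TAG": "+", "TGA": "*"}


def label_stops(dna):
    """Return (offset, symbol) for every in-frame stop codon, reading from offset 0."""
    dna = dna.upper()
    labels = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_SYMBOLS:
            labels.append((i, STOP_SYMBOLS[codon]))
    return labels


def normalize_stops(protein):
    """Collapse the three stop symbols into the conventional '*'."""
    return protein.replace("#", "*").replace("+", "*")


def internal_stops(protein):
    """Return positions of stop symbols anywhere except the last residue."""
    return [i for i, aa in enumerate(protein[:-1]) if aa in "#+*"]
```

Running `internal_stops` on each translated record should return an empty list for a correctly called ORF. A frame shifted by one base, as with an off-by-one in the coordinates, typically produces several internal stops like the ones in the output above.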
Any update on this? Thank you!
I was able to replicate it. Working on a fix now
Whew, so I finally tracked it down. It actually came from my genbank dependency: I forgot to push the newest version (0.118) of genbank to PyPI. Doh!
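Since the fix lives in the genbank dependency, it may help to check which version is installed locally. The following is a rough sketch using only the standard library. The helper names are hypothetical, and the comparison assumes purely numeric dotted versions such as 0.118.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def at_least(found, wanted):
    """Compare dotted numeric versions component by component (assumes digits only)."""
    def as_tuple(v):
        return tuple(int(part) for part in v.split("."))
    return as_tuple(found) >= as_tuple(wanted)


if __name__ == "__main__":
    found = installed_version("genbank")
    if found is None:
        print("genbank is not installed")
    elif at_least(found, "0.118"):
        print(f"genbank {found} is at least 0.118")
    else:
        print(f"genbank {found} is older than 0.118")
```

If the installed version turns out to be older than 0.118, `pip install --upgrade genbank` should pick up the newer release once it is on PyPI.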
I'm trying to run Phanotate as follows:
$ phanotate.py -f faa -o 10F-1.faa 10F-1.fna
But the output I get has symbols (#, +, *) inside the sequences, or sequences that end with a symbol other than the usual stop symbol (*):
I saw this was an issue in an older version. Has it re-emerged?