Released stable version of Starterator for SEA phages. Note: this version does not work with the current version of the Phamerator database! For a version compatible with the current Phamerator database, see this repo: cdshaffer/starterator
I have found that in database version 71, pham 6635 crashes Starterator. It contains the following genes, all of which begin at base 1:
Members of Pham 6635
Gene Phage Cluster Length Location
Grum1_Draft_1 Grum1 A 177 bp 1-177
Ollie_Draft_1 Ollie A 177 bp 1-177
Sabia_Draft_1 Sabia A 177 bp 1-177
Zetzy_Draft_1 Zetzy A 177 bp 1-177
Closer examination shows that the first three bases of these genes are not a valid start codon; instead, they are TGC. This leads to erratic behavior and crashes in Starterator.
Tracing this back to DNA Master by testing with phage Grum1, I can see that auto-annotation does indeed create a gene from 1 to 177. Tracing further back to NCBI GeneMark, it looks like GeneMark predicts partial genes when they fall at the beginning of the sequence. There is a setting called circular/partial, which is checked ON by default.
It also looks like PhagesDB translates this gene incorrectly, giving the frame 3 translation (i.e., the translation of bases 3-176) rather than the frame 1 translation (bases 1-174, or 1-177 if you include the stop codon).
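To make the frame distinction concrete, here is a minimal Python sketch. The helper names (`frame_coordinates`, `translate`) are hypothetical and are not taken from Starterator or PhagesDB. It reports which bases the whole codons of each frame cover for a gene of a given length, and it translates a sequence in a chosen frame using the standard genetic code. Translation table 11, used for bacteria and phages, assigns the same amino acids and differs only in which codons may serve as starts.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def frame_coordinates(length: int, frame: int) -> Tuple[int, int]:
    """Return the 1-based inclusive (first, last) bases covered by whole
    codons when a gene of `length` bases is read in frame 1, 2 or 3."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    n_codons = (length - (frame - 1)) // 3
    return frame, frame + 3 * n_codons - 1


def translate(seq: str, frame: int = 1) -> str:
    """Translate `seq` starting at base `frame` (1-based), keeping stops as '*'.
    Trailing bases that do not form a whole codon are ignored; codons with
    ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame - 1, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    print(frame_coordinates(177, 1))  # (1, 177)
    print(frame_coordinates(177, 3))  # (3, 176)
    # Made-up sequence that, like pham 6635, begins with TGC rather than a start codon.
    print(translate("TGCAAAGGGTAA", 1))  # CKG*
    print(translate("TGCAAAGGGTAA", 3))  # QRV
```

For a 177 bp gene, frame 1 covers bases 1-177 (59 codons, including the stop), while frame 3 covers bases 3-176 (58 codons), matching the coordinates described above.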
Not sure what to do about this. The best option is probably to add a sanity check that each gene begins with a valid start codon and, if it does not, to just drop the pham?
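The sanity check proposed above could look roughly like the following Python sketch. It is not Starterator code. The input layout (a dict mapping each pham ID to a list of `(gene_id, sequence)` pairs), the names `has_valid_start` and `filter_phams`, and the accepted start codon set `{ATG, GTG, TTG}` are all assumptions that would need to be adjusted to match Starterator's actual data model and start codon list.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: accepted start codons; adjust to match Starterator's own list.
VALID_STARTS = {"ATG", "GTG", "TTG"}

# Hypothetical gene record: (gene_id, nucleotide sequence from the annotated start).
Gene = Tuple[str, str]


def has_valid_start(sequence: str) -> bool:
    """Return True if the gene's first codon is an accepted start codon."""
    return sequence[:3].upper() in VALID_STARTS


def filter_phams(
    phams: Dict[str, List[Gene]],
) -> Tuple[Dict[str, List[Gene]], Dict[str, List[str]]]:
    """Split phams into those safe to process and those to skip.

    A pham is skipped if any member gene lacks a valid start codon
    (e.g. a partial gene called at base 1); the second dict records
    which genes triggered the skip so they can be reported.
    """
    kept: Dict[str, List[Gene]] = {}
    skipped: Dict[str, List[str]] = {}
    for pham_id, genes in phams.items():
        bad = [gene_id for gene_id, seq in genes if not has_valid_start(seq)]
        if bad:
            skipped[pham_id] = bad
        else:
            kept[pham_id] = genes
    return kept, skipped


if __name__ == "__main__":
    # Placeholder sequences for illustration only, not real genome data.
    phams = {
        "6635": [("Grum1_Draft_1", "TGCAAAGGGTAA"), ("Ollie_Draft_1", "TGCAAAGGGTAA")],
        "1234": [("Example_Draft_5", "ATGAAATAA")],
    }
    kept, skipped = filter_phams(phams)
    for pham_id, genes in skipped.items():
        print(f"Skipping pham {pham_id}: no valid start codon in {', '.join(genes)}")
```

Dropping the whole pham, rather than only the offending genes, follows the suggestion above. A gentler alternative would be to drop just the invalid members and keep the rest of the pham.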