SEA-PHAGES / starterator

Released stable version of Starterator for SEA phages. Note: this version does not work with the current version of the phamerator database! For a version compatible with the current phamerator database, see this repo: cdshaffer/starterator

partial genes annotated as full genes #30

Closed cdshaffer closed 6 years ago

cdshaffer commented 7 years ago

I have found that in database version 71, pham 6635 is crashing starterator. It contains the following genes, all of which begin at base 1:

Members of Pham 6635

Gene            Phage   Cluster Length  Location
Grum1_Draft_1   Grum1     A     177 bp     1-177
Ollie_Draft_1   Ollie     A     177 bp     1-177
Sabia_Draft_1   Sabia     A     177 bp     1-177
Zetzy_Draft_1   Zetzy     A     177 bp     1-177

Closer examination shows that the first three bases of these genes are not a valid start codon; instead, they are TGC. This leads to erratic behavior and crashes in starterator.

Tracking back to DNA Master and testing with phage Grum1, I can see that auto-annotation does indeed create a gene from 1 to 177. Tracking back further to the NCBI GeneMark, it looks like GeneMark predicts partial genes when they are at the beginning of the sequence. There is a setting called circular/partial, which is checked ON by default.

It also looks like phagesdb is translating it incorrectly, giving the frame 3 translation (i.e. the translation of bases 3-176) rather than the frame 1 translation (bases 1-174, or 1-177 if you include the stop codon).
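
To make the coordinate arithmetic concrete, here is a small sketch of which bases are covered by whole codons in each forward reading frame. The function name `frame_span` is made up for illustration and is not part of starterator or phagesdb.

```python
def frame_span(length, frame):
    """Return the 1-based inclusive (start, end) covered by whole codons.

    `length` is the gene length in bases; `frame` is 1, 2, or 3 and gives
    the 1-based position of the first base read. Hypothetical helper.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    start = frame
    # Number of complete codons that fit from `start` to the end of the gene.
    n_codons = (length - (frame - 1)) // 3
    end = start + 3 * n_codons - 1
    return start, end


# For the 177 bp genes in pham 6635:
print(frame_span(177, 1))  # frame 1, stop codon included
print(frame_span(177, 3))  # frame 3, what phagesdb appears to translate
```

For a 177 bp gene this gives bases 1-177 for frame 1 and bases 3-176 for frame 3, matching the ranges quoted above.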

Not sure what to do about this. Probably best to add a sanity check that each gene begins with a valid start codon and, if one does not, to just drop the pham, maybe?
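
A minimal sketch of what such a check might look like, not the actual starterator code. It assumes the accepted start codons are ATG, GTG, and TTG, and that a pham is available as a mapping from gene name to coding-strand sequence; both the constant and the function name are hypothetical.

```python
# Assumption: these are the start codons to accept; adjust as needed.
VALID_START_CODONS = {"ATG", "GTG", "TTG"}


def genes_with_invalid_start(pham_genes):
    """Return the names of genes whose first three bases are not a start codon.

    `pham_genes` is a hypothetical dict of gene name -> coding-strand
    sequence (5' to 3'), not a real starterator data structure.
    """
    return [
        name
        for name, sequence in pham_genes.items()
        if sequence[:3].upper() not in VALID_START_CODONS
    ]


# Illustrative sequences only; the real genes are 177 bp and begin with TGC.
example_pham = {
    "Grum1_Draft_1": "TGCAAATAA",
    "Example_Gene_2": "ATGAAATAA",
}
bad_genes = genes_with_invalid_start(example_pham)
if bad_genes:
    print("Skipping pham; invalid start codon in:", ", ".join(bad_genes))
```

Returning the offending gene names, rather than a bare True/False, leaves it to the caller to decide whether to drop the whole pham or only those genes, and gives something useful to log.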

cdshaffer commented 6 years ago

All of these phages are out of draft and have correct start calls, so this is no longer an issue.