SEA-PHAGES / starterator

Released stable version of Starterator for SEA phages. Note: this does not work with the current version of the Phamerator database! For a version compatible with the current Phamerator database, see this repo: cdshaffer/starterator

pham 24190 crashes starterator #20

Closed cdshaffer closed 6 years ago

cdshaffer commented 8 years ago

Not sure why. The only unusual thing about the pham is that it has two adjacent genes from phage Patio:

members are: GRU1_GoPhGRU1p27 GTE5_GoPhGTE5p28 Patio_Draft_42 Patio_Draft_43

error message is:

exception in starterator
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/home/seastudent/PycharmProjects/Starterator/starterator/uiStarterate.py", line 296, in run
    self.starterate() #running thread starts here
  File "/home/seastudent/PycharmProjects/Starterator/starterator/uiStarterate.py", line 301, in starterate
    gui=self, event=self.stop_thread)
  File "/home/seastudent/PycharmProjects/Starterator/starterator/starterate.py", line 131, in starterate
    final_file,s = pham.final_report()
  File "/home/seastudent/PycharmProjects/Starterator/starterator/report.py", line 360, in final_report
    self.make_report()
  File "/home/seastudent/PycharmProjects/Starterator/starterator/report.py", line 366, in make_report
    self.pham.find_most_common_start()
  File "/home/seastudent/PycharmProjects/Starterator/starterator/phams.py", line 220, in find_most_common_start
    most_called_start_index = self.total_possible_starts.index(called_starts_count[0][0])+1
ValueError: 207 is not in list

cdshaffer commented 7 years ago

Still crashing; as of database version 95, the pham number is 24190.

cdshaffer commented 7 years ago

Investigated: there is a disconnect in genes GRU1_27 and GTE5_28. The "alignment_start_site" is set to 307, but when that number is looked up in "total_possible_starts" (phams.py line 234), 307 is not in the list. So either the 307 is wrong or the total_possible_starts list is not constructed properly.
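
The failure mode described here can be reproduced in isolation. This is a minimal sketch, assuming total_possible_starts is a plain Python list of alignment coordinates. The helper name `start_rank` and the sample values are hypothetical and are not taken from the Starterator code:

```python
def start_rank(total_possible_starts, alignment_start_site):
    """Return the 1-based rank of a start site, or None if it is not a candidate.

    Hypothetical helper: it mirrors the `list.index(...) + 1` lookup seen in
    the traceback, but reports a missing site instead of raising ValueError.
    """
    try:
        return total_possible_starts.index(alignment_start_site) + 1
    except ValueError:
        # list.index raises ValueError when the value is absent, which is
        # the "N is not in list" error reported in this issue.
        return None


# Made-up candidate start coordinates; 307 is deliberately absent.
candidates = [12, 45, 120, 306, 402]

print(start_rank(candidates, 120))  # a site that is in the list
print(start_rank(candidates, 307))  # the mismatch case: not in the list
```

Returning None only makes the mismatch visible without crashing the report thread; it does not address why the start site and the candidate list disagree.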

cdshaffer commented 7 years ago

Turns out these two genes are tail assembly chaperones with ribosomal slippage. In the GenBank file the authors annotate only the long form of the gene, not both the short and long forms. For GRU1_27 the location in the GenBank file is join(15854..16153,16153..16665). In the Phamerator database the start is 16151 (zero-based) and the stop is 16664 (half-open).

So it looks like we have a mix-up of coordinate frames (using frames as defined in DNA Master): the upstream part from 15854 is frame 2, the downstream part from 16153 is frame 1, but the Phamerator start of 16151 would be frame 3. Hence the disconnect between the 307 in alignment_start_site and total_possible_starts.
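
The frame arithmetic can be checked with a few lines of Python. This is a sketch under an assumed convention: forward-strand frames are numbered 1 to 3, with 1-based position 1 in frame 1. That convention reproduces the frames quoted in this comment, but it is not confirmed here as DNA Master's exact definition, and the function names are hypothetical:

```python
def frame_from_one_based(position):
    """Forward-strand frame (1, 2 or 3) of a 1-based coordinate.

    Assumed convention: position 1 is frame 1, position 2 is frame 2,
    position 3 is frame 3, position 4 is frame 1 again, and so on.
    """
    return (position - 1) % 3 + 1


def frame_from_zero_based(position):
    """Same frame numbering for a 0-based coordinate."""
    return position % 3 + 1


# GenBank locations are 1-based: join(15854..16153,16153..16665)
upstream = frame_from_one_based(15854)
downstream = frame_from_one_based(16153)

# The Phamerator start is described as zero-based: 16151
phamerator = frame_from_zero_based(16151)

print(upstream, downstream, phamerator)  # prints: 2 1 3
```

Under this convention the Phamerator start falls in neither of the two frames used by the annotated join, which matches the disconnect described.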

cdshaffer commented 6 years ago

This pham no longer crashes. The database has been updated.