Closed cdshaffer closed 6 years ago
Still crashing, but as of database version 95 the pham number is 24190.
Investigated, and there is a disconnect in genes GRU1_27 and GTE5_28. The "alignment_start_site" is set to 307, but when that number is used to look up the "total_possible_starts" (pham.py line 234), 307 is not on the list. So either the 307 is wrong or the total possible starts list is not constructed properly.
Turns out these two genes are tail assembly chaperones with ribosomal slippage. In the GenBank file the authors annotate only the long form of the gene, not both the short and long forms. For GRU1_27 the location in the GenBank file is join(15854..16153,16153..16665). In the phamerator database the start is 16151 (zero based) and the stop is 16664 (half open).
So it looks like we have a mix-up with coordinate frames (using frames as defined in DNA Master): the upstream part from 15854 is frame 2, the downstream part from 16153 is frame 1, but the phamerator 16151 would be frame 3. Thus the disconnect between total_possible_starts and the alignment_start_site of 307.
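To make the frame arithmetic above easier to check, here is a minimal sketch. It assumes the frame of a 1-based coordinate is `(pos - 1) % 3 + 1`, which matches the three frames quoted in this comment but is not taken from DNA Master or from pham.py. The helper names `parse_join` and `frame_of` are hypothetical.

```python
import re


def parse_join(location):
    """Return (start, end) pairs, 1-based inclusive, from a GenBank location string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def frame_of(pos_1based):
    """Frame (1, 2 or 3) of a 1-based coordinate, under the assumed convention."""
    return (pos_1based - 1) % 3 + 1


segments = parse_join("join(15854..16153,16153..16665)")

upstream_frame = frame_of(segments[0][0])    # start of the upstream part, 15854
downstream_frame = frame_of(segments[1][0])  # start of the downstream part, 16153
phamerator_frame = frame_of(16151 + 1)       # zero-based 16151 converted to 1-based

print(upstream_frame, downstream_frame, phamerator_frame)  # 2 1 3
```

All three frames differ, so a start list built in one frame cannot contain a start site recorded in another.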
This pham no longer crashes. The database has been updated.
Not sure why; the only unusual thing about the pham is that it has two adjacent genes from phage Patio:
members are: GRU1_GoPhGRU1p27 GTE5_GoPhGTE5p28 Patio_Draft_42 Patio_Draft_43
error message is: