Open tmgreen opened 6 years ago
Hi Tom, I have not thought about this in a long time, but it does seem to make sense that the phase would be 0 if we are adjusting for the skipped base, and 1 if not. I am going to bring in some help. @barrymoore, any thoughts?
Hi Tom @tmgreen ,
Yes, I think you’ve found an error in the spec - I would agree that if the ribosome slips, the new reading frame has to begin with a phase of 0. This example would have been around since the original first draft of the spec many years ago, and I think you may just be the first person to ever look at this closely enough to have caught that.
Thanks for the help in bringing this error to light.
@keilbeck, I’ll go ahead and make this change and if you want further discussion or changes we can talk in more detail later.
Barry
Hi @tmgreen and @keilbeck can you guys take a look at the diff on this edit and make any comments you feel are appropriate:
I uploaded a sketch as an example. Can't seem to get it to link inline here, but here is a link out to it:
I think Lincoln's original example fails because the gene and mRNA should end at ZZZZ, not YYYY, and his pseudo-numbers for the actual FS site on the CDS features work for a +1 FS, but the example doesn't specify that and seems to imply it's a -1 FS. Using the original example as a template, modified to use actual numbers (but not real-world numbers yet) that correspond to my sketch linked above, and showing only the frameshifting CDS feature:
-1 Frameshift
chrX . gene 1 300 . + . ID=gene01;name=my_gene
chrX . mRNA 1 300 . + . ID=tran01;Parent=gene01;Ontology_term=SO:1000069
chrX . CDS 1 249 . + 0 ID=CDS02;Parent=tran01
chrX . CDS 246 300 . + 0 ID=CDS02;Parent=tran01
+1 Frameshift
chrX . gene 1 302 . + . ID=gene01;name=my_gene
chrX . mRNA 1 302 . + . ID=tran01;Parent=gene01;Ontology_term=SO:1000069
chrX . CDS 1 249 . + 0 ID=CDS02;Parent=tran01
chrX . CDS 248 302 . + 0 ID=CDS02;Parent=tran01
Hmm, I think the numbers should be:
-1 Frameshift
chrX . gene 1 300 . + . ID=gene01;name=my_gene
chrX . mRNA 1 300 . + . ID=tran01;Parent=gene01;Ontology_term=SO:1000069
chrX . CDS 1 249 . + 0 ID=CDS02;Parent=tran01
chrX . CDS 249 300 . + 0 ID=CDS02;Parent=tran01
+1 Frameshift
chrX . gene 1 302 . + . ID=gene01;name=my_gene
chrX . mRNA 1 302 . + . ID=tran01;Parent=gene01;Ontology_term=SO:1000069
chrX . CDS 1 249 . + 0 ID=CDS02;Parent=tran01
chrX . CDS 251 302 . + 0 ID=CDS02;Parent=tran01
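The two sets of coordinates differ only in where the second CDS segment starts relative to the base that would follow the first segment if there were no frameshift. A minimal Python sketch of that arithmetic, assuming GFF3's 1-based inclusive coordinates (the helper name `junction_shift` is made up for illustration and is not part of the spec):

```python
def junction_shift(first_end: int, second_start: int) -> int:
    """Offset of the second segment's start relative to the base that would
    follow the first segment with no frameshift (first_end + 1).

    -1 means one base is read twice; +1 means one base is skipped.
    """
    return second_start - (first_end + 1)


# Second-segment start coordinates from the two proposals in this thread.
# The first segment ends at 249 in every case.
proposals = {
    "first proposal, -1 FS": 246,
    "first proposal, +1 FS": 248,
    "second proposal, -1 FS": 249,
    "second proposal, +1 FS": 251,
}

shifts = {label: junction_shift(249, start) for label, start in proposals.items()}
for label, shift in shifts.items():
    print(f"{label}: second segment starts at {shift:+d} relative to base 250")
```

Under this reading, only the second set of coordinates gives offsets of exactly -1 and +1; the first set gives -4 and -2.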
I think a -1 frameshift re-uses one base so it looks like
AGCTA
    AGTACTT
(where only one A is shared because the slippage is backwards one base)
A +1 shift would be like
AGCTA
     G
      TACTT
where a single base gets skipped over entirely.
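The same toy sequence can be read off with explicit coordinates. This is only a sketch: it assumes 1-based inclusive segments as in GFF3, and `read_segments` is a hypothetical helper, not something from the spec or from any library.

```python
def read_segments(seq: str, segments: list[tuple[int, int]]) -> str:
    """Concatenate 1-based, inclusive (start, end) slices of seq."""
    return "".join(seq[start - 1:end] for start, end in segments)


genomic = "AGCTAGTACTT"  # the toy sequence from the comment above

# -1 frameshift: the second segment starts on the last base of the first,
# so base 5 (A) is read twice.
minus_one = read_segments(genomic, [(1, 5), (5, 11)])

# +1 frameshift: the second segment starts two bases after the first ends,
# so base 6 (G) is skipped.
plus_one = read_segments(genomic, [(1, 5), (7, 11)])

print(minus_one)  # AGCTAAGTACTT
print(plus_one)   # AGCTATACTT
```

The translated sequence is one base longer than the genomic span for the -1 case and one base shorter for the +1 case.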
Hi, in the "Programmed frameshift" example (excerpt below), I'm confused about the phase of the second record:
I'm not an expert, but shouldn't the phase of the second segment always be 0? From my informal survey of a couple dozen examples of ribosomal slippage found in NCBI's human and mouse GFF3 downloads, the second segment always has phase 0. Is it possible that NCBI's "ribosomal slippage" is just one subtype of programmed frameshift that is more strict than the general case?
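A survey like this could be roughly automated. The sketch below is not how the survey above was done. It rests on one loud assumption: a slippage junction is taken to be any pair of consecutive segments sharing a CDS ID whose start is offset by exactly -1 or +1 from the base after the previous segment's end. The function name `slippage_phases` is hypothetical, only plus-strand features are handled, and whitespace splitting is a fallback for the space-flattened examples in this thread (real GFF3 is tab-separated).

```python
from collections import defaultdict


def slippage_phases(gff3_lines):
    """Yield (cds_id, shift, phase) for each -1/+1 junction between
    consecutive plus-strand segments that share a CDS ID."""
    segments = defaultdict(list)
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        # Prefer tabs (real GFF3); fall back to whitespace for flattened text.
        cols = line.split("\t") if "\t" in line else line.split(None, 8)
        if len(cols) < 9 or cols[2] != "CDS" or cols[6] != "+":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        segments[attrs.get("ID")].append((int(cols[3]), int(cols[4]), cols[7]))

    for cds_id, segs in segments.items():
        segs.sort()  # order segments by start coordinate
        for (_, prev_end, _), (start, _, phase) in zip(segs, segs[1:]):
            shift = start - (prev_end + 1)
            if shift in (-1, 1):  # assumed signature of a slippage junction
                yield cds_id, shift, phase


# The -1 frameshift example from this thread.
example = [
    "chrX . CDS 1 249 . + 0 ID=CDS02;Parent=tran01",
    "chrX . CDS 249 300 . + 0 ID=CDS02;Parent=tran01",
]
result = list(slippage_phases(example))
print(result)
```

Ordinary exon boundaries are ignored because their offsets are intron-sized, so only the phases at candidate slippage junctions are reported.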