The-Sequence-Ontology / Specifications

GFF and GVF specification documents

Clarification of GFF3 "Programmed frameshift" example #13

Open tmgreen opened 6 years ago

tmgreen commented 6 years ago

Hi, in the "Programmed frameshift" example (excerpt below), I'm confused about the phase of the second record:

chrX  . CDS                XXXX   YYYY .  +  0 ID=cds01;Parent=tran01
chrX  . CDS                YYYY-1 ZZZZ .  +  1 ID=cds01;Parent=tran01

I'm not an expert but shouldn't the phase of the second segment always be 0? From my informal survey of a couple dozen examples of ribosomal slippage found in NCBI's human and mouse GFF3 downloads, it is true that the second segment always has phase 0. Is it possible that NCBI's "ribosomal slippage" is just one subtype of programmed frameshift that is more strict than the general case?
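
For context, the GFF3 specification defines phase as the number of bases to remove from the start of a CDS segment to reach the first base of the next codon. The sketch below shows how the phase of a following segment is conventionally derived for an ordinary, non-frameshifted CDS. `next_phase` is a hypothetical helper name, and the coordinates are illustrative rather than taken from any annotation:

```python
def next_phase(start: int, end: int, phase: int) -> int:
    """Phase of the segment that follows a CDS segment (start..end, 1-based inclusive).

    Assumes an ordinary spliced CDS with no frameshift between the two segments.
    """
    length = end - start + 1
    # Bases left over after removing `phase` bases and reading whole codons
    # determine how many bases the next segment must contribute to finish a codon.
    return (3 - (length - phase) % 3) % 3


print(next_phase(1, 249, 0))  # 249 bases = 83 whole codons, so the next phase is 0
print(next_phase(1, 250, 0))  # one base left over, so the next segment gives up 2
print(next_phase(1, 251, 0))  # two bases left over, so the next segment gives up 1
```

Under this convention, a phase of 1 on the second record would mean its first base completes a codon begun in the previous segment, which is the reading the question above is probing.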

keilbeck commented 6 years ago

Hi Tom, I have not thought about this in a long time, but it does seem to make sense that the phase would be 0 if we are adjusting the phase for the skipped base, and 1 if not. I am going to bring in some help. @barrymoore, any thoughts?

barrymoore commented 6 years ago

Hi Tom @tmgreen ,

Yes, I think you’ve found an error in the spec - I would agree that if the ribosome slips, the new reading frame has to begin with a phase of 0. This example would have been around since the original first draft of the spec many years ago, and I think you may just be the first person to ever look at this closely enough to have caught that.

Thanks for the help in bringing this error to light.

@keilbeck, I’ll go ahead and make this change and if you want further discussion or changes we can talk in more detail later.

Barry

barrymoore commented 6 years ago

Hi @tmgreen and @keilbeck, can you guys take a look at the diff on this edit and make any comments you feel are appropriate?

https://github.com/The-Sequence-Ontology/Specifications/commit/3dc0607dc6d6843f468074d4edfaf9a278b6f9e8#diff-147bf529caea8d1e898e580393804c95

barrymoore commented 6 years ago

I uploaded a sketch as an example. Can't seem to get it to link inline here, but here is a link out to it:

PFS_Examples.pdf

I think Lincoln's original example fails because the gene and mRNA should end at ZZZZ, not YYYY. His pseudo-numbers for the actual FS site on the CDS features work for a +1 FS, but the example doesn't specify that and seems to imply it's a -1 FS. Using the original example as a template, here it is modified to use actual numbers (but not real-world numbers yet) corresponding to my sketch linked above, showing only the frameshifting CDS feature:

-1 Frameshift

chrX  . gene               1   300 .  +  . ID=gene01;name=my_gene
chrX  . mRNA               1   300 .  +  . ID=tran01;Parent=gene01;Ontology_term=SO:1000069
chrX  . CDS                1   249 .  +  0 ID=CDS02;Parent=tran01
chrX  . CDS               246 300 .  +  0 ID=CDS02;Parent=tran01

+1 Frameshift

chrX  . gene               1   302 .  +  . ID=gene01;name=my_gene
chrX  . mRNA               1   302 .  +  . ID=tran01;Parent=gene01;Ontology_term=SO:1000069
chrX  . CDS                1   249 .  +  0 ID=CDS02;Parent=tran01
chrX  . CDS               248 302 .  +  0 ID=CDS02;Parent=tran01

tmgreen commented 6 years ago

Hmm, I think the numbers should be:

-1 Frameshift

chrX  . gene               1   300 .  +  . ID=gene01;name=my_gene
chrX  . mRNA               1   300 .  +  . ID=tran01;Parent=gene01;Ontology_term=SO:1000069
chrX  . CDS                1   249 .  +  0 ID=CDS02;Parent=tran01
chrX  . CDS                249 300 .  +  0 ID=CDS02;Parent=tran01

+1 Frameshift

chrX  . gene               1   302 .  +  . ID=gene01;name=my_gene
chrX  . mRNA               1   302 .  +  . ID=tran01;Parent=gene01;Ontology_term=SO:1000069
chrX  . CDS                1   249 .  +  0 ID=CDS02;Parent=tran01
chrX  . CDS                251 302 .  +  0 ID=CDS02;Parent=tran01

tmgreen commented 6 years ago

I think a -1 frameshift re-uses one base so it looks like

AGCTA
    AGTACTT

(where only one A is shared because the slippage is backwards one base)

A +1 shift would be like

AGCTA
     G
      TACTT

Where a single base gets skipped over entirely.
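
The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the reasoning in this comment, not anything taken from the spec: `second_segment_start` and `read_through` are hypothetical helper names, coordinates are assumed to be 1-based and inclusive on the + strand, and the genomic string is the toy sequence implied by the diagrams.

```python
def second_segment_start(first_end: int, shift: int) -> int:
    """Start (1-based, inclusive) of the CDS segment that follows the frameshift.

    shift = -1: the ribosome slips back one base, so the last base is read twice.
    shift = +1: the ribosome skips one base entirely.
    """
    if shift not in (-1, 1):
        raise ValueError("only -1 and +1 frameshifts are handled in this sketch")
    return first_end + 1 + shift


def read_through(seq: str, first_end: int, shift: int) -> str:
    """Bases in the order the ribosome reads them across the frameshift site."""
    start = second_segment_start(first_end, shift)
    return seq[:first_end] + seq[start - 1:]


genomic = "AGCTAGTACTT"  # toy sequence assumed from the diagrams above

print(second_segment_start(249, -1))  # 249: base 249 is shared by both segments
print(second_segment_start(249, +1))  # 251: base 250 is skipped
print(read_through(genomic, 5, -1))   # AGCTAAGTACTT (the A at position 5 is read twice)
print(read_through(genomic, 5, +1))   # AGCTATACTT (the G at position 6 is skipped)
```

With a first segment ending at 249, this gives second-segment starts of 249 and 251, matching the coordinates proposed in the previous comment.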