Closed jbethune closed 3 years ago
Hi, thank you for bringing up this ambiguity. The start in this situation is the start of the codon. @srynobio @barrymoore Do either of you have a good turn of phrase to clean up the last sentence of the first phase paragraph? Thanks
Column 8: "phase" For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon. In other words, a phase of "0" indicates that the next codon begins at the first base of the region described by the current line, a phase of "1" indicates that the next codon begins at the second base of this region, and a phase of "2" indicates that the codon begins at the third base of this region. This is NOT to be confused with the frame, which is simply start modulo 3.
For forward strand features, phase is counted from the start field. For reverse strand features, phase is counted from the end field.
The phase is REQUIRED for all CDS features.
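To make the "bases that should be removed" wording concrete, here is a minimal Python sketch. The function name `codon_aligned` and the example sequences are made up for illustration and are not part of the spec or of any GFF3 library. It assumes the input is the genomic (plus-strand) sequence of a single CDS line.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def codon_aligned(genomic_seq, strand, phase):
    """Return the CDS sequence in transcript orientation, trimmed so that
    it begins on the first base of a codon.

    genomic_seq: plus-strand sequence covering start..end of the CDS line
    strand:      "+" or "-"
    phase:       0, 1, or 2 (column 8 of the CDS line)
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1, or 2")
    if strand == "+":
        seq = genomic_seq
    elif strand == "-":
        # Reverse-complement so the sequence reads 5' to 3' on the mRNA.
        seq = genomic_seq.translate(_COMPLEMENT)[::-1]
    else:
        raise ValueError("strand must be '+' or '-'")
    # Drop `phase` bases from the transcript-oriented beginning.
    return seq[phase:]

print(codon_aligned("GATGAAA", "+", 1))  # ATGAAA
print(codon_aligned("TTTCATC", "-", 1))  # ATGAAA
```

For the minus-strand case the trimming happens after reverse-complementing, which is the same as removing bases from the end-field side of the genomic sequence.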
Good point - the current wording is rather convoluted. Here’s my attempt to clarify this a bit:
For features of type "CDS", the phase indicates where the next codon begins relative to the start of the current CDS feature. The phase is one of the integers 0, 1, or 2, indicating the number of bases forward from the start of the current CDS feature the next codon begins. A phase of "0" indicates that a codon begins on the first base of the CDS feature (i.e. 0 bases forward), a phase of "1" indicates that the next codon begins at the second base of this region and a phase of "2" indicates that the codon begins at the third base of this region. Note that ‘Phase’ in the context of a GFF3 CDS feature should not be confused with the similar concept of frame that is also a common concept in bioinformatics. Frame is generally calculated as a value for a given base relative to the start of a codon (e.g.
Hmm, did I make that clearer or just obfuscate it with different words :)
Barry
This seems to be really difficult to put into words. Let me try a formulation of my own based on my current understanding:
For features of type "CDS", the phase tells us if the CDS begins with a complete codon (phase=0) or with an incomplete codon (phase=1 or phase=2). Incomplete codons are split across two exons and become complete after splicing. The phase tells us how many nucleotides we have to move towards a larger or smaller genomic position to get to the first complete codon in the genomic sequence of this CDS. On the plus strand we need to move from the start position to a larger genomic position and on the minus strand we need to move from the end position to a smaller genomic position.
The following table shows the different situations:
phase | strand | meaning of start/end column | move towards complete codon |
---|---|---|---|
0 | + | start=first base of complete codon | no move needed |
1 | + | start=third base of incomplete codon | move up from start position by 1 nucleotide |
2 | + | start=second base of incomplete codon | move up from start position by 2 nucleotides |
0 | - | end=third base of complete codon | no move needed |
1 | - | end=first base of incomplete codon | move down from end position by 1 nucleotide |
2 | - | end=second base of incomplete codon | move down from end position by 2 nucleotides |
Positions are always inclusive. The phase is REQUIRED for all CDS features.
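The six cases in the table can be sketched in a few lines of Python. This is only an illustration of my reading above; the function name `first_complete_codon_base` and the coordinates are hypothetical.

```python
def first_complete_codon_base(start, end, strand, phase):
    """Genomic coordinate (1-based, inclusive) of the first base of the
    first complete codon of a CDS line.

    On the plus strand we move up from the start column, on the minus
    strand we move down from the end column.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1, or 2")
    if strand == "+":
        return start + phase
    if strand == "-":
        return end - phase
    raise ValueError("strand must be '+' or '-'")

# Hypothetical CDS line spanning 100..200
print(first_complete_codon_base(100, 200, "+", 0))  # 100
print(first_complete_codon_base(100, 200, "+", 2))  # 102
print(first_complete_codon_base(100, 200, "-", 1))  # 199
```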
The phase should not be confused with the reading frame. The reading frame refers to the genomic distance to the start codon of this gene modulo 3 regardless of where introns are located.
Is my understanding correct?
edit: Added table with the 6 different cases.
Your description is correct, except that I think it’s worth clarifying your use of 5’ to 3’ and this also addresses your question about the minus strand.
The CDS Phase is always relative to the strand on which the containing transcript lies. So your statement about 5’ to 3’ is only accurate if you’re referring to the mRNA. For minus strand transcripts the genomic direction would be 3’ to 5’.
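One way to see the strand dependence is to look at how phases would be assigned to the CDS segments of one transcript: the segments are walked in transcript order, which is descending genomic order on the minus strand. The following is a minimal Python sketch, not the spec's wording. The function name `assign_phases` and the coordinates are made up, and it assumes the first segment begins with a complete codon (phase 0), which need not hold for partial annotations.

```python
def assign_phases(segments, strand):
    """Assign a phase to each CDS segment of a single transcript.

    segments: list of (start, end) tuples, 1-based inclusive coordinates
    strand:   "+" or "-"
    Returns a list of (start, end, phase) in transcript (5' to 3') order.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Transcript order: ascending on "+", descending on "-".
    ordered = sorted(segments, key=lambda s: s[0], reverse=(strand == "-"))
    phase = 0  # assumption: the first segment starts on a codon boundary
    result = []
    for start, end in ordered:
        result.append((start, end, phase))
        length = end - start + 1
        # Bases left over after skipping `phase` bases and all whole codons.
        leftover = (length - phase) % 3
        # The next segment must supply the rest of the split codon.
        phase = (3 - leftover) % 3
    return result

print(assign_phases([(1, 10), (21, 30)], "+"))  # [(1, 10, 0), (21, 30, 2)]
print(assign_phases([(1, 10), (21, 30)], "-"))  # [(21, 30, 0), (1, 10, 2)]
```

In both calls the first segment in transcript order holds three codons plus one extra base, so the following segment starts on the second base of a split codon and gets phase 2.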
Thanks for the discussion. I’ve added some additional language to help clarify the minus/plus strand issues.
Spec is updated to 1.25. Feedback welcomed.
Regards
Barry
On Sep 23, 2019, at 2:42 AM, jbethune wrote:
This seems to be really difficult to put into words. Let me try a formulation of my own based on my current understanding:
For features of type "CDS", the phase tells us if the CDS starts with a complete codon (phase=0) or with an incomplete codon (phase=1 or phase=2). Incomplete codons are split across two exons and become complete after splicing. The phase tells you how many nucleotides you have to move in the 5' to 3' direction to get to the first complete codon in the genomic sequence of this CDS. For example, phase=1 means that you have to move 1 nucleotide and that you are in the 3rd base of an incomplete codon. Phase=2 means that you have to move 2 nucleotides and that you are in the 2nd base of an incomplete codon. Phase 0 means that you are already on the first base of a complete codon.
The phase should not be confused with the reading frame. The reading frame refers to the genomic distance to the start codon of this gene modulo 3 regardless of where introns are located.
Is my understanding correct? I am also unsure about the minus strand. It would be really good if the specification also explains what the phase means for the 3 cases on the minus strand.
https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md#readme
In section Column 8: "phase" it says:
This is NOT to be confused with the frame, which is simply start modulo 3.
What does "start" refer to? The start column of the GFF3 file? In that case it would be the start position of the entire chromosome? Or does it refer to the start position of the start codon? An example that shows the differences between phase and frame would be appreciated.