childers closed this issue 9 years ago
Oops, left out the insertion:
Scaffold1 WebApollo insertion 875615 875615 . + . Name=024CEEAB2AE0C4F9AE1490815C67FDC3;
>024CEEAB2AE0C4F9AE1490815C67FDC3
AAAAAAAAAAAAAAAAACCCCCCCCC
@nathandunn Expected output should be "same as above", not a non-coding feature (tRNA in this case) with CDSs.
Interesting, when you delete the insertion, everything else stays.
Also, if the insertion precedes the annotation you don't get the extra CDS calculations.
just fyi -- and I haven't looked to see if it applies here -- currently, sequence alterations cannot overlap each other.
@monicacecilia Do we only want to calculate for "mRNA" transcripts, or other ones, as well?
@monicacecilia It calculates the CDS when setting the longest ORF. I am telling it to do this only if the transcript is of type mRNA. I'm not sure whether snRNA, miRNA, etc. should also be included in the calculation, or whether we should exclude tRNA, ncRNA, etc.
Yes, also exclude tRNA (in other words, any kind of ncRNA). Only protein-coding transcripts should have the protein-coding sequence (CDS) calculated.
p.s. it would be difficult to allow genomic sequence alterations to overlap because then we'd have to know the order in which to apply the alterations. However, when/if we reuse this code for indicating natural variation/alterations, we'll need to allow overlaps coming from different individuals, even if the alterations for each individual are non-overlapping... (need to draw a picture)
-S
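To make the non-overlap constraint concrete, here is a minimal sketch of how a set of sequence alterations could be checked for overlaps. It is not Apollo's code: the `Alteration` class, the `find_overlaps` function, and the choice of GFF3-style closed coordinates (where an insertion has `start == end`, as in the example above) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alteration:
    """Hypothetical sequence alteration using closed, GFF3-style coordinates."""
    start: int  # first affected position (inclusive)
    end: int    # last affected position (inclusive); equals start for an insertion
    kind: str   # e.g. "insertion", "deletion", "substitution"


def find_overlaps(alterations):
    """Return every pair of alterations whose closed intervals intersect."""
    ordered = sorted(alterations, key=lambda a: (a.start, a.end))
    conflicts = []
    for i, current in enumerate(ordered):
        for other in ordered[i + 1:]:
            # The list is sorted by start, so once a later alteration starts
            # past the end of the current one, no further overlap is possible.
            if other.start > current.end:
                break
            conflicts.append((current, other))
    return conflicts


insertion = Alteration(875615, 875615, "insertion")
deletion = Alteration(875610, 875620, "deletion")
substitution = Alteration(900000, 900002, "substitution")
conflicts = find_overlaps([insertion, deletion, substitution])
```

If alterations from different individuals had to be supported, one option would be to run such a check per individual rather than across the whole set; that is a design guess, not something this thread settles.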
@cmdcolin @childers Sorry, I should have fixed this during the break, as it was failing the regression tests. The original code was correct: we want to calculate the CDS if it is a coding transcript. The original fix should do this. I just need to create a test that handles both cases. Unfortunately the default is "transcript", which should not be encoding.
to be finicky
"transcript" is not necessarily encoding.
if it is a transcript of type mRNA (a subclass of transcript in SO) then it is an encoding transcript.
the transcripts of protein coding genes should properly be typed using the sub-class mRNA
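The typing rule above can be sketched as a small subclass check. This is only an illustration: the `IS_A` table is a simplified, hand-written stand-in and not the real Sequence Ontology hierarchy, and `is_a` and `should_calculate_cds` are hypothetical names, not Apollo functions.

```python
# Simplified, illustrative parent table (NOT the actual Sequence Ontology graph).
IS_A = {
    "mRNA": "transcript",
    "ncRNA": "transcript",
    "tRNA": "ncRNA",
    "snRNA": "ncRNA",
    "miRNA": "ncRNA",
}


def is_a(term, ancestor):
    """Return True if `term` equals `ancestor` or descends from it in IS_A."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


def should_calculate_cds(transcript_type):
    """Only mRNA (or a subclass of it) gets a CDS; plain 'transcript' does not."""
    return is_a(transcript_type, "mRNA")


decisions = {t: should_calculate_cds(t) for t in ("mRNA", "transcript", "tRNA", "miRNA")}
```

Under these assumptions a generic "transcript" is treated as not necessarily encoding, so the CDS is left alone unless the feature is explicitly typed as mRNA.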
You are exactly correct. “Transcript” is actually used with pseudogenes, and mRNA (at least in our current system) has no sub-classes.
I’m pretty sure this implementation is correct, but I will need others to test. Once I get the rest of the 1.0.3 bugs sorted out, I’ll put it up for testing.
Nathan
works now. Thanks!
@monicacecilia I want to retest this in 2.0.0. I looked and it appears that we have the correct code in there, but I want some separate eyes to verify.
@nathandunn You are correct, genomic insertions are no longer causing the generation of a CDS in tRNA features. However, while investigating this issue, I found a few other things. Please see #263
@deepakunni3 Could this be relevant?
@nathandunn This bug doesn't occur in Apollo 2.0
The rationale holds for recalculating CDS only when the transcript is an instance of mRNA.
If a sequence insertion is made on the exon for a feature, Web Apollo calculates the longest CDS. This is a problem for non-coding features, which are not supposed to have CDS features. Here is an example:
After adding insertion: