GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

incorporate frameshifts #483

Open nathandunn opened 9 years ago

nathandunn commented 9 years ago

====

As @cmdcolin noted, there is a fair amount of frameshift code in the codebase, but we do not appear to be able to actually add frameshifts.

Something for us to discuss at some point.

===

Output both annotations (original and pre-frameshifted).

nathandunn commented 9 years ago

@monicacecilia Please comment and then assign to me with recommendations when you are testing.

monicacecilia commented 8 years ago

It's all coming back.

Desktop Apollo had a function that allowed curators to shift the frame of translation +1 or -1 from the base pair where the cursor stood.

This is what it looked like: [screenshot, 2016-01-21 5:46:36 pm; image not preserved]

In some organisms, cells naturally shift the frame of translation to express a gene (the ribosome skips, basically). This was common in some Drosophila genes and the request was made way back when. For an example see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC108870/
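To make the desktop behaviour concrete, here is a minimal sketch of translating a sequence with a +1 or -1 shift applied at a chosen base. It is only an illustration, not Apollo code: the function name `translate_with_frameshift` and its parameters `shift_at` and `shift` are made up for this example, and it assumes a forward-strand sequence, 0-based indexing, and the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_frameshift(seq, shift_at=None, shift=0):
    """Translate seq from index 0, shifting the frame once at shift_at.

    Hypothetical helper: shift_at is the 0-based index where the next codon
    would normally start; reading resumes at shift_at + shift instead
    (+1 skips a base, -1 re-reads one). Translation stops at a stop codon.
    """
    protein = []
    i = 0
    shifted = shift_at is None  # nothing to do if no shift was requested
    while i + 3 <= len(seq):
        if not shifted and i >= shift_at:
            i = max(0, i + shift)  # apply the shift exactly once
            shifted = True
            continue
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
        i += 3
    return "".join(protein)


seq = "ATGAAATTTGGGTAA"
print(translate_with_frameshift(seq))          # MKFG
print(translate_with_frameshift(seq, 6, -1))   # MKIWV
print(translate_with_frameshift(seq, 6, +1))   # MKLG
```

The same function handles shifts larger than ±1, since `shift` is just an offset added to the reading position.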

This code should be re-implemented, but this is not of the highest priority at this moment. I'm punting this down to the time after coordinate transformation and variant annotation are implemented and working as desired.

nathandunn commented 8 years ago

:+1:

monicacecilia commented 7 years ago

@selewis & @nathandunn: It would be very useful to come back to this ticket and work on implementing this functionality in the near future.

hexylena commented 7 years ago

Very common in phages, but sometimes the frameshifts are more than just ±1, e.g. http://www.sciencedirect.com/science/article/pii/S1097276504005398

Incredibly important to CPT's use case I believe. cc @moffmade

The lack of support is a bit of a complex issue, since JBrowse will not render to-spec GFF3 that includes frameshifts. xref https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md (you'll have to ctrl-f for "programmed frameshift").

jimhu-tamu commented 6 years ago

What is the status of resolving this issue?

nathandunn commented 6 years ago

I think we've deferred due to our time constraints. However, if this is something you'd be interested in implementing, we'd be more than happy to work with you on it. Also, we are doing a hackathon in January if that would be convenient.

jimhu-tamu commented 6 years ago

I'm asking based on the class that @erasche was referring to, which we will start teaching again in January. @moffmade is now working with us on continuing Eric's work, and the timing is bad for us to attend. But I just sent him the link to look at the agenda. We have an even more critical Apollo problem that he will add an issue here for soon.

nathandunn commented 6 years ago

@moffmade is welcome to join us remotely as well, but that will be a busy time for teaching. Yeah, let us know about the critical problems and your timeline for teaching. Our hope is that we can possibly get @moffmade doing a few of these fixes himself after getting somewhat familiar with the stack, if he has time.

hexylena commented 6 years ago
jimhu-tamu commented 6 years ago

oops. Updated my reply above for Corey's correct id.

nathandunn commented 6 years ago

@jimhu-tamu / @MoffMade, @erasche's assessment is probably correct. I'll be available to do a remote call on the 4th if it's something you might be interested in pursuing. However, I would estimate 2-4 weeks even with our help, if I remember this issue correctly.

Maybe @erasche can make some introductions off-line. We can make arrangements over the break (and I am happy to point folks to resources).

hexylena commented 6 years ago

Offline introduction? I'm physically unavailable until February (holiday).

nathandunn commented 6 years ago

Sorry, I meant off of GitHub, via email. No need for travel! I'll wait until the Galaxy conference to see you in person.

Nathan

hexylena commented 6 years ago

Back at work, sure, available on the 4th if you need a videoconf or something for more detailed explanation.

nathandunn commented 6 years ago

https://github.com/TAMU-CPT/training-material/blob/bich464/topics/genome-annotation/tutorials/annotating-tmp-chaperone-frameshifts/tutorial.md

nathandunn commented 5 years ago

From notes:

nathandunn commented 5 years ago

Treat similarly to a read-through stop codon, but base-specific.
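One way to read "base specific", sketched under assumptions: the frameshift is stored as a single base plus an offset on the CDS, and is expanded into join-style segments only when needed (for example at export). Nothing here is existing Apollo code. The function `split_cds_at_frameshift` and its parameters are hypothetical, coordinates are assumed to be 1-based inclusive, and only the forward strand is handled.

```python
def split_cds_at_frameshift(cds_start, cds_end, last_base, shift):
    """Expand a base-specific frameshift into two CDS segments.

    Hypothetical model: last_base is the last base read in the original
    frame, and reading resumes at last_base + 1 + shift. A shift of -1
    therefore re-reads last_base, and a shift of +1 skips one base.
    Coordinates are 1-based inclusive, forward strand only.
    """
    if shift == 0:
        raise ValueError("a frameshift needs a non-zero offset")
    if not cds_start <= last_base < cds_end:
        raise ValueError("last_base must fall inside the CDS")
    resume = last_base + 1 + shift
    if not cds_start <= resume <= cds_end:
        raise ValueError("shifted position falls outside the CDS")
    return [(cds_start, last_base), (resume, cds_end)]


print(split_cds_at_frameshift(100, 400, 250, -1))  # [(100, 250), (250, 400)]
print(split_cds_at_frameshift(100, 400, 250, +1))  # [(100, 250), (252, 400)]
```

The gap between the two segments equals `shift`, so a -1 frameshift shows up as a one-base overlap and a +1 frameshift as a one-base gap.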

nathandunn commented 5 years ago

Per discussions with the TAMU group, @meiliucpt will add some export examples.

meiliuCPT commented 5 years ago

Using NCBI: https://www.ncbi.nlm.nih.gov/nuccore/1428093527 as an example (GenBank: MH321492.1: /locus_tag="Lorac_015" is the frameshifted protein, and /locus_tag="Lorac_014" reads through the slippery sequence to the ORF's normal stop codon),

The GenBank record for the frameshifted protein and its non-shifted version should look like this:

[image: GenBank record; not preserved]

The converted GFF3 (converted using our GenBank-to-GFF3 converter, which is from BioPerl) looks like this:

[image: converted GFF3; not preserved]

The frameshifted and "normal" reading frames are represented as 2 separate genes.

In the frameshifted feature (Lorac_015), the GFF3 has the gene (Shine-Dalgarno + CDS) as parent, with the mRNA (1st base of CDS to last base of CDS) and Shine-Dalgarno as children. Under the mRNA are 2 CDS and 2 exon features. We're not sure how the frameshift is represented: are the 2 CDSs or 2 exons automatically merged into a single protein sequence when read?

Based on what we see, it looks like we've been representing frameshifts as basically 2 exons which then get merged (i.e., like 2 exons separated by an intron that is -1 bp in length), which is derived from how these are represented in GenBank. If we switched to representing these as an mRNA with a frameshift in it, that could be done but would be a departure from the current process and we'd need to make sure we had a way to place these features in Apollo and export them again in a way that GenBank can handle. I hope this explanation makes sense. Let me know if you have questions.
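To illustrate the "2 exons separated by a -1 bp intron" representation, here is a small sketch of what merging the two CDS segments would mean if a reader concatenates them in order, the way a GenBank `join()` location is read. This is an assumption about the reader, not a statement about what Apollo, JBrowse, or the BioPerl converter actually do. The toy sequence and the helper names (`join_segments`, `cds_phases`, `translate`) are invented for the example; coordinates are 1-based inclusive on the forward strand.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def join_segments(genome, segments):
    """Concatenate 1-based inclusive (start, end) segments, join()-style."""
    return "".join(genome[start - 1:end] for start, end in segments)


def cds_phases(segments):
    """GFF3 phase per segment: bases to skip to reach the next full codon."""
    phases, consumed = [], 0
    for start, end in segments:
        phases.append((3 - consumed % 3) % 3)
        consumed += end - start + 1
    return phases


def translate(cds):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


genome = "ATGAAATTTGGGTAA"        # toy sequence, not from MH321492.1
normal = [(1, 15)]                # reads through to the normal stop
shifted = [(1, 6), (6, 15)]       # base 6 is read twice: a -1 frameshift

print(translate(join_segments(genome, normal)))   # MKFG
print(translate(join_segments(genome, shifted)))  # MKIWV
print(cds_phases(shifted))                        # [0, 0]
```

Under this reading, the overlap of one base between the segments is the only place the frameshift is recorded, so a round trip through Apollo would need to preserve overlapping CDS coordinates exactly.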