GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

edit sequence error #2132

Open nathandunn opened 5 years ago

nathandunn commented 5 years ago

Annotations.gff3.gz

Cam_Hsc_genes_v1_UTRs-Hsc_scaff001-1..6046013.gff3.gz

From GenSas / Jodi Huffman

A GenSAS user reported this, and I am pretty sure it’s an Apollo error, but I just don’t know how to explain it.

Our user noticed that when he used the “Get Sequence” option on his gene model in the User-created Annotations track, a base was missing in the middle of the exon.

The base in question is highlighted in this screen shot: image

When I use the “Get Sequence” function of Apollo, the “A” (reverse strand) is missing in both the cDNA and genomic sequences (end of blue text):

>65ce83c2-4679-412f-aa07-2693cade723a (sequence:exon) 179 residues [Hsc_scaff001:1660380-1660559 - strand] [cdna]
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTTTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA
>65ce83c2-4679-412f-aa07-2693cade723a (sequence:exon) 179 residues [Hsc_scaff001:1660380-1660559 - strand] [genomic]
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTTTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA

But when I look at the sequence of the gene model that was dragged to the User-created Annotations track, the missing base is there (in red):

>Hsc_scaff001 Hsc_scaff001:1657217..1660839 (- strand) class=gene length=3623 (I removed the extra sequence, so coordinates are different)
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTATTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA

I think it’s really odd that a base is missing in the middle of the gene. I could see how this would happen at the ends, but not in the middle. Since GenSAS uses coordinates from the user-created data, it should be fine for downstream work, but this really is bad for curators who use “Get Sequence” to get the protein/gene sequence for quick BLAST searches and whatnot.

I have never noticed this before, but I haven’t done a ton of manual curation where I might notice it either. Just wanted to pass the observation along.
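To pin down exactly which base differs, the two sequences quoted above can be compared programmatically. The following is a minimal Python sketch, not part of Apollo or GenSAS; the function name `find_single_deletion` is hypothetical, and the two strings are a 21-base window copied from the gene-model sequence and the matching 20-base window from the “Get Sequence” output.

```python
def find_single_deletion(reference: str, reported: str):
    """Return (index, base) of the one base present in `reference` but absent
    from `reported`, or None if they do not differ by exactly one deletion.

    Hypothetical helper for illustration; the index is 0-based within
    `reference`. Inside a run of identical bases the position is ambiguous,
    and this scan reports the last position of the run.
    """
    if len(reference) != len(reported) + 1:
        return None
    # Walk the shared prefix; the first mismatch marks the deleted base.
    i = 0
    while i < len(reported) and reference[i] == reported[i]:
        i += 1
    # Confirm that removing that single base reproduces the reported sequence.
    if reference[:i] + reference[i + 1:] != reported:
        return None
    return i, reference[i]


# Window around the discrepancy, taken from the gene-model sequence in this thread.
gene_model_window = "AAAAAAGTTCGTATTGCGATG"
# The same window taken from the "Get Sequence" exon output in this thread.
get_sequence_window = "AAAAAAGTTCGTTTGCGATG"

print(find_single_deletion(gene_model_window, get_sequence_window))
# → (12, 'A')
```

For these windows the sketch reports that the “A” at 0-based offset 12 of the window is the base dropped by “Get Sequence”, which matches the observation above. The same function can be run on the full sequences, provided they are trimmed to the same start and end.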

Jolivares-INRAE commented 4 years ago

Hi, I'm a new user of Apollo, and I have two issues very close to the one described here, both on the same gene model. 1st: when I right-click on the exon to get the CDS sequence, a G is inserted (red arrow), which shifts the reading frame and therefore the amino acid translation and the splice site analysis (black arrow).

Apollo1

2nd: when I right-click on the exon to get the CDS sequence, a T is deleted (red arrow), causing the same problems with translation, splice sites (black arrow), etc.

Apollo2

As you can imagine, it's a nightmare to get the right amino acid sequence.

Hope this can help ...
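To illustrate why a single inserted or deleted base is so disruptive, here is a minimal Python sketch that translates a short CDS with the standard genetic code, then again after deleting or inserting one base. The sequence `toy_cds` is an invented example, not taken from the gene model in this issue, and `translate` is a hypothetical helper, not an Apollo function.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons ordered
# by base in T, C, A, G order and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a CDS in frame 0, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Invented toy CDS (assumption for illustration only).
toy_cds = "ATGGCCAAGTTCTGA"
with_deletion = toy_cds[:4] + toy_cds[5:]         # drop the base at offset 4
with_insertion = toy_cds[:4] + "G" + toy_cds[4:]  # insert a G at offset 4

print(translate(toy_cds))         # → MAKF*
print(translate(with_deletion))   # → MASS
print(translate(with_insertion))  # → MGQVL
```

Every codon downstream of the change is read in a different frame, so the amino acids after that point no longer match and the stop codon is lost in both altered versions.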

nathandunn commented 4 years ago

Can you verify the version of Apollo you are using?

I'm not seeing it with version 2.6.1 of Apollo, but it's possible there are additional steps I need to take, or that I'm missing something:

image

Are you able to reproduce on the demo instance (I would recommend using Honeybee), which is running 2.6.1?

https://genomearchitect.readthedocs.io/en/latest/Demo.html

Jolivares-INRAE commented 4 years ago

I'm using the Apollo available in GenSAS; the version may be "old":

Apollo Genome Annotator

Version: 2.0.7-snapshot
Grails version: 2.5.5
Groovy version: 2.4.4
JVM version: 1.8.0_252

I'm not sure I'll be able to reproduce it on the demo instance, but I'll have a look.

nathandunn commented 4 years ago

@Jolivares-INRAE send me an email (nathandunn at lbl.gov) if you want admin access so that you can upload genomes. However, the case should be the same with the Honeybee organism, so use that.

nathandunn commented 3 years ago

@Jolivares-INRAE also, if you get the FASTA / GFF3 from that organism, I can just reproduce it locally to see if it's already been fixed:

image

Upload it here or email me a link (nathandunn @ lbl.gov).