Closed abretaud closed 5 years ago
Just in case, the gff file I loaded in the merlin_html_2.gff track:
##gff-version 3
##sequence-region Merlin 1 172788
Merlin GeneMark.hmm gene 752 1039 -339.046618 + . ID=Merlin_2;seqid=Merlin
Merlin GeneMark.hmm mRNA 752 1039 . + . ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin
Merlin GeneMark.hmm CDS 752 830 . + 0 ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 890 1039 . + 0 ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm gene 1067 2011 -1229.683915 - . ID=Merlin_3;seqid=Merlin
Merlin GeneMark.hmm mRNA 1067 2011 . - . ID=Merlin_3_mRNA;Parent=Merlin_3;seqid=Merlin
Merlin GeneMark.hmm CDS 1067 1500 . - 0 ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 1600 1911 . - 0 ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm UTR 1912 2011 . - 0 ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin
@abretaud Can you do a couple of things:
It's possible we aren't properly annotating CDSs, but I thought we would turn them into exons first.
In case it helps, here's the html of the first gene in uca:
<div class="feature-label" style="top: 18px; left: 50.4%;">
<div class="feature-name">Merlin_2-00001</div>
</div>
<div class="feature plus-annot ui-droppable" style="left: 50.4%; top: 0px; width: 57.6%; background-color: transparent; border-width: 0px;" _dijitmenudijit_menu_5="3">
<div class="plus-annot-arrowhead" style="right: -12px;"></div>
<div class="subfeature plus-container-100pct" style="left: 0%; width: 100%;">
<div class="subfeature annot-CDS cds-frame1 neat-subfeature" style="left: 0%; width: 100%;"></div>
</div>
<div class="subfeature plus-container-100pct" style="left: 0%; width: 27.4306%;">
<div class="subfeature annot-CDS cds-frame1 neat-subfeature" style="left: 0%; width: 100%;"></div>
</div>
<div class="subfeature plus-container-100pct ui-resizable" style="left: 47.9167%; width: 52.0833%;">
<div class="subfeature annot-CDS cds-frame0 neat-subfeature" style="left: 0%; width: 100%;"></div>
<div class="ui-resizable-handle ui-resizable-e" style="z-index: 90;"></div>
<div class="ui-resizable-handle ui-resizable-w" style="z-index: 90;"></div>
</div>
<svg class="jb-intron" viewBox="0 0 100 100" preserveAspectRatio="none" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" style="position:absolute;z-index: 15;left: 27.4306%;width: 20.4861%;height: 100%"><polyline class="neat-intron" points="0,50 50,5 100,50" shape-rendering="optimizeQuality"></polyline></svg>
<div class="feature-render annot-render"></div>
</div>
apollo-config.groovy
it would be useful to see.
Ok, here's the exported GFF:
##gff-version 3
##sequence-region Merlin 1 172788
Merlin . gene 1067 2011 . - . owner=abretaud@bipaa;ID=c2151e23-dde4-4bd5-bfd5-29d809d4b3ee;date_last_modified=2019-02-20;Name=Merlin_3_mRNA;date_creation=2019-02-20
Merlin . mRNA 1067 2011 . - . owner=abretaud@bipaa;Parent=c2151e23-dde4-4bd5-bfd5-29d809d4b3ee;ID=edebd2c1-d755-47d6-9f61-e8cb82548624;date_last_modified=2019-02-20;Name=Merlin_3_mRNA-00002;date_creation=2019-02-20
Merlin . exon 1067 1464 . - . Parent=edebd2c1-d755-47d6-9f61-e8cb82548624;ID=85560d2b-8f18-4fa9-a306-84fe04136335;Name=85560d2b-8f18-4fa9-a306-84fe04136335
Merlin . exon 1498 2011 . - . Parent=edebd2c1-d755-47d6-9f61-e8cb82548624;ID=5a34afd8-4889-4337-b338-8d2d42f1d170;Name=5a34afd8-4889-4337-b338-8d2d42f1d170
Merlin . CDS 1498 2011 . - 0 Parent=edebd2c1-d755-47d6-9f61-e8cb82548624;ID=edebd2c1-d755-47d6-9f61-e8cb82548624-CDS;Name=edebd2c1-d755-47d6-9f61-e8cb82548624-CDS
Merlin . CDS 1067 1464 . - 2 Parent=edebd2c1-d755-47d6-9f61-e8cb82548624;ID=edebd2c1-d755-47d6-9f61-e8cb82548624-CDS;Name=edebd2c1-d755-47d6-9f61-e8cb82548624-CDS
###
Merlin . gene 752 1039 . + . owner=abretaud@bipaa;ID=7d64d66d-32e9-46a0-b7e4-1b056f17b0a7;date_last_modified=2019-02-20;Name=Merlin_2;date_creation=2019-02-20
Merlin . mRNA 752 1039 . + . owner=abretaud@bipaa;Parent=7d64d66d-32e9-46a0-b7e4-1b056f17b0a7;ID=58514be3-b539-481b-8a22-9fe91d1275e1;date_last_modified=2019-02-20;Name=Merlin_2-00001;date_creation=2019-02-20
Merlin . exon 752 1039 . + . Parent=58514be3-b539-481b-8a22-9fe91d1275e1;ID=4ff428f2-a826-4c69-b5e1-96b3db640ac3;Name=4ff428f2-a826-4c69-b5e1-96b3db640ac3
Merlin . CDS 752 1039 . + 0 Parent=58514be3-b539-481b-8a22-9fe91d1275e1;ID=58514be3-b539-481b-8a22-9fe91d1275e1-CDS;Name=58514be3-b539-481b-8a22-9fe91d1275e1-CDS
###
The exported cds fasta:
>edebd2c1-d755-47d6-9f61-e8cb82548624 (mRNA) 912 residues [Merlin:1067-2011 - strand] [cds] name=Merlin_3_mRNA-00002
ATGCTAACTTTAGATGAATTTAAAAACCAAGCGGGTAATATAGACTTTCAGCGTACTAAT
ATGTTTAGTTGTGTATTTGCAACTACTCCGTCAGCAAAGTCTCAACAATTACTCGATCAA
TTTGGCGGTATGCTCTTTAATAACCTTCCGTTGAATAATGACTGGCTTGGATTAACACAA
GGTGAGTTCACATCAGGACTCACCTCAATTATCACTGCCGGTACTCAACAGCTGGTAAGA
AAGTCTGGTGTATCGAAATATCTTATTGGAGCAATGAGCAATCGTGTTGTTCAGTCTTTA
TTAGGTGAATTTGAAGTCGGAACTTATTTGTTAGACTTCTTTAACATGGCTTATCCGCAA
TCTGGATTGATGATTTATTCGGTCAAAATTCCAGAGAACAGATTGTCTCATGAAATGGAT
TTCAACCATAACTCACCGAATATTAGAATAACTGGACGTGAACTCGATCCGTTAACTATA
TCATTCAGAATGGATCCCGAAGCAAGTAACTATCACCCGGTTACTGGATTGCGAGCATTA
CCAACTGACGTCGAAGCTGACATTCAGGTTAACCTTCATGCTCGAAATGGATTACCTCAT
ACTGTGATAATGTTCACAGGTTGTGTTCCTGTTGCGTGTGGAGCTCCTGAGCTTACATAT
GAAGGAGATAACCAAATTGCGGTTTTCGATGTTACATTTGCTTACAGAGTAATGCAAACG
GGTGCTGTTGGACGTCAAGCTGCTCTTGATTGGATTGAAGATAGAGCTGTTAATTCTATA
ACTGGAATTAATAGTGAAATGTCTCTTAATGGAAGTTTAAGTAGATTATCTAGACTTGGA
GGAGCTGCTGGAGGGTTGTCTCACGTCATTAATTCGACCCGAAACTCTACTTCGAAAATA
CTTGGATTGTAA
>58514be3-b539-481b-8a22-9fe91d1275e1 (mRNA) 288 residues [Merlin:752-1039 + strand] [cds] name=Merlin_2-00001
ATGAAATCAATTTTTCGTATCAACGGTGTAGAAATTGTAGTTGAAGATGTAGTTCCTATG
TCTTATGAATTCAATGAAGTTGTTTTCAAAGAGCTTAAGAAAATTTTAGGCGATAAGAAG
CTTCAAAGTACTCCAATTGGACGTTTTGGAATGAAAGAAAACGTTGATACTTATATTGAA
AGTGTAGTGACAGGGCAGTTAGAAGGTGAATTTTCTGTAGCAGTTCAAACTGTAGAAAAT
GATGAAGTTATTTTAACTTTACCAGCTTTCGTAATTTTCCGCAAATAA
ok, the apollo-config.groovy is still this one: https://github.com/abretaud/docker-apollo/blob/bipaa/apollo-config.groovy
Okay.
https://github.com/abretaud/docker-apollo/blob/bipaa/apollo-config.groovy#L74-L90
You get the same results on refresh?
Can you click on the individual exons for each?
What happens if you change the CDS to exon in your input GFF3? Apollo will recalculate by default, so it's better (though not essential) if the input is exon. There are some options to override this behavior.
Hum, I have this:
WEBAPOLLO_CDS_FOR_NEW_TRANSCRIPTS: "true"
WEBAPOLLO_FEATURE_HAS_DBXREFS: "true"
WEBAPOLLO_FEATURE_HAS_ATTRS: "true"
WEBAPOLLO_FEATURE_HAS_PUBMED: "true"
WEBAPOLLO_FEATURE_HAS_GO: "true"
WEBAPOLLO_FEATURE_HAS_COMMENTS: "true"
WEBAPOLLO_FEATURE_HAS_STATUS: "true"
yep
yep
I've just tried, it's the same
I would remove the WEBAPOLLO_CDS_FOR_NEW_TRANSCRIPTS line.
The default is false. Basically, when set to true, this tries to use the existing CDS to calculate the new one. By default Apollo always tries to recalculate the most likely CDS based on the largest ORF. The most common use-case for setting it to true is to promote a bunch of existing predicted annotations while preserving their CDS.
If you don't have a good reason for making that true, I wouldn't set it to true.
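To make the "largest ORF" default concrete, here is a minimal Python sketch of the general idea. It is not Apollo's actual code: the function name `longest_orf`, the standard stop-codon set, and the choice to scan only the three forward frames of an already spliced transcript are all assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF, 0-based half-open.

    Only the three forward frames are scanned. Returns None if no complete
    ORF (start codon followed by an in-frame stop) is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # first ATG seen in this frame since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best
```

With recalculation like this, the CDS coordinates in the input GFF3 matter less than the exon structure, which is why exon input is the safer choice.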
It's the same when removing WEBAPOLLO_CDS_FOR_NEW_TRANSCRIPTS
By "the same", do you mean the problem persisted when you created an annotation again after removing the line (or setting it to false) and redeploying?
If it doesn't take too much time, you might want to try explicitly setting it to false, redeploying, and re-creating the annotations.
I'll try to take a closer look at it this week or early next.
Yep, I meant the problem is still there after unsetting the env var (or setting it to false explicitly), redeploying, adding a new organism with my jbrowse instance, and adding genes to uca. Thanks for the help!
Okay, thanks for testing all of these. I'll take a look. If the jbrowse data directory is small enough (<5 GB) and you send it to me, I could probably make short work of this issue.
Nathan
I've just sent 2 sample data dirs by email, did you receive them? (Sent to your lbl address.) I have to leave now, sorry.
I have not seen it yet. If I don’t get it by tomorrow, maybe we can chat on Gitter.
Nathan
Interesting... the GFF3 is correct, but for some reason the returned element has two exons (or an exon and a CDS) in separate places:
The problem is that the exon has 3 children that are exons (?!?), two of which are the original correct exons.
{"track":"Merlin","features":[{"location":{"fmin":751,"fmax":1039,"strand":1},"type":{"cv":{"name":"sequence"},"name":"mRNA"},"name":"Merlin_2","orig_id":"Merlin_2","children":[{"location":{"fmin":751,"fmax":1039,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"},"orig_id":"Merlin_2_mRNA","children":[{"location":{"fmin":751,"fmax":1039,"strand":1},"type":{"cv":{"name":"sequence"},"name":"CDS"}},{"location":{"fmin":751,"fmax":830,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"},"orig_id":"Merlin_2_CDS1"},{"location":{"fmin":889,"fmax":1039,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"},"orig_id":"Merlin_2_CDS2"}]}]}],"operation":"add_transcript","clientToken":"13959384401953500954"}
Viewing it in the JSON viewer, it looks like there is an intermediate sequence layer that shouldn't be there:
With the other one, it's slightly different, but the evidence is still not properly defined:
Basically, it should always go, mRNA -> (exon|CDS) and exons should not generally have subfeatures, though if there is a good argument, I can take a look.
If sequence is something that should be mapped somewhere else, then let me know, but I'm not sure what it would be, since we are generally just looking for exon coordinates, or CDS if exons are unavailable.
Though, the evidence suggests that what you have is correct when I view details (either the CDS or exon version should have worked), so I'm unsure why we are getting two layers.
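The expected shape (mRNA -> exon|CDS, with no subfeatures under an exon) can be checked mechanically. Below is a minimal Python sketch, not part of Apollo: the helper name `exons_with_children` is hypothetical, and `payload` is a hand-trimmed copy of the add_transcript request quoted above, keeping only the type names and nesting.

```python
def exons_with_children(feature, path=()):
    """Yield (path, child type names) for every exon that has subfeatures."""
    name = feature["type"]["name"]
    here = path + (name,)
    children = feature.get("children", [])
    if name == "exon" and children:
        yield " -> ".join(here), [c["type"]["name"] for c in children]
    for child in children:
        yield from exons_with_children(child, here)


# Trimmed version of the payload above (locations and cv names omitted).
payload = {
    "features": [{
        "type": {"name": "mRNA"},
        "children": [{
            "type": {"name": "exon"},
            "orig_id": "Merlin_2_mRNA",
            "children": [
                {"type": {"name": "CDS"}},
                {"type": {"name": "exon"}, "orig_id": "Merlin_2_CDS1"},
                {"type": {"name": "exon"}, "orig_id": "Merlin_2_CDS2"},
            ],
        }],
    }]
}

problems = [p for f in payload["features"] for p in exons_with_children(f)]
for path, kinds in problems:
    print(f"{path} has children: {kinds}")
```

On this payload the check flags the single exon carrying orig_id Merlin_2_mRNA, which suggests the original mRNA was sent as an exon and the original gene as the mRNA.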
Looking at the evidence, the GFF3 is a bit funky:
Merlin GeneMark.hmm CDS 752 830 . + 0 ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm gene 752 1039 -339.046618 + . ID=Merlin_2;seqid=Merlin
Merlin GeneMark.hmm mRNA 752 1039 . + . ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin
Merlin GeneMark.hmm CDS 890 1039 . + 0 ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 1067 1500 . - 0 ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm gene 1067 2011 -1229.683915 - . ID=Merlin_3;seqid=Merlin
Merlin GeneMark.hmm mRNA 1067 2011 . - . ID=Merlin_3_mRNA;Parent=Merlin_3;seqid=Merlin
Merlin GeneMark.hmm CDS 1600 1911 . - 0 ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm UTR 1912 2011 . - 0 ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin
Fixed it here:
Merlin GeneMark.hmm gene 752 1039 -339.046618 + . ID=Merlin_2;seqid=Merlin
Merlin GeneMark.hmm mRNA 752 1039 . + . ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin
Merlin GeneMark.hmm CDS 752 830 . + 0 ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 890 1039 . + 0 ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm gene 1067 2011 -1229.683915 - . ID=Merlin_3;seqid=Merlin
Merlin GeneMark.hmm mRNA 1067 2011 . - . ID=Merlin_3_mRNA;Parent=Merlin_3;seqid=Merlin
Merlin GeneMark.hmm CDS 1067 1500 . - 0 ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 1600 1911 . - 0 ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm UTR 1912 2011 . - 0 ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin
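The reordering between the two listings above (parents before children when features share a start coordinate) can be reproduced with a small script. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, each feature has at most one Parent, the hierarchy has no cycles, and columns are split on any whitespace because the lines pasted here have lost their tabs (real GFF3 is tab-delimited).

```python
def parse_attrs(column9):
    """Parse the GFF3 attributes column into a dict."""
    return dict(kv.split("=", 1) for kv in column9.split(";") if kv)


def sort_parents_first(lines):
    """Sort feature lines by (seqid, start, depth in the Parent hierarchy).

    Directives and comments (lines starting with '#') are skipped.
    Returns tab-joined feature lines.
    """
    rows = [l.strip().split(None, 8) for l in lines
            if l.strip() and not l.startswith("#")]

    # Map each feature ID to its parent ID (None for top-level features).
    parent_of = {}
    for row in rows:
        attrs = parse_attrs(row[8])
        if "ID" in attrs:
            parent_of[attrs["ID"]] = attrs.get("Parent")

    def depth(row):
        d, parent = 0, parse_attrs(row[8]).get("Parent")
        while parent is not None:
            d, parent = d + 1, parent_of.get(parent)
        return d

    rows.sort(key=lambda r: (r[0], int(r[3]), depth(r)))
    return ["\t".join(r) for r in rows]
```

Applied to the first four lines of the "funky" listing, this returns them in gene, mRNA, CDS, CDS order, matching the fixed listing.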
Actually, I think the sequence cv type should be there. I'm going to go through some regressions to see if I didn't inadvertently introduce a problem.
I redid this with just a straight GFF3 with 2.3.1 and the same problem:
##gff-version 3
##sequence-region Merlin 1 172788
Merlin GeneMark.hmm gene 752 1039 -339.046618 + . ID=Merlin_2;seqid=Merlin
Merlin GeneMark.hmm mRNA 752 1039 . + . ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin;Name=bob
Merlin GeneMark.hmm CDS 752 830 . + 0 ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 890 1039 . + 0 ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm gene 1067 2011 -1229.683915 - . ID=Merlin_3;seqid=Merlin
Merlin GeneMark.hmm mRNA 1067 2011 . - . ID=Merlin_3_mRNA;Parent=Merlin_3;seqid=Merlin;Name=jenny
Merlin GeneMark.hmm CDS 1600 1911 . - 0 ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 1067 1500 . - 0 ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm UTR 1912 2011 . - 0 ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin
Testing with a 2.2.0 regression
Same result for 2.2.0... I'm wondering if the issue might be more related to it using the GFF3Tabix store versus the NCList one. This would be good to fix, as I would prefer the native stores.
@abretaud I remember the problem.
The issue is that when using the GFF3Tabix, it flips out if it has a top-level gene class.
I changed it to be top-level mRNA:
Merlin GeneMark.hmm mRNA 752 1039 . + . ID=Merlin_2_mRNA;seqid=Merlin;Name=bob
Merlin GeneMark.hmm CDS 752 830 . + 0 ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 890 1039 . + 0 ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin GeneMark.hmm mRNA 1067 2011 . - . ID=Merlin_3_mRNA;seqid=Merlin;Name=jenny
Merlin GeneMark.hmm CDS 1600 1911 . - 0 ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm CDS 1067 1500 . - 0 ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin GeneMark.hmm UTR 1912 2011 . - 0 ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin
and it works:
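For anyone who needs the same stopgap on a larger file, the manual edit above (drop the gene rows, remove Parent from the mRNA rows) can be scripted. This is a minimal Python sketch of that workaround only, not a fix: the function name is hypothetical, and columns are split on any whitespace because the lines pasted in this thread have lost their tabs.

```python
def promote_mrna_to_top_level(lines):
    """Drop gene rows and strip the Parent attribute from mRNA rows.

    Directives, comments and blank lines are passed through unchanged.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split(None, 8)
        if cols[2] == "gene":
            continue  # the gene row is removed entirely
        if cols[2] == "mRNA":
            kept = [kv for kv in cols[8].split(";")
                    if kv and not kv.startswith("Parent=")]
            cols[8] = ";".join(kept)
        out.append("\t".join(cols))
    return out
```

CDS and UTR rows keep their Parent pointing at the mRNA, so the transcript structure is unchanged; only the gene level is lost.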
This obviously isn't acceptable as a fix, since GFF3 files should have genes in them. Two solutions:
if topLevelFeatures are defined then use what is there
if gene (or pseudogene) has subfeatures, we automatically use those unless gene is specified in topLevelFeatures
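One possible reading of these two rules, as a minimal Python sketch. This is an interpretation for discussion, not the implemented behavior: the function name, the simplified feature dicts (a plain `type` string and a `children` list), and passing topLevelFeatures as a tuple are all assumptions.

```python
def pick_transcript_level(feature, top_level_features=()):
    """Return the features to treat as transcripts for a dragged feature."""
    ftype = feature["type"]
    children = feature.get("children", [])
    if ftype in top_level_features:
        # Rule 1: the track config names this type, so use it as given.
        return [feature]
    if ftype in ("gene", "pseudogene") and children:
        # Rule 2: descend into the gene's subfeatures automatically.
        return children
    return [feature]


gene = {"type": "gene", "children": [{"type": "mRNA", "children": []}]}
print(pick_transcript_level(gene))             # descends to the mRNA
print(pick_transcript_level(gene, ("gene",)))  # keeps the gene itself
```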
Anyway, this is critical for 2.4.0
Cool, thanks for looking into it! I won't be able to do it until tomorrow, but no problem if you want me to test some patch
Nothing to do today (or tonight) for you. Hopefully I'll have something more working tomorrow so I can finish #2064
Testing Apollo 2.3.1 with NeatHTMLFeatures and NeatCanvasFeatures, I get this rendering issue:
Here's the track config for UCA:
And here's the config for the track I dragged models from (I left them untouched in the uca):
(I think I've seen it in another issue, but I can't find it... Sorry if it's a duplicate)