GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

Rendering issue with neat features #2063

Closed abretaud closed 5 years ago

abretaud commented 5 years ago

Testing Apollo 2.3.1 with NeatHTMLFeatures and NeatCanvasFeatures, I get this rendering issue:

apollo_neat

Here's the track config for UCA:

{
  "maxFeatureSizeForUnderlyingRefSeq": 250000,
  "subfeatureDetailLevel": 2,
  "maxFeatureScreenDensity": 0.5,
  "maxHeight": 600,
  "style": {
    "arrowheadClass": "annot-arrowhead",
    "className": "annot",
    "_defaultHistScale": 4,
    "_defaultLabelScale": 30,
    "_defaultDescriptionScale": 120,
    "minSubfeatureWidth": 1,
    "maxDescriptionLength": 70,
    "showLabels": true,
    "label": "name,id",
    "description": "note, description",
    "centerChildrenVertically": false,
    "renderClassName": "annot-render",
    "subfeatureClasses": {
      "UTR": "annot-UTR",
      "CDS": "annot-CDS",
      "exon": "container-100pct",
      "intron": null,
      "wholeCDS": null,
      "start_codon": null,
      "stop_codon": null,
      "match_part": "darkblue-80pct",
      "non_canonical_three_prime_splice_site": "noncanonical-splice-site",
      "non_canonical_five_prime_splice_site": "noncanonical-splice-site"
    },
    "alternateClasses": {
      "terminator": {
        "renderClassName": "terminator-render annot-apollo",
        "className": "terminator"
      },
      "transposable_element": {
        "renderClassName": "blue-ibeam-render annot-apollo",
        "className": "blue-ibeam"
      },
      "pseudogene": {
        "renderClassName": "gray-center-30pct annot-apollo",
        "className": "light-purple-80pct"
      },
      "snRNA": {
        "renderClassName": "gray-center-30pct annot-apollo",
        "className": "brightgreen-80pct"
      },
      "rRNA": {
        "renderClassName": "gray-center-30pct annot-apollo",
        "className": "brightgreen-80pct"
      },
      "snoRNA": {
        "renderClassName": "gray-center-30pct annot-apollo",
        "className": "brightgreen-80pct"
      },
      "repeat_region": {
        "className": "magenta-80pct"
      },
      "ncRNA": {
        "renderClassName": "gray-center-30pct annot-apollo",
        "className": "brightgreen-80pct"
      },
      "miRNA": {
        "renderClassName": "gray-center-30pct annot-apollo",
        "className": "brightgreen-80pct"
      },
      "tRNA": {
        "renderClassName": "gray-center-30pct annot-apollo",
        "className": "brightgreen-80pct"
      },
      "SNV": {
        "renderClassName": "snv-variant",
        "className": "snv-variant-render"
      },
      "MNV": {
        "renderClassName": "mnv-variant",
        "className": "mnv-variant-render"
      },
      "insertion": {
        "renderClassName": "insertion-variant",
        "className": "insertion-variant-render"
      },
      "deletion": {
        "renderClassName": "deletion-variant",
        "className": "deletion-variant-render"
      }
    },
    "uniqueIdField": "id",
    "centerSubFeature": {
      "non_canonical_three_prime_splice_site": false,
      "non_canonical_five_prime_splice_site": false
    }
  },
  "hooks": {},
  "events": {},
  "menuTemplate": null,
  "noExport": true,
  "pinned": true,
  "autocomplete": "none",
  "key": "User-created Annotations",
  "storeClass": "WebApollo/Store/SeqFeature/ScratchPad",
  "phase": 0,
  "compress": 0,
  "label": "Annotations",
  "type": "WebApollo/View/Track/AnnotTrack",
  "subfeatures": 1,
  "baseUrl": "http://localhost:8500/apollo//1155672999863085291160881064/jbrowse/plugins/WebApollo/json/",
  "metadata": {}
}

And here's the config for the track I dragged models from (I left them untouched in the uca):

{
  "maxFeatureSizeForUnderlyingRefSeq": 250000,
  "subfeatureDetailLevel": 2,
  "maxFeatureScreenDensity": 0.5,
  "maxHeight": "600",
  "style": {
    "arrowheadClass": "webapollo-arrowhead",
    "className": "feature2",
    "_defaultHistScale": 4,
    "_defaultLabelScale": 30,
    "_defaultDescriptionScale": 120,
    "minSubfeatureWidth": 1,
    "maxDescriptionLength": 70,
    "showLabels": true,
    "label": "product,name,id",
    "description": "note,description",
    "centerChildrenVertically": false,
    "renderClassName": "gray-center-30pct annot-apollo",
    "subfeatureClasses": {
      "UTR": "webapollo-UTR",
      "CDS": "webapollo-CDS",
      "exon": "container-100pct",
      "intron": null,
      "wholeCDS": null,
      "start_codon": null,
      "stop_codon": null,
      "match_part": "darkblue-80pct"
    },
    "color": "#a6cee3"
  },
  "hooks": {},
  "events": {},
  "menuTemplate": [
    {
      "label": "View details",
      "title": "{type} {name}",
      "action": "contentDialog",
      "iconClass": "dijitIconTask"
    },
    {
      "iconClass": "dijitIconFilter"
    },
    {},
    {}
  ],
  "trackType": "NeatHTMLFeatures/View/Track/NeatFeatures",
  "topLevelFeatures": "mRNA",
  "overridePlugins": false,
  "urlTemplate": "raw/58e6be8d7034a88d34fec4bb2a578578_0.gff.gz",
  "overrideDraggable": false,
  "label": "58e6be8d7034a88d34fec4bb2a578578_0",
  "type": "WebApollo/View/Track/DraggableNeatHTMLFeatures",
  "storeClass": "JBrowse/Store/SeqFeature/GFF3Tabix",
  "category": "Default",
  "key": "merlin_html_2.gff",
  "baseUrl": "http://localhost:8500/apollo//1155672999863085291160881064/jbrowse/data/",
  "index": 1
}

(I think I've seen it in another issue, but I can't find it... Sorry if it's a duplicate)

abretaud commented 5 years ago

Just in case, the gff file I loaded in the merlin_html_2.gff track:

##gff-version 3
##sequence-region Merlin 1 172788
Merlin  GeneMark.hmm    gene    752 1039    -339.046618 +   .   ID=Merlin_2;seqid=Merlin
Merlin  GeneMark.hmm    mRNA    752 1039    .   +   .   ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin
Merlin  GeneMark.hmm    CDS 752 830 .   +   0   ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 890 1039    .   +   0   ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    gene    1067    2011    -1229.683915    -   .   ID=Merlin_3;seqid=Merlin
Merlin  GeneMark.hmm    mRNA    1067    2011    .   -   .   ID=Merlin_3_mRNA;Parent=Merlin_3;seqid=Merlin
Merlin  GeneMark.hmm    CDS 1067    1500    .   -   0   ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 1600    1911    .   -   0   ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    UTR 1912    2011    .   -   0   ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin
nathandunn commented 5 years ago

@abretaud Can you do a couple of things:

  1. Can you right-click on the annotations and display the GFF3 and various FASTA options for the two genes?
  2. That being said, do you have any non-standard configuration options set?

It's possible we aren't properly annotating CDSs, but I thought we would turn them into exons first.

abretaud commented 5 years ago
  1. Hum, not sure what you mean here :/
  2. I don't think so, though it's still a jbrowse exported from galaxy into my apollo docker image... The apollo-config.groovy is there: https://github.com/abretaud/docker-apollo/blob/bipaa/apollo-config.groovy

In case it helps, here's the html of the first gene in uca:

<div class="feature-label" style="top: 18px; left: 50.4%;">
    <div class="feature-name">Merlin_2-00001</div>
</div>
<div class="feature plus-annot ui-droppable" style="left: 50.4%; top: 0px; width: 57.6%; background-color: transparent; border-width: 0px;" _dijitmenudijit_menu_5="3">
    <div class="plus-annot-arrowhead" style="right: -12px;"></div>
    <div class="subfeature plus-container-100pct" style="left: 0%; width: 100%;">
        <div class="subfeature annot-CDS cds-frame1 neat-subfeature" style="left: 0%; width: 100%;"></div>
    </div>
    <div class="subfeature plus-container-100pct" style="left: 0%; width: 27.4306%;">
        <div class="subfeature annot-CDS cds-frame1 neat-subfeature" style="left: 0%; width: 100%;"></div>
    </div>
    <div class="subfeature plus-container-100pct ui-resizable" style="left: 47.9167%; width: 52.0833%;">
        <div class="subfeature annot-CDS cds-frame0 neat-subfeature" style="left: 0%; width: 100%;"></div>
        <div class="ui-resizable-handle ui-resizable-e" style="z-index: 90;"></div>
        <div class="ui-resizable-handle ui-resizable-w" style="z-index: 90;"></div>
    </div>
    <svg class="jb-intron" viewBox="0 0 100 100" preserveAspectRatio="none" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" style="position:absolute;z-index: 15;left: 27.4306%;width: 20.4861%;height: 100%"><polyline class="neat-intron" points="0,50 50,5 100,50" shape-rendering="optimizeQuality"></polyline></svg>
    <div class="feature-render annot-render"></div>
</div>
nathandunn commented 5 years ago
  1. I mean what is the exported GFF3 / FASTA of the annotation?

screen shot 2019-02-20 at 8 27 36 am screen shot 2019-02-20 at 8 27 41 am

  2. I mean in Apollo, there are some CDS options. If you have any relevant options in your apollo-config.groovy it would be useful to see.
abretaud commented 5 years ago
  1. Ok, here's the exported GFF:

    ##gff-version 3
    ##sequence-region Merlin 1 172788
    Merlin  .   gene    1067    2011    .   -   .   owner=abretaud@bipaa;ID=c2151e23-dde4-4bd5-bfd5-29d809d4b3ee;date_last_modified=2019-02-20;Name=Merlin_3_mRNA;date_creation=2019-02-20
    Merlin  .   mRNA    1067    2011    .   -   .   owner=abretaud@bipaa;Parent=c2151e23-dde4-4bd5-bfd5-29d809d4b3ee;ID=edebd2c1-d755-47d6-9f61-e8cb82548624;date_last_modified=2019-02-20;Name=Merlin_3_mRNA-00002;date_creation=2019-02-20
    Merlin  .   exon    1067    1464    .   -   .   Parent=edebd2c1-d755-47d6-9f61-e8cb82548624;ID=85560d2b-8f18-4fa9-a306-84fe04136335;Name=85560d2b-8f18-4fa9-a306-84fe04136335
    Merlin  .   exon    1498    2011    .   -   .   Parent=edebd2c1-d755-47d6-9f61-e8cb82548624;ID=5a34afd8-4889-4337-b338-8d2d42f1d170;Name=5a34afd8-4889-4337-b338-8d2d42f1d170
    Merlin  .   CDS 1498    2011    .   -   0   Parent=edebd2c1-d755-47d6-9f61-e8cb82548624;ID=edebd2c1-d755-47d6-9f61-e8cb82548624-CDS;Name=edebd2c1-d755-47d6-9f61-e8cb82548624-CDS
    Merlin  .   CDS 1067    1464    .   -   2   Parent=edebd2c1-d755-47d6-9f61-e8cb82548624;ID=edebd2c1-d755-47d6-9f61-e8cb82548624-CDS;Name=edebd2c1-d755-47d6-9f61-e8cb82548624-CDS
    ###
    Merlin  .   gene    752 1039    .   +   .   owner=abretaud@bipaa;ID=7d64d66d-32e9-46a0-b7e4-1b056f17b0a7;date_last_modified=2019-02-20;Name=Merlin_2;date_creation=2019-02-20
    Merlin  .   mRNA    752 1039    .   +   .   owner=abretaud@bipaa;Parent=7d64d66d-32e9-46a0-b7e4-1b056f17b0a7;ID=58514be3-b539-481b-8a22-9fe91d1275e1;date_last_modified=2019-02-20;Name=Merlin_2-00001;date_creation=2019-02-20
    Merlin  .   exon    752 1039    .   +   .   Parent=58514be3-b539-481b-8a22-9fe91d1275e1;ID=4ff428f2-a826-4c69-b5e1-96b3db640ac3;Name=4ff428f2-a826-4c69-b5e1-96b3db640ac3
    Merlin  .   CDS 752 1039    .   +   0   Parent=58514be3-b539-481b-8a22-9fe91d1275e1;ID=58514be3-b539-481b-8a22-9fe91d1275e1-CDS;Name=58514be3-b539-481b-8a22-9fe91d1275e1-CDS
    ###

    The exported cds fasta:

    >edebd2c1-d755-47d6-9f61-e8cb82548624 (mRNA) 912 residues [Merlin:1067-2011 - strand] [cds] name=Merlin_3_mRNA-00002
    ATGCTAACTTTAGATGAATTTAAAAACCAAGCGGGTAATATAGACTTTCAGCGTACTAAT
    ATGTTTAGTTGTGTATTTGCAACTACTCCGTCAGCAAAGTCTCAACAATTACTCGATCAA
    TTTGGCGGTATGCTCTTTAATAACCTTCCGTTGAATAATGACTGGCTTGGATTAACACAA
    GGTGAGTTCACATCAGGACTCACCTCAATTATCACTGCCGGTACTCAACAGCTGGTAAGA
    AAGTCTGGTGTATCGAAATATCTTATTGGAGCAATGAGCAATCGTGTTGTTCAGTCTTTA
    TTAGGTGAATTTGAAGTCGGAACTTATTTGTTAGACTTCTTTAACATGGCTTATCCGCAA
    TCTGGATTGATGATTTATTCGGTCAAAATTCCAGAGAACAGATTGTCTCATGAAATGGAT
    TTCAACCATAACTCACCGAATATTAGAATAACTGGACGTGAACTCGATCCGTTAACTATA
    TCATTCAGAATGGATCCCGAAGCAAGTAACTATCACCCGGTTACTGGATTGCGAGCATTA
    CCAACTGACGTCGAAGCTGACATTCAGGTTAACCTTCATGCTCGAAATGGATTACCTCAT
    ACTGTGATAATGTTCACAGGTTGTGTTCCTGTTGCGTGTGGAGCTCCTGAGCTTACATAT
    GAAGGAGATAACCAAATTGCGGTTTTCGATGTTACATTTGCTTACAGAGTAATGCAAACG
    GGTGCTGTTGGACGTCAAGCTGCTCTTGATTGGATTGAAGATAGAGCTGTTAATTCTATA
    ACTGGAATTAATAGTGAAATGTCTCTTAATGGAAGTTTAAGTAGATTATCTAGACTTGGA
    GGAGCTGCTGGAGGGTTGTCTCACGTCATTAATTCGACCCGAAACTCTACTTCGAAAATA
    CTTGGATTGTAA
    >58514be3-b539-481b-8a22-9fe91d1275e1 (mRNA) 288 residues [Merlin:752-1039 + strand] [cds] name=Merlin_2-00001
    ATGAAATCAATTTTTCGTATCAACGGTGTAGAAATTGTAGTTGAAGATGTAGTTCCTATG
    TCTTATGAATTCAATGAAGTTGTTTTCAAAGAGCTTAAGAAAATTTTAGGCGATAAGAAG
    CTTCAAAGTACTCCAATTGGACGTTTTGGAATGAAAGAAAACGTTGATACTTATATTGAA
    AGTGTAGTGACAGGGCAGTTAGAAGGTGAATTTTCTGTAGCAGTTCAAACTGTAGAAAAT
    GATGAAGTTATTTTAACTTTACCAGCTTTCGTAATTTTCCGCAAATAA
  2. ok, the apollo-config.groovy is still this one: https://github.com/abretaud/docker-apollo/blob/bipaa/apollo-config.groovy

nathandunn commented 5 years ago

Okay.

  1. You're not passing any options here?

https://github.com/abretaud/docker-apollo/blob/bipaa/apollo-config.groovy#L74-L90

  2. You get the same results on refresh?

  3. Can you click on the individual exons for each?

  4. What happens if you change the CDS to exon in your input GFF3? Apollo will recalculate by default, so it's better (though not essential) if the input is exon. There are some options to override this behavior.

abretaud commented 5 years ago
  1. Hum, I have this:

        WEBAPOLLO_CDS_FOR_NEW_TRANSCRIPTS: "true"
        WEBAPOLLO_FEATURE_HAS_DBXREFS: "true"
        WEBAPOLLO_FEATURE_HAS_ATTRS: "true"
        WEBAPOLLO_FEATURE_HAS_PUBMED: "true"
        WEBAPOLLO_FEATURE_HAS_GO: "true"
        WEBAPOLLO_FEATURE_HAS_COMMENTS: "true"
        WEBAPOLLO_FEATURE_HAS_STATUS: "true"
  2. yep

  3. yep

  4. I've just tried, it's the same

nathandunn commented 5 years ago

I would remove the WEBAPOLLO_CDS_FOR_NEW_TRANSCRIPTS line.

The default is false. Basically, this tries to use the existing CDS to calculate the new one. By default Apollo always tries to recalculate the most likely CDS based on the largest ORF. The most common use-case is to promote a bunch of existing predicted annotations, preserving their annotations.

If you don't have a good reason for making that true, I wouldn't set it to true.
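To make the "largest ORF" default concrete, here is a minimal Python sketch. It is not Apollo's implementation: the function name `longest_orf` is hypothetical, and it assumes only the forward strand, an ATG start, and the standard TAA/TAG/TGA stops.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Returns None when no complete ORF exists. (Hypothetical helper.)
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # first ATG seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best
```

With CDS_FOR_NEW_TRANSCRIPTS left at false, the coordinates of the CDS in the source GFF3 would not be carried over; something along the lines of the search above would decide the CDS instead.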

abretaud commented 5 years ago

It's the same when removing WEBAPOLLO_CDS_FOR_NEW_TRANSCRIPTS

nathandunn commented 5 years ago

By the "same" did you mean, when you created an annotation again after removing the line (or setting it to false) and redeploying it?

If it's not too much time, you might want to try explicitly setting it to false, redeploying, and re-creating the annotations.

I'll try to take a closer look at it this week or early next.

abretaud commented 5 years ago

Yep, I meant there is still the problem after unsetting the env var (or setting it to false explicitly), redeploying, adding a new organism with my jbrowse instance, and adding genes to the uca. Thanks for the help!

nathandunn commented 5 years ago

Okay, thanks for testing all of these. I'll take a look. If the jbrowse data directory is small enough (<5 GB) and you can send it to me, I could probably make short work of this issue.

Nathan

abretaud commented 5 years ago

I've just sent 2 sample data dirs by email, did you receive them? (Sent to your lbl address.) I have to leave now, sorry.

nathandunn commented 5 years ago

I have not seen it yet. If I don’t get it by tomorrow, maybe we can chat on Gitter.

Nathan

nathandunn commented 5 years ago

data_apollo_exons.tar.gz data_apollo_cds.tar.gz wetransfer-c46e63.zip

nathandunn commented 5 years ago

Data from here: https://github.com/galaxyproject/tools-iuc/tree/master/tools/jbrowse/test-data

nathandunn commented 5 years ago

Interesting... the GFF3 is correct, but for some reason the returned element has two exons (or an exon and a CDS) in separate places:

screen shot 2019-02-25 at 10 17 12 am screen shot 2019-02-25 at 10 17 05 am

The problem is that the exon has 3 children (?!?): a CDS and two exons, the latter being the original correct exons.

{"track":"Merlin","features":[{"location":{"fmin":751,"fmax":1039,"strand":1},"type":{"cv":{"name":"sequence"},"name":"mRNA"},"name":"Merlin_2","orig_id":"Merlin_2","children":[{"location":{"fmin":751,"fmax":1039,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"},"orig_id":"Merlin_2_mRNA","children":[{"location":{"fmin":751,"fmax":1039,"strand":1},"type":{"cv":{"name":"sequence"},"name":"CDS"}},{"location":{"fmin":751,"fmax":830,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"},"orig_id":"Merlin_2_CDS1"},{"location":{"fmin":889,"fmax":1039,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"},"orig_id":"Merlin_2_CDS2"}]}]}],"operation":"add_transcript","clientToken":"13959384401953500954"}

Viewing it in the JSON viewer, it looks like there is an intermediate sequence layer that shouldn't be there:

screen shot 2019-02-25 at 10 25 28 am
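The extra layer can also be spotted without a JSON viewer. The Python sketch below walks a payload shaped like the add_transcript request above and reports any exon that has subfeatures of its own. The helper name `exons_with_children` is hypothetical, and the payload is a trimmed copy of the one above (locations, names and ids removed).

```python
import json


def exons_with_children(feature, path=()):
    """Yield (path, child_types) for every exon that has subfeatures."""
    ftype = feature["type"]["name"]
    here = path + (ftype,)
    children = feature.get("children", [])
    if ftype == "exon" and children:
        yield here, [c["type"]["name"] for c in children]
    for child in children:
        yield from exons_with_children(child, here)


# Trimmed copy of the payload above: only the type nesting is kept.
payload = json.loads("""
{"features": [{"type": {"name": "mRNA"}, "children": [
    {"type": {"name": "exon"}, "children": [
        {"type": {"name": "CDS"}},
        {"type": {"name": "exon"}},
        {"type": {"name": "exon"}}]}]}]}
""")

for top in payload["features"]:
    for path, kids in exons_with_children(top):
        print(" > ".join(path), "has children:", kids)
# prints: mRNA > exon has children: ['CDS', 'exon', 'exon']
```

Here the outer exon carries orig_id Merlin_2_mRNA in the full payload, which suggests the mRNA was sent as an exon under a feature that is really the gene.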

nathandunn commented 5 years ago

With the other one, it's slightly different, but still not properly defined evidence:

image

Basically, it should always go, mRNA -> (exon|CDS) and exons should not generally have subfeatures, though if there is a good argument, I can take a look.

If sequence is something that should be mapped somewhere else, then let me know, but I'm not sure what it would be, since we are generally just looking for exon coordinates or CDS if exons are unavailable.

nathandunn commented 5 years ago

Though, the evidence suggests that what you have is correct when I view details (either the CDS or exon version should have worked), so I'm unsure why we are getting two layers.

Looking at the evidence, the GFF3 is a bit funky:

Merlin  GeneMark.hmm    CDS 752 830 .   +   0   ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    gene    752 1039    -339.046618 +   .   ID=Merlin_2;seqid=Merlin
Merlin  GeneMark.hmm    mRNA    752 1039    .   +   .   ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin
Merlin  GeneMark.hmm    CDS 890 1039    .   +   0   ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 1067    1500    .   -   0   ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    gene    1067    2011    -1229.683915    -   .   ID=Merlin_3;seqid=Merlin
Merlin  GeneMark.hmm    mRNA    1067    2011    .   -   .   ID=Merlin_3_mRNA;Parent=Merlin_3;seqid=Merlin
Merlin  GeneMark.hmm    CDS 1600    1911    .   -   0   ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    UTR 1912    2011    .   -   0   ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin

Fixed it here:

Merlin  GeneMark.hmm    gene    752 1039    -339.046618 +   .   ID=Merlin_2;seqid=Merlin
Merlin  GeneMark.hmm    mRNA    752 1039    .   +   .   ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin
Merlin  GeneMark.hmm    CDS 752 830 .   +   0   ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 890 1039    .   +   0   ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    gene    1067    2011    -1229.683915    -   .   ID=Merlin_3;seqid=Merlin
Merlin  GeneMark.hmm    mRNA    1067    2011    .   -   .   ID=Merlin_3_mRNA;Parent=Merlin_3;seqid=Merlin
Merlin  GeneMark.hmm    CDS 1067    1500    .   -   0   ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 1600    1911    .   -   0   ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    UTR 1912    2011    .   -   0   ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin
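The two listings above appear to differ only in line order: in the first, a CDS row comes before the gene and mRNA rows it belongs to. Below is a small Python sketch for finding such rows. The function name `children_before_parents` is hypothetical, and it assumes whitespace-separated columns with no spaces inside the attribute column (real GFF3 is tab-separated, which this also handles).

```python
def children_before_parents(gff_lines):
    """Return the IDs of rows that appear before the row named in their Parent."""
    seen_ids = set()
    misplaced = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split(None, 8)  # 9 GFF3 columns, attributes last
        attrs = dict(
            pair.split("=", 1)
            for pair in columns[8].strip().split(";")
            if "=" in pair
        )
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        if any(p not in seen_ids for p in parents):
            misplaced.append(attrs.get("ID"))
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
    return misplaced


funky = [
    "Merlin GeneMark.hmm CDS 752 830 . + 0 ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin",
    "Merlin GeneMark.hmm gene 752 1039 -339.046618 + . ID=Merlin_2;seqid=Merlin",
    "Merlin GeneMark.hmm mRNA 752 1039 . + . ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin",
    "Merlin GeneMark.hmm CDS 890 1039 . + 0 ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin",
]
print(children_before_parents(funky))
```

An empty result only means parents precede their children; it says nothing about whether the hierarchy itself is what Apollo expects.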
nathandunn commented 5 years ago

Actually, I think the sequence cv type should be there. I'm going to go through some regressions to see if I didn't inadvertently introduce a problem.

nathandunn commented 5 years ago

I redid this with just a straight GFF3 with 2.3.1 and got the same problem:

##gff-version 3
##sequence-region Merlin 1 172788
Merlin  GeneMark.hmm    gene    752 1039    -339.046618 +   .   ID=Merlin_2;seqid=Merlin
Merlin  GeneMark.hmm    mRNA    752 1039    .   +   .   ID=Merlin_2_mRNA;Parent=Merlin_2;seqid=Merlin;Name=bob
Merlin  GeneMark.hmm    CDS 752 830 .   +   0   ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 890 1039    .   +   0   ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin

Merlin  GeneMark.hmm    gene    1067    2011    -1229.683915    -   .   ID=Merlin_3;seqid=Merlin
Merlin  GeneMark.hmm    mRNA    1067    2011    .   -   .   ID=Merlin_3_mRNA;Parent=Merlin_3;seqid=Merlin;Name=jenny
Merlin  GeneMark.hmm    CDS 1600    1911    .   -   0   ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 1067    1500    .   -   0   ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    UTR 1912    2011    .   -   0   ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin

Testing with a 2.2.0 regression

nathandunn commented 5 years ago

Same result for 2.2.0... I'm wondering if the issue might be more related to it using the GFF3Tabix store versus the NCList one. This would be good to fix, as I would prefer the native stores.

nathandunn commented 5 years ago

@abretaud I remember the problem.

The issue is that when using the GFF3Tabix, it flips out if it has a top-level gene class.

I changed it to be top-level mRNA:

Merlin  GeneMark.hmm    mRNA    752 1039    .   +   .   ID=Merlin_2_mRNA;seqid=Merlin;Name=bob
Merlin  GeneMark.hmm    CDS 752 830 .   +   0   ID=Merlin_2_CDS1;Parent=Merlin_2_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 890 1039    .   +   0   ID=Merlin_2_CDS2;Parent=Merlin_2_mRNA;seqid=Merlin

Merlin  GeneMark.hmm    mRNA    1067    2011    .   -   .   ID=Merlin_3_mRNA;seqid=Merlin;Name=jenny
Merlin  GeneMark.hmm    CDS 1600    1911    .   -   0   ID=Merlin_3_CDS2;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    CDS 1067    1500    .   -   0   ID=Merlin_3_CDS1;Parent=Merlin_3_mRNA;seqid=Merlin
Merlin  GeneMark.hmm    UTR 1912    2011    .   -   0   ID=Merlin_3_UTR;Parent=Merlin_3_mRNA;seqid=Merlin

and it works:

screen shot 2019-02-25 at 11 11 26 am
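The edit made by hand above (dropping the gene rows and the Parent attribute that pointed at them) can be sketched in Python as follows. This is only a stopgap illustration, not a recommended fix. The function names are hypothetical, and it assumes tab-separated GFF3.

```python
def parse_attrs(text):
    """Split a GFF3 attribute column into a dict (hypothetical helper)."""
    return dict(p.split("=", 1) for p in text.strip().split(";") if "=" in p)


def drop_gene_level(gff_lines):
    """Remove gene rows and detach their direct children from them."""
    lines = [line.rstrip("\n") for line in gff_lines]

    # First pass: collect the IDs of all gene rows.
    gene_ids = set()
    for line in lines:
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            gene_ids.add(parse_attrs(cols[8]).get("ID"))

    # Second pass: drop gene rows and any Parent= that points at a gene.
    out = []
    for line in lines:
        cols = line.split("\t")
        if len(cols) != 9:
            out.append(line)  # directives, comments, blank lines
            continue
        if cols[2] == "gene":
            continue
        kept = [
            p for p in cols[8].split(";")
            if not (p.startswith("Parent=") and p[len("Parent="):] in gene_ids)
        ]
        cols[8] = ";".join(kept)
        out.append("\t".join(cols))
    return out
```

Rows below the mRNA (CDS, UTR) keep their Parent attribute, so the mRNA becomes the top-level feature, as in the listing above.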

nathandunn commented 5 years ago

This obviously isn't acceptable, as GFF3s should have genes in them. Two solutions:

  1. if topLevelFeatures are defined, then use what is there

  - OR -

  2. if gene (or pseudogene) has subfeatures, we automatically use those unless gene is specified in topLevelFeatures

Anyway, this is critical for 2.4.0
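For discussion, option 2 could look roughly like the Python sketch below. It is purely hypothetical and is not the change that went into Apollo. It assumes features are plain dicts with a "type" string and an optional "children" list, and that topLevelFeatures has already been split into a list of type names.

```python
def features_to_promote(feature, top_level_features=None):
    """Pick what to send to the annotation track for a dragged feature.

    Hypothetical sketch of option 2: a gene or pseudogene with subfeatures
    is replaced by those subfeatures, unless its type is explicitly listed
    in topLevelFeatures.
    """
    explicit = set(top_level_features or [])
    ftype = feature["type"]
    children = feature.get("children", [])
    if ftype in ("gene", "pseudogene") and children and ftype not in explicit:
        return children
    return [feature]


gene = {"type": "gene", "children": [{"type": "mRNA", "children": []}]}
print([f["type"] for f in features_to_promote(gene)])            # default
print([f["type"] for f in features_to_promote(gene, ["gene"])])  # gene requested
```

Under this rule a GFF3Tabix track with gene > mRNA > CDS would hand the mRNA to the annotation track by default, which matches what the hand-edited file above achieves.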

abretaud commented 5 years ago

Cool, thanks for looking into it! I won't be able to do it until tomorrow, but no problem if you want me to test some patch

nathandunn commented 5 years ago

Nothing to do today (or tonight) for you. Hopefully I'll have something more working tomorrow so I can finish #2064