GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

lines between neighboured genes #1486

Closed mictadlo closed 4 years ago

mictadlo commented 4 years ago

Hi, in JBrowse/Apollo, when I select my first chromosome, all my genes are connected by one line, as shown below:

image

However, the other chromosome does not have that problem:

image

I have loaded the GFF3 with perl ~/tomcat7-deploy/apollo/jbrowse/bin/flatfile-to-json.pl --nameAttributes "ID,Name,Note" --gff test-chr1-2.gff3 --compress --trackType HTMLFeatures --trackLabel "Gene Model 1-2 (hybrid)" --type=mRNA --out /efs/apollo/QLD2/ragooQLD500vsNbV1Ch-v2

The test files can be found here.

What could be causing the line to be created?

Thank you in advance.

Michal

cmdcolin commented 4 years ago

The GFF file may be corrupt

I looked at this and saw a random feature exhibiting the weird behavior, NBqldanG09240.1

I grepped this feature, NBqldanG09240.1, from the GFF, and it has the following lines:

NBqld01_pan transdecoder    mRNA    100292823   100341357   .   -   .   ID=NBqldanG09240.1;Note=Mediator of RNA polymerase II transcription subunit 34;Parent=NBqldanG09240
NBqld01_pan transdecoder    five_prime_UTR  100341270   100341357   .   -   .   ID=NBqldanG09240.1.utr5p1;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    five_prime_UTR  100341092   100341162   .   -   .   ID=NBqldanG09240.1.utr5p2;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    five_prime_UTR  100340027   100340145   .   -   .   ID=NBqldanG09240.1.utr5p3;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    five_prime_UTR  100332012   100332232   .   -   .   ID=NBqldanG09240.1.utr5p4;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    five_prime_UTR  100330947   100330961   .   -   .   ID=NBqldanG09240.1.utr5p5;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100341270   100341357   .   -   .   ID=NBqldanG09240.1.exon1;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100341092   100341162   .   -   .   ID=NBqldanG09240.1.exon2;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100340027   100340145   .   -   .   ID=NBqldanG09240.1.exon3;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100332012   100332232   .   -   .   ID=NBqldanG09240.1.exon4;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100330815   100330961   .   -   .   ID=NBqldanG09240.1.exon5;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100330815   100330946   .   -   0   ID=NBqldanG09240.1.cds1;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100330533   100330748   .   -   .   ID=NBqldanG09240.1.exon6;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100330533   100330748   .   -   0   ID=NBqldanG09240.1.cds2;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100324942   100325046   .   -   .   ID=NBqldanG09240.1.exon7;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100324942   100325046   .   -   0   ID=NBqldanG09240.1.cds3;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100324729   100324833   .   -   .   ID=NBqldanG09240.1.exon8;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100324729   100324833   .   -   0   ID=NBqldanG09240.1.cds4;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100324519   100324641   .   -   .   ID=NBqldanG09240.1.exon9;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100324519   100324641   .   -   0   ID=NBqldanG09240.1.cds5;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100322474   100322562   .   -   .   ID=NBqldanG09240.1.exon10;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100322474   100322562   .   -   0   ID=NBqldanG09240.1.cds6;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100318454   100318490   .   -   .   ID=NBqldanG09240.1.exon11;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100318454   100318490   .   -   1   ID=NBqldanG09240.1.cds7;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100318341   100318374   .   -   .   ID=NBqldanG09240.1.exon12;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100318341   100318374   .   -   0   ID=NBqldanG09240.1.cds8;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100311729   100311781   .   -   .   ID=NBqldanG09240.1.exon13;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100311729   100311781   .   -   2   ID=NBqldanG09240.1.cds9;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100311584   100311661   .   -   .   ID=NBqldanG09240.1.exon14;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100311584   100311661   .   -   0   ID=NBqldanG09240.1.cds10;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100311439   100311504   .   -   .   ID=NBqldanG09240.1.exon15;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100311439   100311504   .   -   0   ID=NBqldanG09240.1.cds11;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100309344   100309410   .   -   .   ID=NBqldanG09240.1.exon16;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100309344   100309410   .   -   0   ID=NBqldanG09240.1.cds12;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100309195   100309248   .   -   .   ID=NBqldanG09240.1.exon17;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100309195   100309248   .   -   2   ID=NBqldanG09240.1.cds13;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100299937   100300044   .   -   .   ID=NBqldanG09240.1.exon18;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100299937   100300044   .   -   2   ID=NBqldanG09240.1.cds14;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100299449   100299513   .   -   .   ID=NBqldanG09240.1.exon19;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100299449   100299513   .   -   2   ID=NBqldanG09240.1.cds15;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100299008   100299086   .   -   .   ID=NBqldanG09240.1.exon20;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100299008   100299086   .   -   0   ID=NBqldanG09240.1.cds16;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100294564   100294781   .   -   .   ID=NBqldanG09240.1.exon21;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100294564   100294781   .   -   2   ID=NBqldanG09240.1.cds17;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    exon    100292823   100293257   .   -   .   ID=NBqldanG09240.1.exon22;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    CDS 100293069   100293257   .   -   0   ID=NBqldanG09240.1.cds18;Parent=NBqldanG09240.1
NBqld01_pan transdecoder    three_prime_UTR 100292823   100293068   .   -   .   ID=NBqldanG09240.1.utr3p1;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    mRNA    60543002    60546039    .   -   .   ID=NBqldanG09240.1;Note=Histone acetyltransferase HAC1;Parent=NBqldanG09240
NBqld02_pan transdecoder    five_prime_UTR  60545846    60546039    .   -   .   ID=NBqldanG09240.1.utr5p1;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    five_prime_UTR  60545206    60545357    .   -   .   ID=NBqldanG09240.1.utr5p2;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    exon    60545846    60546039    .   -   .   ID=NBqldanG09240.1.exon1;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    exon    60545015    60545357    .   -   .   ID=NBqldanG09240.1.exon2;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    CDS 60545015    60545205    .   -   0   ID=NBqldanG09240.1.cds1;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    exon    60544562    60544670    .   -   .   ID=NBqldanG09240.1.exon3;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    CDS 60544562    60544670    .   -   1   ID=NBqldanG09240.1.cds2;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    exon    60543991    60544261    .   -   .   ID=NBqldanG09240.1.exon4;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    CDS 60543991    60544261    .   -   0   ID=NBqldanG09240.1.cds3;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    exon    60543391    60543891    .   -   .   ID=NBqldanG09240.1.exon5;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    CDS 60543391    60543891    .   -   2   ID=NBqldanG09240.1.cds4;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    exon    60543002    60543286    .   -   .   ID=NBqldanG09240.1.exon6;Parent=NBqldanG09240.1
NBqld02_pan transdecoder    CDS 60543003    60543286    .   -   2   ID=NBqldanG09240.1.cds5;Parent=NBqldanG09240.1

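Toward the bottom, there are CDS features at coordinates around 60543003, while the rest are around 100330815, roughly 39.8 Mbp away. They also sit on a different reference sequence (NBqld02_pan rather than NBqld01_pan), under a second mRNA that reuses the ID NBqldanG09240.1.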

This is outside the boundary of the mRNA parent feature, but JBrowse tries to draw it anyway, resulting in glitches.
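
A check of this kind can be sketched in Python. This is only a minimal sketch, not JBrowse's own logic: `parse_attrs` and `children_outside_parent` are hypothetical helper names, the input is assumed to be tab-separated GFF3, and when an ID occurs more than once only the first feature carrying that ID is taken as the parent's span (an assumption made for illustration).

```python
def parse_attrs(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for field in column9.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def children_outside_parent(lines):
    """Return (child_id, parent_id, reason) for children that do not fit
    inside the first feature seen with the parent's ID."""
    spans = {}     # ID -> (seqid, start, end) of the first feature with that ID
    children = []  # (child_id, parent_id, seqid, start, end)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        attrs = parse_attrs(cols[8])
        if "ID" in attrs:
            spans.setdefault(attrs["ID"], (seqid, start, end))
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children.append((attrs.get("ID"), parent, seqid, start, end))

    problems = []
    for child_id, parent, seqid, start, end in children:
        if parent not in spans:
            continue  # parent feature is absent from the file
        p_seqid, p_start, p_end = spans[parent]
        if seqid != p_seqid:
            problems.append((child_id, parent, "different seqid"))
        elif start < p_start or end > p_end:
            problems.append((child_id, parent, "outside parent span"))
    return problems
```

Passing an open handle on the GFF3 file (for example `test-chr1-2.gff3`) to `children_outside_parent` should list the children of the second NBqldanG09240.1 block, since they are compared against the first mRNA with that ID.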

mictadlo commented 4 years ago

@cmdcolin Thank you for finding the problem. @Juke34 Does AGAT have any tools to fix the above problem?

Thank you in advance,

Michal

Juke34 commented 4 years ago

The two mRNAs should not be sharing the same parent; they are part of two different loci.
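
To see how many parents are affected, one could list every Parent value whose mRNAs sit on more than one reference sequence. The following is a rough sketch under assumptions: `parents_on_multiple_seqids` is a hypothetical helper name, the input is tab-separated GFF3, and only the seqid is compared (mRNAs far apart on the same sequence are not detected).

```python
from collections import defaultdict


def parents_on_multiple_seqids(lines, child_type="mRNA"):
    """Map each Parent ID to its seqids when its children of `child_type`
    are spread over more than one reference sequence."""
    seqids_by_parent = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != child_type:
            continue
        for field in cols[8].split(";"):
            if field.startswith("Parent="):
                # A feature may list several parents, separated by commas.
                for parent in field[len("Parent="):].split(","):
                    seqids_by_parent[parent].add(cols[0])
    return {
        parent: sorted(seqids)
        for parent, seqids in seqids_by_parent.items()
        if len(seqids) > 1
    }
```

For the two mRNA lines quoted earlier in this thread, the result would contain NBqldanG09240 with both NBqld01_pan and NBqld02_pan.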

You can try AGAT (agat_sp_gxf_to_gff3.pl), but I'm not sure how it will behave. I don't think I have implemented any function that tries to separate mRNAs that are wrongly linked to the same gene. (It certainly does the opposite: it can merge mRNAs under the same gene when they overlap and the corresponding option is activated.)

Is there any gene feature in the GFF file?

Did you create this file with AGAT? You can end up with that kind of output, where all mRNAs are linked to a single gene feature, when parsing a file that i) has no ID/Parent relationships, ii) contains only level-3 features (no gene, no mRNA), and iii) has no locus tag (a locus_tag or gene_id attribute in the 9th column) that allows linked features to be grouped together. In such rare cases, the only way to group features properly is to tell agat_sp_gxf_to_gff3 which attribute to use as the locus tag.

In any case, you can fix your problem by removing all Parent attributes, and probably the gene feature(s), and then running agat_sp_gxf_to_gff3.pl. It will recreate gene features, link mRNAs to them properly, and gather mRNA isoforms under the same gene umbrella only when they overlap.
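
The preprocessing step described above (dropping Parent attributes and gene features before handing the file to agat_sp_gxf_to_gff3.pl) might look like the sketch below. It is an illustration under assumptions, not part of AGAT: `strip_parent_attributes` is a hypothetical helper name, the input is assumed to be tab-separated GFF3, and how AGAT regroups the result has not been checked here.

```python
def strip_parent_attributes(lines, drop_types=("gene",)):
    """Return GFF3 lines with every Parent attribute removed and with
    features whose type is in `drop_types` left out."""
    cleaned = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            cleaned.append(line)  # keep directives, comments, blank lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            cleaned.append(line)  # leave anything unexpected untouched
            continue
        if cols[2] in drop_types:
            continue  # drop gene features; they are to be rebuilt later
        kept = [
            field for field in cols[8].split(";")
            if field and not field.startswith("Parent=")
        ]
        # GFF3 uses "." for an empty attributes column.
        cols[8] = ";".join(kept) or "."
        cleaned.append("\t".join(cols) + "\n")
    return cleaned
```

The cleaned lines can be written to a new file, which is then the input for agat_sp_gxf_to_gff3.pl. Keeping the original file is advisable, because the Parent links cannot be recovered from the cleaned copy.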

cmdcolin commented 4 years ago

This is probably not a JBrowse bug, so I will close for now. Let me know if there's anything else.