Closed by mictadlo 4 years ago
The GFF file may be corrupt
I looked at this and picked a random feature exhibiting the weird behavior, NBqldanG09240.1.
I grepped for this feature, NBqldanG09240.1, in the GFF, and it has the following lines:
NBqld01_pan transdecoder mRNA 100292823 100341357 . - . ID=NBqldanG09240.1;Note=Mediator of RNA polymerase II transcription subunit 34;Parent=NBqldanG09240
NBqld01_pan transdecoder five_prime_UTR 100341270 100341357 . - . ID=NBqldanG09240.1.utr5p1;Parent=NBqldanG09240.1
NBqld01_pan transdecoder five_prime_UTR 100341092 100341162 . - . ID=NBqldanG09240.1.utr5p2;Parent=NBqldanG09240.1
NBqld01_pan transdecoder five_prime_UTR 100340027 100340145 . - . ID=NBqldanG09240.1.utr5p3;Parent=NBqldanG09240.1
NBqld01_pan transdecoder five_prime_UTR 100332012 100332232 . - . ID=NBqldanG09240.1.utr5p4;Parent=NBqldanG09240.1
NBqld01_pan transdecoder five_prime_UTR 100330947 100330961 . - . ID=NBqldanG09240.1.utr5p5;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100341270 100341357 . - . ID=NBqldanG09240.1.exon1;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100341092 100341162 . - . ID=NBqldanG09240.1.exon2;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100340027 100340145 . - . ID=NBqldanG09240.1.exon3;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100332012 100332232 . - . ID=NBqldanG09240.1.exon4;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100330815 100330961 . - . ID=NBqldanG09240.1.exon5;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100330815 100330946 . - 0 ID=NBqldanG09240.1.cds1;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100330533 100330748 . - . ID=NBqldanG09240.1.exon6;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100330533 100330748 . - 0 ID=NBqldanG09240.1.cds2;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100324942 100325046 . - . ID=NBqldanG09240.1.exon7;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100324942 100325046 . - 0 ID=NBqldanG09240.1.cds3;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100324729 100324833 . - . ID=NBqldanG09240.1.exon8;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100324729 100324833 . - 0 ID=NBqldanG09240.1.cds4;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100324519 100324641 . - . ID=NBqldanG09240.1.exon9;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100324519 100324641 . - 0 ID=NBqldanG09240.1.cds5;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100322474 100322562 . - . ID=NBqldanG09240.1.exon10;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100322474 100322562 . - 0 ID=NBqldanG09240.1.cds6;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100318454 100318490 . - . ID=NBqldanG09240.1.exon11;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100318454 100318490 . - 1 ID=NBqldanG09240.1.cds7;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100318341 100318374 . - . ID=NBqldanG09240.1.exon12;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100318341 100318374 . - 0 ID=NBqldanG09240.1.cds8;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100311729 100311781 . - . ID=NBqldanG09240.1.exon13;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100311729 100311781 . - 2 ID=NBqldanG09240.1.cds9;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100311584 100311661 . - . ID=NBqldanG09240.1.exon14;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100311584 100311661 . - 0 ID=NBqldanG09240.1.cds10;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100311439 100311504 . - . ID=NBqldanG09240.1.exon15;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100311439 100311504 . - 0 ID=NBqldanG09240.1.cds11;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100309344 100309410 . - . ID=NBqldanG09240.1.exon16;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100309344 100309410 . - 0 ID=NBqldanG09240.1.cds12;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100309195 100309248 . - . ID=NBqldanG09240.1.exon17;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100309195 100309248 . - 2 ID=NBqldanG09240.1.cds13;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100299937 100300044 . - . ID=NBqldanG09240.1.exon18;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100299937 100300044 . - 2 ID=NBqldanG09240.1.cds14;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100299449 100299513 . - . ID=NBqldanG09240.1.exon19;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100299449 100299513 . - 2 ID=NBqldanG09240.1.cds15;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100299008 100299086 . - . ID=NBqldanG09240.1.exon20;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100299008 100299086 . - 0 ID=NBqldanG09240.1.cds16;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100294564 100294781 . - . ID=NBqldanG09240.1.exon21;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100294564 100294781 . - 2 ID=NBqldanG09240.1.cds17;Parent=NBqldanG09240.1
NBqld01_pan transdecoder exon 100292823 100293257 . - . ID=NBqldanG09240.1.exon22;Parent=NBqldanG09240.1
NBqld01_pan transdecoder CDS 100293069 100293257 . - 0 ID=NBqldanG09240.1.cds18;Parent=NBqldanG09240.1
NBqld01_pan transdecoder three_prime_UTR 100292823 100293068 . - . ID=NBqldanG09240.1.utr3p1;Parent=NBqldanG09240.1
NBqld02_pan transdecoder mRNA 60543002 60546039 . - . ID=NBqldanG09240.1;Note=Histone acetyltransferase HAC1;Parent=NBqldanG09240
NBqld02_pan transdecoder five_prime_UTR 60545846 60546039 . - . ID=NBqldanG09240.1.utr5p1;Parent=NBqldanG09240.1
NBqld02_pan transdecoder five_prime_UTR 60545206 60545357 . - . ID=NBqldanG09240.1.utr5p2;Parent=NBqldanG09240.1
NBqld02_pan transdecoder exon 60545846 60546039 . - . ID=NBqldanG09240.1.exon1;Parent=NBqldanG09240.1
NBqld02_pan transdecoder exon 60545015 60545357 . - . ID=NBqldanG09240.1.exon2;Parent=NBqldanG09240.1
NBqld02_pan transdecoder CDS 60545015 60545205 . - 0 ID=NBqldanG09240.1.cds1;Parent=NBqldanG09240.1
NBqld02_pan transdecoder exon 60544562 60544670 . - . ID=NBqldanG09240.1.exon3;Parent=NBqldanG09240.1
NBqld02_pan transdecoder CDS 60544562 60544670 . - 1 ID=NBqldanG09240.1.cds2;Parent=NBqldanG09240.1
NBqld02_pan transdecoder exon 60543991 60544261 . - . ID=NBqldanG09240.1.exon4;Parent=NBqldanG09240.1
NBqld02_pan transdecoder CDS 60543991 60544261 . - 0 ID=NBqldanG09240.1.cds3;Parent=NBqldanG09240.1
NBqld02_pan transdecoder exon 60543391 60543891 . - . ID=NBqldanG09240.1.exon5;Parent=NBqldanG09240.1
NBqld02_pan transdecoder CDS 60543391 60543891 . - 2 ID=NBqldanG09240.1.cds4;Parent=NBqldanG09240.1
NBqld02_pan transdecoder exon 60543002 60543286 . - . ID=NBqldanG09240.1.exon6;Parent=NBqldanG09240.1
NBqld02_pan transdecoder CDS 60543003 60543286 . - 2 ID=NBqldanG09240.1.cds5;Parent=NBqldanG09240.1
Toward the bottom, there are CDS features at coordinate 60543003, while the rest are around 100330815, which is roughly 39 Mbp away.
This is outside the boundaries of the parent mRNA feature, but JBrowse tries to draw it anyway, resulting in glitches.
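A check like the one described above can be scripted. The following is a minimal sketch, not what JBrowse does internally: it assumes a tab-separated GFF3 file, and it assumes that a loader keeps only the first feature it sees for each ID, so that every child is compared against that first record. The function names `parse_gff3_line` and `children_outside_parent` are made up for this example.

```python
def parse_gff3_line(line):
    """Parse one tab-separated GFF3 record; return None for comments and malformed lines."""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9:
        return None
    attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attrs": attrs,
    }


def children_outside_parent(lines):
    """Report children that do not fall inside the first feature carrying their Parent ID."""
    features = [f for f in map(parse_gff3_line, lines) if f is not None]

    # Assumption: when an ID is reused, only the first record with that ID counts.
    first_by_id = {}
    for f in features:
        fid = f["attrs"].get("ID")
        if fid is not None:
            first_by_id.setdefault(fid, f)

    problems = []
    for f in features:
        for parent_id in f["attrs"].get("Parent", "").split(","):
            parent = first_by_id.get(parent_id)
            if parent is None:
                continue  # parent not present in this input (or no Parent attribute)
            inside = (
                f["seqid"] == parent["seqid"]
                and parent["start"] <= f["start"]
                and f["end"] <= parent["end"]
            )
            if not inside:
                problems.append(
                    (f["attrs"].get("ID"), f["seqid"], f["start"], f["end"], parent_id)
                )
    return problems
```

To run it on a file, pass an open file handle, for example `children_outside_parent(open("test-chr1-2.gff3"))`. Each reported tuple is (child ID, seqid, start, end, parent ID).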
@cmdcolin Thank you for finding the problem. @Juke34 Does AGAT have any tools to fix the above problem?
Thank you in advance,
Michal
The two mRNAs should not share the same parent; they are part of two different loci.
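To find other genes affected in the same way, one option is to group mRNA records by their Parent attribute and flag any parent whose mRNAs do not form a single overlapping cluster on one sequence. This is a rough sketch assuming tab-separated GFF3 input; the function name `genes_with_scattered_mrnas` is hypothetical and is not part of AGAT or JBrowse.

```python
from collections import defaultdict


def genes_with_scattered_mrnas(lines):
    """Map each Parent ID to its mRNA loci when those mRNAs form more than one locus."""
    spans = defaultdict(list)  # parent ID -> [(seqid, start, end), ...]
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                spans[parent].append((cols[0], int(cols[3]), int(cols[4])))

    scattered = {}
    for parent, regions in spans.items():
        loci = []  # merged, non-overlapping [seqid, start, end] clusters
        for seqid, start, end in sorted(regions):
            if loci and loci[-1][0] == seqid and start <= loci[-1][2]:
                # Overlaps the previous cluster on the same sequence: extend it.
                loci[-1][2] = max(loci[-1][2], end)
            else:
                loci.append([seqid, start, end])
        if len(loci) > 1:
            scattered[parent] = [tuple(locus) for locus in loci]
    return scattered
```

An empty result means every parent's mRNAs overlap on a single sequence. Any entry in the result lists the separate loci that were found under one parent ID.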
You can try with AGAT (agat_sp_gxf_to_gff3.pl), but I'm not sure how it will behave. I don't think I have implemented any function that tries to separate mRNAs that are wrongly linked to the same gene. (It certainly does the opposite: it can merge mRNAs under the same gene when they overlap and the corresponding option is activated.)
Is there any gene feature in the GFF file?
Did you create this file with AGAT? You can end up with this kind of output, where all mRNAs are linked to a single gene feature, when parsing a file that i) has no ID/Parent relationships, ii) contains only level-3 features (no gene, no mRNA), and iii) has no locus tag (a locus_tag or gene_id attribute in the 9th column) that allows linked features to be grouped together. In that rare case, the only way to group features properly is to tell agat_sp_gxf_to_gff3 which attribute to use as the locus tag.
In any case, you can fix your problem by removing all Parent attributes, and probably the gene feature(s), and then running agat_sp_gxf_to_gff3.pl. It will recreate the gene features, link the mRNAs to them properly, and gather mRNA isoforms under the same gene umbrella only when they overlap.
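The preprocessing step suggested here (dropping Parent attributes and gene features before handing the file to AGAT) could look roughly like this. It is a sketch under the assumption that the input is tab-separated GFF3. The function name `strip_parents` and its `drop_types` parameter are invented for illustration, and how AGAT regroups the result is not something this code controls.

```python
def strip_parents(lines, drop_types=("gene",)):
    """Return GFF3 lines with Parent attributes removed and the given feature types dropped."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)  # keep directives, comments, and anything unparseable as-is
            continue
        if cols[2] in drop_types:
            continue  # e.g. gene features, so they can be rebuilt downstream
        kept = [a for a in cols[8].split(";") if a and not a.startswith("Parent=")]
        cols[8] = ";".join(kept) if kept else "."
        out.append("\t".join(cols))
    return out
```

The returned lines can be written to a new file, which is then given to agat_sp_gxf_to_gff3.pl as its input. Keeping the original file untouched makes it easy to compare the two results.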
This is probably not a JBrowse bug, so I will close this for now. Let me know if there's anything else.
Hi, in JBrowse/Apollo, when I select my first chromosome, all my genes are connected by one line, as shown below:
However, the other chromosome does not have that problem:
I have loaded the GFF3 with
perl ~/tomcat7-deploy/apollo/jbrowse/bin/flatfile-to-json.pl --nameAttributes "ID,Name,Note" --gff test-chr1-2.gff3 --compress --trackType HTMLFeatures --trackLabel "Gene Model 1-2 (hybrid)" --type=mRNA --out /efs/apollo/QLD2/ragooQLD500vsNbV1Ch-v2
The test files can be found here.
What could be causing the line to be created?
Thank you in advance.
Michal