NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

agat_sp_keep_longest issue in adding exon feature #347

Closed jeankeller closed 1 year ago

jeankeller commented 1 year ago

Hi,

I am using AGAT to extract the longest isoforms, and during the first step, when the script adds missing features, I spotted some odd behavior. When AGAT adds exon features, it seems to work well except for the first exon, which corresponds to the start_codon feature of the original GFF instead of the first CDS (see below). I am using AGAT v1.0.0 on WSL2 Ubuntu 22.04, built from source.

Thank you

Original GFF3

contig_1153_segment0 gmst mRNA 71352 75749 . + . ID=g100.t2;Parent=g100;
contig_1153_segment0 gmst start_codon 71352 71354 338.821349 + 0 ID=g100.t2.start1;Parent=g100.t2;
contig_1153_segment0 gmst CDS 71352 71486 338.821349 + 0 ID=g100.t2.CDS1;Parent=g100.t2;
contig_1153_segment0 gmst intron 71487 71637 338.821349 + 0 ID=g100.t2.intron1;Parent=g100.t2;
contig_1153_segment0 gmst CDS 71638 73086 338.821349 + 0 ID=g100.t2.CDS2;Parent=g100.t2;
contig_1153_segment0 gmst intron 73087 73302 338.821349 + 0 ID=g100.t2.intron2;Parent=g100.t2;
contig_1153_segment0 gmst CDS 73303 74593 338.821349 + 0 ID=g100.t2.CDS3;Parent=g100.t2;
contig_1153_segment0 gmst intron 74594 74802 338.821349 + 0 ID=g100.t2.intron3;Parent=g100.t2;
contig_1153_segment0 gmst CDS 74803 75263 338.821349 + 2 ID=g100.t2.CDS4;Parent=g100.t2;
contig_1153_segment0 gmst intron 75264 75317 338.821349 + 0 ID=g100.t2.intron4;Parent=g100.t2;
contig_1153_segment0 gmst CDS 75318 75749 338.821349 + 0 ID=g100.t2.CDS5;Parent=g100.t2;
contig_1153_segment0 gmst stop_codon 75747 75749 338.821349 + 0 ID=g100.t2.stop1;Parent=g100.t2;

AGAT produced GFF3

contig_1153_segment0 gmst mRNA 71352 75749 . + . ID=g100.t2;Parent=g100
contig_1153_segment0 gmst exon 71352 71354 338.821349 + . ID=nbis-exon-3;Parent=g100.t2
contig_1153_segment0 gmst exon 71638 73086 338.821349 + . ID=nbis-exon-4;Parent=g100.t2
contig_1153_segment0 gmst exon 73303 74593 338.821349 + . ID=nbis-exon-5;Parent=g100.t2
contig_1153_segment0 gmst exon 74803 75263 338.821349 + . ID=nbis-exon-6;Parent=g100.t2
contig_1153_segment0 gmst exon 75318 75749 338.821349 + . ID=nbis-exon-7;Parent=g100.t2
contig_1153_segment0 gmst CDS 71352 71486 338.821349 + 0 ID=g100.t2.CDS1;Parent=g100.t2
contig_1153_segment0 gmst CDS 71638 73086 338.821349 + 0 ID=g100.t2.CDS2;Parent=g100.t2
contig_1153_segment0 gmst CDS 73303 74593 338.821349 + 0 ID=g100.t2.CDS3;Parent=g100.t2
contig_1153_segment0 gmst CDS 74803 75263 338.821349 + 2 ID=g100.t2.CDS4;Parent=g100.t2
contig_1153_segment0 gmst CDS 75318 75749 338.821349 + 0 ID=g100.t2.CDS5;Parent=g100.t2
contig_1153_segment0 gmst intron 71487 71637 338.821349 + 0 ID=g100.t2.intron1;Parent=g100.t2
contig_1153_segment0 gmst intron 73087 73302 338.821349 + 0 ID=g100.t2.intron2;Parent=g100.t2
contig_1153_segment0 gmst intron 73303 74802 338.821349 + 0 ID=g100.t2.intron3;Parent=g100.t2
contig_1153_segment0 gmst intron 75264 75317 338.821349 + 0 ID=g100.t2.intron4;Parent=g100.t2
contig_1153_segment0 gmst start_codon 71352 71354 338.821349 + 0 ID=g100.t2.start1;Parent=g100.t2
contig_1153_segment0 gmst stop_codon 75747 75749 338.821349 + 0 ID=g100.t2.stop1;Parent=g100.t2
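
For anyone wanting to check their own output for the same symptom, here is a minimal Python sketch that flags any CDS not fully contained in an exon of the same parent transcript. It is not part of AGAT; the function names `parse_gff3` and `uncovered_cds` are hypothetical, and the parser assumes whitespace-separated columns with no spaces inside the first eight fields (real GFF3 is tab-separated). The sample data is a shortened copy of the AGAT output above.

```python
from collections import defaultdict

# Shortened copy of the AGAT output from this report (first two exons/CDS only).
SAMPLE = """\
contig_1153_segment0 gmst mRNA 71352 75749 . + . ID=g100.t2;Parent=g100
contig_1153_segment0 gmst exon 71352 71354 338.821349 + . ID=nbis-exon-3;Parent=g100.t2
contig_1153_segment0 gmst exon 71638 73086 338.821349 + . ID=nbis-exon-4;Parent=g100.t2
contig_1153_segment0 gmst CDS 71352 71486 338.821349 + 0 ID=g100.t2.CDS1;Parent=g100.t2
contig_1153_segment0 gmst CDS 71638 73086 338.821349 + 0 ID=g100.t2.CDS2;Parent=g100.t2
"""


def parse_gff3(text):
    """Yield (type, start, end, parent) for each 9-column record.

    Assumption: columns are separated by whitespace and the first eight
    columns contain no spaces (hypothetical helper, not an AGAT function).
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split(None, 8)
        if len(cols) < 9:
            continue
        attrs = dict(
            pair.split("=", 1)
            for pair in cols[8].strip(";").split(";")
            if "=" in pair
        )
        yield cols[2], int(cols[3]), int(cols[4]), attrs.get("Parent")


def uncovered_cds(text):
    """Return (parent, start, end) for every CDS not contained in an exon."""
    exons = defaultdict(list)
    cds = defaultdict(list)
    for ftype, start, end, parent in parse_gff3(text):
        if ftype == "exon":
            exons[parent].append((start, end))
        elif ftype == "CDS":
            cds[parent].append((start, end))

    problems = []
    for parent, segments in cds.items():
        for c_start, c_end in segments:
            covered = any(
                e_start <= c_start and c_end <= e_end
                for e_start, e_end in exons[parent]
            )
            if not covered:
                problems.append((parent, c_start, c_end))
    return problems


if __name__ == "__main__":
    for parent, start, end in uncovered_cds(SAMPLE):
        print(f"{parent}: CDS {start}-{end} is not covered by any exon")
```

On the sample, the only CDS reported is 71352-71486, because the first exon spans just the start codon (71352-71354).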

Juke34 commented 1 year ago

Hi @jeankeller, good catch! There is indeed a bug under certain conditions when recreating exons. It will be fixed in the master branch and included in the next release.

jeankeller commented 1 year ago

Thank you for the rapid fix! Now it works as intended.