NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

agat convert gff to gtf produces duplicate transcript ID #331

Closed ericmalekos closed 1 year ago

ericmalekos commented 1 year ago

Describe the bug: agat_convert_sp_gff2gtf produced a GTF with duplicate transcript IDs, but the GFF seems to be correct.

Single example:

agat_sp_merge_annotations.pl --gff gencode.vM31.comprehensive.annotation.gtf --gff cufflinks.gtf --out AgatMerge.gff

  grep "TCONS_00000008" AgatMerge.gff | awk '$3 == "RNA"' | cut -d";" -f1,2,4
  chr1  Cufflinks   RNA 4567674 4569781 .   +   .   **ID=TCONS_00000008**;Parent=XLOC_000006;exon_number=1
  chr1  Cufflinks   RNA 4567697 4569781 .   +   .   **ID=TCONS_00000009**;Parent=XLOC_000006;exon_number=1

agat_convert_sp_gff2gtf.pl --gff AgatMerge.gff -o AgatMerge.gtf

    grep "TCONS_00000008" AgatMerge.gtf | awk '$3 == "transcript"' | cut -d";" -f1,2
    chr1    Cufflinks   transcript  4567674 4569781 .   +   .   gene_id "XLOC_000006"; **transcript_id "TCONS_00000008"**
    chr1    Cufflinks   transcript  4567697 4569781 .   +   .   gene_id "XLOC_000006"; **transcript_id "TCONS_00000008"**

GTF/GFF wide occurrences:

awk '$3 == "transcript"' AgatMerge.gtf | cut -d";" -f2 | sort | uniq -d | wc -l
7166

awk '$3 == "RNA"' AgatMerge.gff | cut -d";" -f1 | cut -d"=" -f2 | sort | uniq -d | wc -l
0
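
The counts above depend on `transcript_id` being the second `;`-separated field, which holds for this file but not for every GTF. A minimal sketch of a position-independent check follows. It assumes a tab-separated GTF with attributes in column 9, and the function name `dup_transcript_ids` is hypothetical, not part of AGAT:

```shell
# Hypothetical helper: list every transcript_id that occurs on more than
# one "transcript" line. Assumes a tab-separated GTF, attributes in column 9.
dup_transcript_ids() {
  awk -F'\t' '$3 == "transcript" {
    # Locate transcript_id "..." wherever it sits in the attribute column.
    if (match($9, /transcript_id "[^"]+"/)) {
      # Skip the 15-character prefix (transcript_id ") and the closing quote.
      print substr($9, RSTART + 15, RLENGTH - 16)
    }
  }' "$1" | sort | uniq -d
}
```

Running `dup_transcript_ids AgatMerge.gtf | wc -l` should give a count comparable to the one above, and dropping `| wc -l` lists the affected IDs so they can be looked up in the GFF.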

AGAT v1.0.0, bioconda

Juke34 commented 1 year ago

Hi, thank you for using AGAT. Could you share a data sample to test?

Juke34 commented 1 year ago

I would need more context to understand what is going on (the lines around the features you show, e.g. CDS/exon/gene/etc.).

Juke34 commented 1 year ago

It might be fixed in an updated version of AGAT.