Closed by mossconfuse 1 year ago
What AGAT version are you using? I think it has been fixed in a recent release.
Hi Juke,
I am using version 1.0.0. agat_sp_filter_feature_from_kill_list.pl is doing the same. It adds new nbis-exons. The output is larger than the input. Any help would be really great. Thanks
Scaffold_1 maker gene 3492427 3498117 . + . ID=XP_08099;Name=XP_08099;
Scaffold_1 maker mRNA 3492427 3498117 . + . ID=XP_08099-RA;Parent=XP_08099;
Scaffold_1 maker exon 3492427 3492994 . + . ID=nbis-exon-16589;Parent=XP_08099-RA
Scaffold_1 maker exon 3493502 3493629 . + . ID=nbis-exon-16590;Parent=XP_08099-RA
Scaffold_1 maker exon 3493759 3493830 . + . ID=nbis-exon-16591;Parent=XP_08099-RA
Scaffold_1 maker exon 3493903 3494016 . + . ID=nbis-exon-16592;Parent=XP_08099-RA
Scaffold_1 maker exon 3494393 3494461 . + . ID=nbis-exon-16593;Parent=XP_08099-RA
Scaffold_1 maker exon 3494554 3494610 . + . ID=nbis-exon-16594;Parent=XP_08099-RA
Scaffold_1 maker exon 3495358 3495411 . + . ID=nbis-exon-16595;Parent=XP_08099-RA
Scaffold_1 maker exon 3495485 3495547 . + . ID=nbis-exon-16596;Parent=XP_08099-RA
Scaffold_1 maker exon 3495708 3495770 . + . ID=nbis-exon-16597;Parent=XP_08099-RA
Scaffold_1 maker exon 3496121 3496285 . + . ID=nbis-exon-16598;Parent=XP_08099-RA
Scaffold_1 maker exon 3496364 3496519 . + . ID=nbis-exon-16599;Parent=XP_08099-RA
Scaffold_1 maker exon 3496827 3496907 . + . ID=nbis-exon-16600;Parent=XP_08099-RA
Scaffold_1 maker exon 3497023 3497068 . + . ID=XP_08099-RA:17;Parent=XP_08099-RA
Scaffold_1 maker exon 3497219 3497322 . + . ID=nbis-exon-16601;Parent=XP_08099-RA
Scaffold_1 maker exon 3497648 3498117 . + . ID=nbis-exon-16602;Parent=XP_08099-RA
Scaffold_1 maker CDS 3492823 3492994 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3493502 3493629 . + 2 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3493759 3493830 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3493903 3494016 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3494393 3494461 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3494554 3494610 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3495358 3495411 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3495485 3495547 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3495708 3495770 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3496121 3496285 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3496364 3496519 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3496827 3496907 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3497023 3497068 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3497219 3497322 . + 2 ID=XP_08099-RA:cds;Parent=XP_08099-RA
Scaffold_1 maker CDS 3497648 3497689 . + 0 ID=XP_08099-RA:cds;Parent=XP_08099-RA
AGAT creates such new exons where information is missing. It deduces exons from CDS and UTRs.
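To illustrate the idea of deducing exons from CDS and UTRs, here is a minimal sketch that merges CDS and UTR segments that overlap or touch into exon intervals. It is not AGAT's actual implementation; the function name infer_exons is made up for this example, and the 5' UTR coordinates are an assumption derived from the first exon and first CDS of the record above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF


def infer_exons(cds: List[Interval], utrs: List[Interval]) -> List[Interval]:
    """Merge CDS and UTR segments that overlap or are adjacent into exons.

    Illustrative only: a hypothetical helper, not AGAT's own code.
    """
    exons: List[Interval] = []
    for start, end in sorted(cds + utrs):
        # Adjacent (end + 1 == start) or overlapping segments belong to one exon.
        if exons and start <= exons[-1][1] + 1:
            prev_start, prev_end = exons[-1]
            exons[-1] = (prev_start, max(prev_end, end))
        else:
            exons.append((start, end))
    return exons


# First two CDS segments of XP_08099-RA from the record above.
cds = [(3492823, 3492994), (3493502, 3493629)]
# Assumed 5' UTR: the part of the first exon that precedes the first CDS.
utrs = [(3492427, 3492822)]

print(infer_exons(cds, utrs))
# [(3492427, 3492994), (3493502, 3493629)]
```

The two resulting intervals match the first two exon lines of the record, which is the kind of feature AGAT adds when the input lacks them.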
If you don't want this behavior, you need to modify the global parameter file config.yaml (agat config --expose) and set the check_exons parameter to false.
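If you prefer to script that change, the sketch below flips the parameter in the text of an exposed config file. It assumes the file contains a plain "check_exons: true" style line; the helper name set_check_exons and the sample text are made up for this example, so check them against your own file.

```python
import re


def set_check_exons(config_text: str, enabled: bool) -> str:
    """Return config_text with the check_exons value replaced.

    Assumes a simple 'check_exons: <value>' line; anything after the value
    (for example a trailing comment) is left untouched.
    """
    value = "true" if enabled else "false"
    pattern = re.compile(r"^([ \t]*check_exons[ \t]*:[ \t]*)\S+", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError("check_exons not found in config text")
    return new_text


# Hypothetical excerpt; the real exposed file has more parameters.
sample = "check_cds: true\ncheck_exons: true # infer missing exons\n"
print(set_check_exons(sample, enabled=False))
```

To apply it to a file, read the file's text, pass it through set_check_exons, and write the result back to the same path.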
Hi Juke34,
Thank you for the information. I also see some other changes.
Contig_10 maker gene 122303 128021 . + . ID=IDmodified-gene-20274;Name=XP_4916930;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;;copy_num_ID=IDmodified-gene-20274_0
Contig_10 maker mRNA 122303 128021 . + . ID=IDmodified-mrna-20274;Parent=IDmodified-gene-20274;Name=XP_4916930-RA;
Contig_10 maker exon 122303 122814 . + . ID=IDmodified-exon-49587;Parent=IDmodified-mrna-20274;
Contig_10 maker exon 127931 128021 . + . ID=IDmodified-exon-49588;Parent=IDmodified-mrna-20274;
Contig_10 maker CDS 122303 122814 . + . ID=IDmodified-cds-90256;Parent=IDmodified-mrna-20274;
Contig_10 maker CDS 127931 128021 . + . ID=IDmodified-cds-90257;Parent=IDmodified-mrna-20274;
How do I get rid of these additions?
Thanks
I guess this one cannot be avoided. AGAT needs to follow some specifications at some point, and the minimum for dealing with GFF/GTF is to have proper relationships between the features. If the IDs have been modified, it is probably because they were either missing or wrong in the input (e.g. an ID shared between features that are not a single feature spanning several locations, as a CDS is, or an ID shared by features that are not on the same sequence). To better understand what is going on, you could look at the log or share the input for that record.
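To check an input file for the cases mentioned above before running AGAT, something like the following sketch can help. It is a rough check, not AGAT's own validation: the helper names are made up, it expects tab-separated GFF3, and it assumes that only CDS lines may legitimately share one ID.

```python
from collections import defaultdict
from typing import Dict, List

# Assumption: only these feature types may reuse one ID across several lines.
MULTI_LOCATION_TYPES = {"CDS"}


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_id_problems(gff_text: str) -> List[str]:
    """Report missing IDs and IDs reused in ways that look wrong."""
    problems: List[str] = []
    seen = defaultdict(list)  # ID -> [(line number, seqid, feature type)]
    for line_no, line in enumerate(gff_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {line_no}: expected 9 tab-separated columns")
            continue
        seqid, ftype = cols[0], cols[2]
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is None:
            problems.append(f"line {line_no}: {ftype} has no ID")
            continue
        seen[feature_id].append((line_no, seqid, ftype))
    for feature_id, uses in seen.items():
        if len(uses) < 2:
            continue
        seqids = {seqid for _, seqid, _ in uses}
        types = {ftype for _, _, ftype in uses}
        if len(seqids) > 1:
            problems.append(f"ID {feature_id} is used on several sequences: {sorted(seqids)}")
        elif len(types) > 1 or not types <= MULTI_LOCATION_TYPES:
            lines = [line_no for line_no, _, _ in uses]
            problems.append(f"ID {feature_id} is reused on lines {lines}")
    return problems


# Tiny made-up example: an mRNA without ID and a gene ID reused on two sequences.
example = (
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n"
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tParent=g1\n"
    "chr1\tsrc\tCDS\t1\t50\t.\t+\t0\tID=c1;Parent=m1\n"
    "chr1\tsrc\tCDS\t60\t100\t.\t+\t1\tID=c1;Parent=m1\n"
    "chr2\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n"
)
for problem in find_id_problems(example):
    print(problem)
```

Any record it flags is a likely candidate for the ID rewriting described above; the AGAT log remains the authoritative source for what was actually changed.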
I am trying to generate a GTF file containing nuclear and plastid genomes. The nuclear genomes are available on Phytozome, and agat_convert_sp_gff2gtf.pl runs without a hitch on them. The chloroplast (NC_005087) and mitochondrial (KY126309) genomes are only available through NCBI. When I run agat_convert_sp_gff2gtf.pl on these, new IDs are generated for many of the features in the format "nbis-gene-xx". These genes already have IDs in the GFF file, so I am not sure why this is happening. I also tried converting the files to GTF using gffread. It works, and I tried to convert them back to GFF using AGAT so I could concatenate them with the Phytozome GFF and then run agat_convert_sp_gff2gtf.pl on them all together, but this reintroduces these "nbis" names. Is there a workaround, or a simple parameter that I need to adjust to fix this?