Closed kashiff007 closed 10 months ago
Hi, Thank you for using AGAT and for your feedback.
Some sorting fixes have been added in more recent versions. Could you try the latest version, v1.2? It might fix the problem of different numbers being assigned on every run. If it does not, I will have to investigate the issue more deeply.
The order of feature types can differ between input and output (e.g. start_codon comes earlier in your input), but the information is the same. Without tabix activated, AGAT writes the gene first (level1), then the transcript (level2), and then the sub-features in this order: tss>>exon>>cds>>tts>> any other level3 features in alphabetical order. So start_codon will appear in a different position than in your input file. If the order of the features really matters, we could consider updating AGAT so that the desired order can be passed via the config file.
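To make the level3 ordering described above concrete, here is a minimal Python sketch of a sort key that puts tss, exon, cds and tts first and everything else in alphabetical order. It is only an illustration of the rule as stated, not AGAT's actual (Perl) code; the names `LEVEL3_PRIORITY`, `level3_sort_key` and `order_level3` are hypothetical, and the case-insensitive comparison is an assumption.

```python
# Illustration only: mirrors the ordering rule described in the comment above.
# This is NOT AGAT's implementation; all names here are hypothetical.
LEVEL3_PRIORITY = ["tss", "exon", "cds", "tts"]


def level3_sort_key(feature_type):
    """Priority types first (in the fixed order), all others alphabetically."""
    t = feature_type.lower()  # assumption: feature types are compared case-insensitively
    if t in LEVEL3_PRIORITY:
        return (0, LEVEL3_PRIORITY.index(t), "")
    return (1, 0, t)


def order_level3(feature_types):
    """Return the level3 feature types in the described output order."""
    return sorted(feature_types, key=level3_sort_key)


print(order_level3(["start_codon", "CDS", "exon", "five_prime_UTR", "stop_codon"]))
# ['exon', 'CDS', 'five_prime_UTR', 'start_codon', 'stop_codon']
```

With this rule, start_codon always lands after exon and CDS, whatever its position in the input.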
Keep me informed. Best regards
@Juke34 Unfortunately, it's not fixed in the latest version. I was about to open a new issue but found this one instead. The problem I have is not with ordering of features in the output, it's the number suffix getting added that's random.
I'm using the latest version 1.2 with the singularity image. I had a similar problem way back (https://github.com/NBISweden/AGAT/issues/143) but there's no container for that version where you fixed it, 0.7. So, I tried v0.8 and I had the same problem as I do with v1.2.
I can help you narrow down the issue: I've noticed the issue is between chromosomes. Within a chromosome, the sequential numbering is accurate. Here's a reproducible example. In this example, test.gff3 was output from another AGAT script so all the features are already in the expected order.
> cat test.gff3
##gff-version 3
CP060339.1 Liftoff gene 2655 3026 . + . ID=B9J08_004102;Name=hypothetical protein
CP060339.1 Liftoff mRNA 2655 3026 . + . ID=B9J08_004102T0;Parent=B9J08_004102;Name=hypothetical protein
CP060339.1 Liftoff exon 2655 3026 . + 0 ID=B9J08_004102.exon1;Parent=B9J08_004102T0;Name=g3982.t1:CDS:1
CP060339.1 Liftoff CDS 2655 3026 . + 0 ID=cds.B9J08_004102;Parent=B9J08_004102T0;Name=g3982.t1:CDS:1
CP060339.1 Liftoff gene 5717 6577 . - . ID=B9J08_004101;Name=hypothetical protein
CP060339.1 Liftoff mRNA 5717 6577 . - . ID=B9J08_004101T0;Parent=B9J08_004101;Name=hypothetical protein
CP060339.1 Liftoff exon 5717 6577 . - 0 ID=B9J08_004101.exon1;Parent=B9J08_004101T0;Name=g3981.t1:CDS:1
CP060339.1 Liftoff CDS 5717 6577 . - 0 ID=cds.B9J08_004101;Parent=B9J08_004101T0;Name=g3981.t1:CDS:1
CP060339.1 Liftoff gene 7933 10269 . - . ID=B9J08_004100;Name=hypothetical protein
CP060339.1 Liftoff mRNA 7933 10269 . - . ID=B9J08_004100T0;Parent=B9J08_004100;Name=hypothetical protein
CP060339.1 Liftoff exon 7933 10269 . - 0 ID=B9J08_004100.exon1;Parent=B9J08_004100T0;Name=g3980.t1:CDS:1
CP060339.1 Liftoff CDS 7933 10269 . - 0 ID=cds.B9J08_004100;Parent=B9J08_004100T0;Name=g3980.t1:CDS:1
CP060340.1 Liftoff gene 7166 7537 . + . ID=B9J08_001054;Name=hypothetical protein
CP060340.1 Liftoff mRNA 7166 7537 . + . ID=B9J08_001054T0;Parent=B9J08_001054;Name=hypothetical protein
CP060340.1 Liftoff exon 7166 7537 . + 0 ID=B9J08_001054.exon1;Parent=B9J08_001054T0;Name=g1021.t1:CDS:1
CP060340.1 Liftoff CDS 7166 7537 . + 0 ID=cds.B9J08_001054;Parent=B9J08_001054T0;Name=g1021.t1:CDS:1
CP060345.1 Liftoff gene 763950 764423 . + . ID=B9J08_002579;Name=hypothetical protein
CP060341.1 Liftoff gene 5563 7965 . + . ID=B9J08_001529;Name=hypothetical protein
CP060341.1 Liftoff mRNA 5563 7965 . + . ID=B9J08_001529T0;Parent=B9J08_001529;Name=hypothetical protein
CP060341.1 Liftoff exon 5563 7965 . + 0 ID=B9J08_001529.exon1;Parent=B9J08_001529T0;Name=g1476.t1:CDS:1
CP060341.1 Liftoff CDS 5563 7965 . + 0 ID=cds.B9J08_001529;Parent=B9J08_001529T0;Name=g1476.t1:CDS:1
CP060341.1 Liftoff gene 8798 10381 . + . ID=B9J08_001528;Name=hypothetical protein
CP060341.1 Liftoff mRNA 8798 10381 . + . ID=B9J08_001528T0;Parent=B9J08_001528;Name=hypothetical protein
CP060341.1 Liftoff exon 8798 10381 . + 0 ID=B9J08_001528.exon1;Parent=B9J08_001528T0;Name=g1475.t1:CDS:1
CP060341.1 Liftoff CDS 8798 10381 . + 0 ID=cds.B9J08_001528;Parent=B9J08_001528T0;Name=g1475.t1:CDS:1
CP060345.1 Liftoff gene 770324 772909 . + . ID=B9J08_002582;Name=hypothetical protein
CP060345.1 Liftoff mRNA 770324 772909 . + . ID=B9J08_002582T0;Parent=B9J08_002582;Name=hypothetical protein
CP060345.1 Liftoff exon 770324 772909 . + 0 ID=B9J08_002582.exon1;Parent=B9J08_002582T0;Name=g2509.t1:CDS:1
CP060345.1 Liftoff CDS 770324 772909 . + 0 ID=cds.B9J08_002582;Parent=B9J08_002582T0;Name=g2509.t1:CDS:1
CP060345.1 Liftoff gene 774948 776690 . + . ID=B9J08_002583;Name=hypothetical protein
CP060345.1 Liftoff mRNA 774948 776690 . + . ID=B9J08_002583T0;Parent=B9J08_002583;Name=hypothetical protein
CP060345.1 Liftoff exon 774948 776690 . + 0 ID=B9J08_002583.exon1;Parent=B9J08_002583T0;Name=g2510.t1:CDS:1
CP060345.1 Liftoff CDS 774948 776690 . + 0 ID=cds.B9J08_002583;Parent=B9J08_002583T0;Name=g2510.t1:CDS:1
CP060345.1 Liftoff gene 777929 779770 . + . ID=B9J08_002584;Name=hypothetical protein
CP060345.1 Liftoff mRNA 777929 779770 . + . ID=B9J08_002584T0;Parent=B9J08_002584;Name=hypothetical protein
CP060345.1 Liftoff exon 777929 779770 . + 0 ID=B9J08_002584.exon1;Parent=B9J08_002584T0;Name=g2511.t1:CDS:1
CP060345.1 Liftoff CDS 777929 779770 . + 0 ID=cds.B9J08_002584;Parent=B9J08_002584T0;Name=g2511.t1:CDS:1
> agat_sp_manage_IDs.pl --gff test.gff3 --out test.IDs.gff3 --prefix "foobar_" --tair
> cat test.IDs.gff3
##gff-version 3
CP060339.1 Liftoff gene 2655 3026 . + . ID=foobar_5;Name=hypothetical protein
CP060339.1 Liftoff mRNA 2655 3026 . + . ID=foobar_5.1;Parent=foobar_5;Name=hypothetical protein
CP060339.1 Liftoff exon 2655 3026 . + 0 ID=foobar_5.1-exon1;Parent=foobar_5.1;Name=g3982.t1:CDS:1
CP060339.1 Liftoff CDS 2655 3026 . + 0 ID=foobar_5.1-cds2;Parent=foobar_5.1;Name=g3982.t1:CDS:1
CP060339.1 Liftoff gene 5717 6577 . - . ID=foobar_6;Name=hypothetical protein
CP060339.1 Liftoff mRNA 5717 6577 . - . ID=foobar_6.1;Parent=foobar_6;Name=hypothetical protein
CP060339.1 Liftoff exon 5717 6577 . - 0 ID=foobar_6.1-exon1;Parent=foobar_6.1;Name=g3981.t1:CDS:1
CP060339.1 Liftoff CDS 5717 6577 . - 0 ID=foobar_6.1-cds2;Parent=foobar_6.1;Name=g3981.t1:CDS:1
CP060339.1 Liftoff gene 7933 10269 . - . ID=foobar_7;Name=hypothetical protein
CP060339.1 Liftoff mRNA 7933 10269 . - . ID=foobar_7.1;Parent=foobar_7;Name=hypothetical protein
CP060339.1 Liftoff exon 7933 10269 . - 0 ID=foobar_7.1-exon1;Parent=foobar_7.1;Name=g3980.t1:CDS:1
CP060339.1 Liftoff CDS 7933 10269 . - 0 ID=foobar_7.1-cds2;Parent=foobar_7.1;Name=g3980.t1:CDS:1
CP060340.1 Liftoff gene 7166 7537 . + . ID=foobar_1;Name=hypothetical protein
CP060340.1 Liftoff mRNA 7166 7537 . + . ID=foobar_1.1;Parent=foobar_1;Name=hypothetical protein
CP060340.1 Liftoff exon 7166 7537 . + 0 ID=foobar_1.1-exon1;Parent=foobar_1.1;Name=g1021.t1:CDS:1
CP060340.1 Liftoff CDS 7166 7537 . + 0 ID=foobar_1.1-cds2;Parent=foobar_1.1;Name=g1021.t1:CDS:1
CP060341.1 Liftoff gene 5563 7965 . + . ID=foobar_8;Name=hypothetical protein
CP060341.1 Liftoff mRNA 5563 7965 . + . ID=foobar_8.1;Parent=foobar_8;Name=hypothetical protein
CP060341.1 Liftoff exon 5563 7965 . + 0 ID=foobar_8.1-exon1;Parent=foobar_8.1;Name=g1476.t1:CDS:1
CP060341.1 Liftoff CDS 5563 7965 . + 0 ID=foobar_8.1-cds2;Parent=foobar_8.1;Name=g1476.t1:CDS:1
CP060341.1 Liftoff gene 8798 10381 . + . ID=foobar_9;Name=hypothetical protein
CP060341.1 Liftoff mRNA 8798 10381 . + . ID=foobar_9.1;Parent=foobar_9;Name=hypothetical protein
CP060341.1 Liftoff exon 8798 10381 . + 0 ID=foobar_9.1-exon1;Parent=foobar_9.1;Name=g1475.t1:CDS:1
CP060341.1 Liftoff CDS 8798 10381 . + 0 ID=foobar_9.1-cds2;Parent=foobar_9.1;Name=g1475.t1:CDS:1
CP060345.1 Liftoff gene 770324 772909 . + . ID=foobar_2;Name=hypothetical protein
CP060345.1 Liftoff mRNA 770324 772909 . + . ID=foobar_2.1;Parent=foobar_2;Name=hypothetical protein
CP060345.1 Liftoff exon 770324 772909 . + 0 ID=foobar_2.1-exon1;Parent=foobar_2.1;Name=g2509.t1:CDS:1
CP060345.1 Liftoff CDS 770324 772909 . + 0 ID=foobar_2.1-cds2;Parent=foobar_2.1;Name=g2509.t1:CDS:1
CP060345.1 Liftoff gene 774948 776690 . + . ID=foobar_3;Name=hypothetical protein
CP060345.1 Liftoff mRNA 774948 776690 . + . ID=foobar_3.1;Parent=foobar_3;Name=hypothetical protein
CP060345.1 Liftoff exon 774948 776690 . + 0 ID=foobar_3.1-exon1;Parent=foobar_3.1;Name=g2510.t1:CDS:1
CP060345.1 Liftoff CDS 774948 776690 . + 0 ID=foobar_3.1-cds2;Parent=foobar_3.1;Name=g2510.t1:CDS:1
CP060345.1 Liftoff gene 777929 779770 . + . ID=foobar_4;Name=hypothetical protein
CP060345.1 Liftoff mRNA 777929 779770 . + . ID=foobar_4.1;Parent=foobar_4;Name=hypothetical protein
CP060345.1 Liftoff exon 777929 779770 . + 0 ID=foobar_4.1-exon1;Parent=foobar_4.1;Name=g2511.t1:CDS:1
CP060345.1 Liftoff CDS 777929 779770 . + 0 ID=foobar_4.1-cds2;Parent=foobar_4.1;Name=g2511.t1:CDS:1
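For anyone wanting to check their own output for the same symptom, here is a small Python sketch that lists the places where the gene number does not increase from one gene line to the next. It is a hypothetical helper, not part of AGAT. It assumes whitespace-separated columns (as in the listing above; a real GFF3 is tab-delimited, which this also handles as long as no column before the attributes contains spaces) and gene IDs of the form prefix plus integer.

```python
import re


def gene_numbers(gff_lines, prefix):
    """Return (seqid, number) for every gene line, in file order."""
    pattern = re.compile(r"ID=" + re.escape(prefix) + r"(\d+)(?:;|$)")
    found = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split(None, 8)  # 9th column keeps the attributes intact
        if len(cols) < 9 or cols[2] != "gene":
            continue
        match = pattern.search(cols[8])
        if match:
            found.append((cols[0], int(match.group(1))))
    return found


def out_of_order(numbers):
    """Return consecutive pairs in which the gene number does not increase."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b[1] <= a[1]]


# A few lines copied from test.IDs.gff3 above.
sample = [
    "##gff-version 3",
    "CP060339.1 Liftoff gene 7933 10269 . - . ID=foobar_7;Name=hypothetical protein",
    "CP060339.1 Liftoff mRNA 7933 10269 . - . ID=foobar_7.1;Parent=foobar_7;Name=hypothetical protein",
    "CP060340.1 Liftoff gene 7166 7537 . + . ID=foobar_1;Name=hypothetical protein",
    "CP060341.1 Liftoff gene 5563 7965 . + . ID=foobar_8;Name=hypothetical protein",
    "CP060345.1 Liftoff gene 770324 772909 . + . ID=foobar_2;Name=hypothetical protein",
]

for before, after in out_of_order(gene_numbers(sample, "foobar_")):
    print(f"{before[0]} gene {before[1]} is followed by {after[0]} gene {after[1]}")
```

On these lines, the breaks are reported only at chromosome boundaries (7 followed by 1, and 8 followed by 2), which matches the observation that numbering is sequential within a chromosome.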
Right, I guess there is a de-sync between the way AGAT parses the dictionary to set new IDs and the way it parses the dictionary to print the output. I will push a fix. Thank you for the feedback.
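The kind of de-sync suspected above can be sketched in a few lines of Python. This is only a toy model of the idea (IDs assigned while walking the sequences in one order, output printed in another), not AGAT's code. The function names and the `assign_order` list are hypothetical; `assign_order` was chosen by hand so that the result matches the numbering in the report.

```python
def assign_ids(genes_by_seqid, seqid_order, prefix):
    """Number genes consecutively while walking sequences in seqid_order."""
    new_ids = {}
    counter = 1
    for seqid in seqid_order:
        for gene in genes_by_seqid[seqid]:
            new_ids[gene] = f"{prefix}{counter}"
            counter += 1
    return new_ids


def render(genes_by_seqid, seqid_order, new_ids):
    """List (seqid, new ID) pairs in the order the output would be printed."""
    return [(seqid, new_ids[gene])
            for seqid in seqid_order
            for gene in genes_by_seqid[seqid]]


# Genes from the example above, grouped by sequence.
genes = {
    "CP060339.1": ["B9J08_004102", "B9J08_004101", "B9J08_004100"],
    "CP060340.1": ["B9J08_001054"],
    "CP060341.1": ["B9J08_001529", "B9J08_001528"],
    "CP060345.1": ["B9J08_002582", "B9J08_002583", "B9J08_002584"],
}

print_order = sorted(genes)
# Hypothetical traversal order used when assigning IDs (hand-picked for illustration).
assign_order = ["CP060340.1", "CP060345.1", "CP060339.1", "CP060341.1"]

desynced = render(genes, print_order, assign_ids(genes, assign_order, "foobar_"))
synced = render(genes, print_order, assign_ids(genes, print_order, "foobar_"))

print([new_id for _, new_id in desynced])  # numbering jumps between sequences
print([new_id for _, new_id in synced])    # numbering follows the printed order
```

Using the same sorted order for both steps is one way to keep the numbering in line with the output; whether the actual fix works like this is not stated in the thread.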
My gff file looks like this:
I want to add a prefix before the gene names and keep the last part as in the original, using agat_sp_manage_IDs.pl (version v0.8.0). I used the following command:
agat_sp_manage_IDs.pl -f A.gff --prefix Dis.W6-48549-006.v1.___ --tair --type_dependent -o A_new_rename.gff --ensembl
There are two major problems when running this:
I also tried without the --tair option; it produces the same error.
The output looks like:
Could you suggest the possible reason for this?
My expected output should look like: