NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Problem with GFF to GTF conversion #417

Closed PavelKiryanov closed 4 months ago

PavelKiryanov commented 5 months ago

Hi, I need to convert a GFF file to GTF. The file comes from here: https://genomevolution.org/CoGe/GenomeInfo.pl?gid=35080
After running
agat_convert_sp_gff2gtf.pl --gff myfile.gff -o outmyfile.gtf
I get a lot of errors like the following:

------ Start parsing ------                           
-------------------------- parse options and metadata --------------------------
=> Accessing the feature_levels YAML file
Using standard /home/pavel/miniconda3/envs/practice/lib/perl5/site_perl/auto/share/dist/AGAT/feature_levels.yaml file
=> Attribute used to group features when no Parent/ID relationship exists (i.e common tag):
    * locus_tag
    * gene_id
=> merge_loci option deactivated
=> Machine information:
    This script is being run by perl v5.32.1
    Bioperl location being used: /home/pavel/miniconda3/envs/practice/lib/perl5/site_perl/Bio/
    Operating system being used: linux 
=> Accessing Ontology
    No ontology accessible from the gff file header!
    We use the SOFA ontology distributed with AGAT:
        /home/pavel/miniconda3/envs/practice/lib/perl5/site_perl/auto/share/dist/AGAT/so.obo
    Read ontology /home/pavel/miniconda3/envs/practice/lib/perl5/site_perl/auto/share/dist/AGAT/so.obo:
        4 root terms, and 2596 total terms, and 1516 leaf terms
    Filtering ontology:
        We found 1861 terms that are sequence_feature or is_a child of it.
--------------------------------- parsing file ---------------------------------
=> Number of line in file: 391032
=> Number of comment lines: 8929
=> Fasta included: No
=> Number of features lines: 382103
=> Number of feature type (3rd column): 11
    * Level1: 2 => gene chromosome
    * level2: 1 => mRNA
    * level3: 5 => non_canonical_five_prime_splice_site non_canonical_three_prime_splice_site CDS stop_codon_read_through exon
    * unknown: 3 => substitution deletion insertion
=> Version of the Bioperl GFF parser selected by AGAT: 3
gff3 reader warning: primary_tag error @ deletion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  2832    2933    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 2832    2933    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  2957    3080    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS2"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 2957    3080    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS2"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  3371    3828    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS3"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 3371    3828    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS3"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  7022    7151    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS4"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 7022    7151    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS4"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  7225    7424    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS5"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 7225    7424    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS5"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  7673    7887    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS6"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 7673    7887    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS6"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  7990    8212    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS7"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 7990    8212    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS7"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  8413    8563    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS8"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 8413    8563    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS8"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1053 CoGe   CDS  8663    8971    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS9"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1053  CoGe    CDS 8663    8971    .   +   .   Alias "Bpev01.c1053.g0001.m0001"  ; CDS "Bpev01.c1053.g0001.m0001"  ; ID "Bpev01.c1053.g0001.m0001.CDS9"  ; Name "Bpev01.c1053.g0001.m0001"  ; coge_fid 1260978662
WARNING level3: No Parent attribute found @ for the feature: Contig1072 CoGe   CDS  24521   24861   .   +   .   Alias "Bpev01.c1072.g0004.m0002"  ; CDS "Bpev01.c1072.g0004.m0002"  ; ID "Bpev01.c1072.g0004.m0002"  ; Name "Bpev01.c1072.g0004.m0002"  ; coge_fid 1260979295
WARNING level3: No Parent attribute found  ************** Too much WARNING message we skip the next **************
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
Contig1072  CoGe    CDS 24521   24861   .   +   .   Alias "Bpev01.c1072.g0004.m0002"  ; CDS "Bpev01.c1072.g0004.m0002"  ; ID "Bpev01.c1072.g0004.m0002"  ; Name "Bpev01.c1072.g0004.m0002"  ; coge_fid 1260979295
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus.  ************** Too much WARNING message we skip the next **************
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-7 level2 feature is on Contig1079 sequence while Ankyrin_repeat_family_protein_LENGTH_598.gene1 level1 feature is on Contig1050
gff3 reader warning: primary_tag error @ deletion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
WARNING l2 and l1 features not on same seq_id @ ASCORBATE_Peroxidase.mRNA2 level2 feature is on Contig1089 sequence while ASCORBATE_Peroxidase level1 feature is on Contig1036
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-13 level2 feature is on Contig1090 sequence while Ankyrin_repeat_family_protein_LENGTH_598.gene1 level1 feature is on Contig1050
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-15 level2 feature is on Contig11 sequence while Ankyrin_repeat_family_protein_LENGTH_282.gene2 level1 feature is on Contig1007
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-19 level2 feature is on Contig1105 sequence while Ankyrin_repeat_family_protein_LENGTH_598.gene1 level1 feature is on Contig1050
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-24 level2 feature is on Contig1119 sequence while Ankyrin_repeat_family_protein_LENGTH_598.gene1 level1 feature is on Contig1050
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-25 level2 feature is on Contig112 sequence while ABC_transporter_family_C_member.gene1 level1 feature is on Contig1043
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-26 level2 feature is on Contig112 sequence while ABC_transporter_family_C_member.gene2 level1 feature is on Contig1043
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-27 level2 feature is on Contig1120 sequence while ATP-dependent_zinc_metalloprotease_FtsH.gene1 level1 feature is on Contig106
gff3 reader warning: primary_tag error @ deletion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
WARNING l2 and l1 features not on same seq_id @ nbis-mrna-28 level2 feature is on Contig1148 sequence while ABC_transporter_family_G_member.gene1 level1 feature is on Contig1047
WARNING l2 and l1 features not on same seq_id  ************** Too much WARNING message we skip the next **************
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig1238  CoGe    gene    136421  140294  .   -   .   Alias "3-galactosyltransferase_15"  "Beta-1"  "Bpev01.c1238.g0015"  ; ID "nbis-gene-28"  ; Name "3-galactosyltransferase_15"  ; coge_fid 1260985376 ; encoded_feature mRNA ; gene "3-galactosyltransferase_15" 
original id: 3-galactosyltransferase_15.gene1
gff3 reader warning: primary_tag error @ substitution still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 882980  883372  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA2"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 882980  883507  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA3"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 883798  883860  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA4"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 884195  884311  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA5"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 884606  885512  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA6"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 885828  886097  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA7"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 886460  886529  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA8"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 886939  887149  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA9"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 887348  887467  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA10"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found @ for the feature: Contig134  CoGe   mRNA 888456  888779  .   -   .   Alias 1 "4-alpha-glucan_branching_enzyme_GlgB"  "Bpev01.c0134.g0101"  "Bpev01.c0134.g0101.m0002"  ; ID "1.mRNA11"  ; Name 1 ; coge_fid 1260988777 ; mRNA 1
WARNING level2: No Parent attribute found  ************** Too much WARNING message we skip the next **************
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig157   CoGe    gene    219953  227619  .   +   .   Alias 4 "5-trisphosphate_3-phosphatase_and_dual-specificity_protein_phosphatase_PTEN"  "Bpev01.c0157.g0012"  Phosphatidylinositol_3 ; ID "nbis-gene-86"  ; Name 4 ; coge_fid 1260996251 ; encoded_feature mRNA ; gene 4
original id: 4.gene1
gff3 reader warning: primary_tag error @ insertion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
gff3 reader warning: primary_tag error @ deletion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
gff3 reader warning: primary_tag error @ deletion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
gff3 reader warning: primary_tag error @ deletion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
gff3 reader warning: primary_tag error @ deletion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
gff3 reader warning: primary_tag error @ deletion still not taken into account! Please modify the feature_levels YAML file to define the feature in one of the levels.
gff3 reader warning: primary_tag error  ************** Too much WARNING message we skip the next **************
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig1769  CoGe    gene    60375   64103   .   -   .   Alias "1-aminocyclopropane-1-carboxylate_synthase"  "Bpev01.c1769.g0006"  ; ID "nbis-gene-120"  ; Name "1-aminocyclopropane-1-carboxylate_synthase"  ; coge_fid 1261002020 ; encoded_feature mRNA ; gene "1-aminocyclopropane-1-carboxylate_synthase" 
original id: 1-aminocyclopropane-1-carboxylate_synthase.gene1
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig18    CoGe    gene    1000925 1003071 .   -   .   Alias "1-aminocyclopropane-1-carboxylate_synthase"  "Bpev01.c0018.g0086"  ; ID "nbis-gene-123"  ; Name "1-aminocyclopropane-1-carboxylate_synthase"  ; coge_fid 1261002890 ; encoded_feature mRNA ; gene "1-aminocyclopropane-1-carboxylate_synthase" 
original id: 1-aminocyclopropane-1-carboxylate_synthase.gene1
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig195   CoGe    gene    208458  211778  .   +   .   Alias "6-phosphogluconate_dehydrogenase"  "Bpev01.c0195.g0017"  _decarboxylating_3 ; ID "nbis-gene-150"  ; Name "6-phosphogluconate_dehydrogenase"  ; coge_fid 1261005929 ; encoded_feature mRNA ; gene "6-phosphogluconate_dehydrogenase" 
original id: 6-phosphogluconate_dehydrogenase.gene1
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig2405  CoGe    gene    29929   32455   .   +   .   Alias "3-beta-glucosidase"  "Bpev01.c2405.g0004"  "Glucan_endo-1"  ; ID "nbis-gene-208"  ; Name "3-beta-glucosidase"  ; coge_fid 1261013360 ; encoded_feature mRNA ; gene "3-beta-glucosidase" 
original id: 3-beta-glucosidase.gene1
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig2405  CoGe    gene    33042   33392   .   +   .   Alias "3-beta-glucosidase"  "Bpev01.c2405.g0005"  "Glucan_endo-1"  ; ID "nbis-gene-209"  ; Name "3-beta-glucosidase"  ; coge_fid 1261013363 ; encoded_feature mRNA ; gene "3-beta-glucosidase" 
original id: 3-beta-glucosidase.gene2
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig2405  CoGe    gene    37464   37687   .   -   .   Alias "3-beta-glucosidase"  "Bpev01.c2405.g0007"  "Glucan_endo-1"  ; ID "nbis-gene-210"  ; Name "3-beta-glucosidase"  ; coge_fid 1261013369 ; encoded_feature mRNA ; gene "3-beta-glucosidase" 
original id: 3-beta-glucosidase.gene3
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig241   CoGe    gene    84118   90664   .   -   .   Alias "3'(2')"  "5'-bisphosphate_nucleotidase"  "Bpev01.c0241.g0008"  ; ID "nbis-gene-211"  ; Name "3'(2')"  ; coge_fid 1261013412 ; encoded_feature mRNA ; gene "3'(2')" 
original id: 3'(2').gene1
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
 @ the feature is:
Contig261   CoGe    gene    396461  401423  .   -   .   Alias "8-amino-7-oxononanoate_synthase"  "Bpev01.c0261.g0045"  ; ID "nbis-gene-231"  ; Name "8-amino-7-oxononanoate_synthase"  ; coge_fid 1261015798 ; encoded_feature mRNA ; gene "8-amino-7-oxononanoate_synthase" 
original id: 8-amino-7-oxononanoate_synthase.gene1
WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.
  ************** Too much WARNING message we skip the next **************
376 warning messages: WARNING level3: No Parent attribute found 
859 warning messages: WARNING l2 and l1 features not on same seq_id 
49 warning messages: gff3 reader warning: primary_tag error 
61 warning messages: WARNING level1: This feature level1 is not a duplicate but has an ID already used.
/!\ AGAT might mix up the child features and create chimeric records.
Indeed we changed the ID for this L1 feature to be unique but we do not 
change the Parent attribute of the child features to reflect this change.
Why? because we do not know to which L1 the child feature was part-of because several Parent have similar ID.

387 warning messages: WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. 
11 warning messages: WARNING level2: No Parent attribute found 
                  ------ End parsing (done in 50 second) ------                 

                           ------ Start checks ------                           
---------------------------- Check1: feature types -----------------------------
----------------------------------- ontology -----------------------------------
All feature types in agreement with the Ontology.
------------------------------------- agat -------------------------------------
WARNING - Feature types not expected by AGAT:
* deletion
* insertion
* substitution
The feature of these types (3rd column in GFF3) are skipped by the parser!
To take them into account you must update the feature_levels YAML file.
To access this file run:
            agat levels --expose
In which file to add my feature?
* Feature level1 (e.g. gene, match, region):
  My feature has no parent
  => level1 section.
* Feature level2 (e.g. mrna, match_part, trna):
  My feature has one parent and children
  => level2 section.
* Feature level3 (e.g. exon, intron, cds):
  My feature has one parent (the parent has also a parent) and no children
  => level3 section.
* Feature level3 discontinuous (e.g. cds, utr):
  A single feature that exists over multiple genomic locations
  => spread section.
------------------------------ done in 0 seconds -------------------------------

------------------------------ Check2: duplicates ------------------------------
None found
------------------------------ done in 0 seconds -------------------------------

-------------------------- Check3: sequential bucket ---------------------------
We found 376 level3 sequential cases.
------------------------------ done in 0 seconds -------------------------------

--------------------------- Check4: l2 linked to l3 ----------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------

--------------------------- Check5: l1 linked to l2 ----------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------

--------------------------- Check6: remove orphan l1 ---------------------------
We remove only those not supposed to be orphan
801 cases removed where L1 features do not have children (while they are suposed to have children).
------------------------------ done in 1 seconds -------------------------------

------------------------- Check7: all level3 locations -------------------------
------------------------------ done in 9 seconds -------------------------------

------------------------------ Check8: check cds -------------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------

----------------------------- Check9: check exons ------------------------------
265 exons created that were missing
28 exons locations modified that were wrong
No supernumerary exons removed
653 level2 locations modified
------------------------------ done in 6 seconds -------------------------------

----------------------------- Check10: check utrs ------------------------------
29499 UTRs created that were missing
No UTRs locations modified
No supernumerary UTRs removed
------------------------------ done in 6 seconds -------------------------------

------------------------ Check11: all level2 locations -------------------------
We fixed 2 wrong level2 location cases
------------------------------ done in 5 seconds -------------------------------

------------------------ Check12: all level1 locations -------------------------
We fixed 601 wrong level1 location cases
------------------------------ done in 1 seconds -------------------------------

---------------------- Check13: remove identical isoforms ----------------------
Lets remove isoform Bpev01.c0027.g0042.mRNA2
Lets remove isoform BROTHER_OF_FT_AND_TFL1.mRNA2
Lets remove isoform Bpev01.c0170.g0057.mRNA2
Lets remove isoform Bpev01.c0887.g0017.mRNA2
Lets remove isoform Bpev01.c0866.g0021.mRNA2
Lets remove isoform Bpev01.c0191.g0028.mRNA2
Lets remove isoform Bpev01.c1238.g0013.mRNA3
Lets remove isoform Bpev01.c1238.g0013.mRNA2
8 identical isoforms removed
------------------------------ done in 0 seconds -------------------------------
                  ------ End checks (done in 28 second) ------  

Can you tell why this is happening? Perhaps I'm not creating the GFF file correctly in this menu?

[Screenshot from 2024-01-26 01-14-08]

Is it possible to somehow correct this series of errors? Thank you for your work! I really hope you can help!

Juke34 commented 5 months ago

Right, your file is poorly formatted: you have genes that share the same ID on different chromosomes, which is not good (e.g. 1-aminocyclopropane-1-carboxylate_synthase.gene1). I will come back to you with a suggested solution.
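To see how widespread the problem is, one could list every gene ID that turns up on more than one sequence. The snippet below is only a rough sketch: it assumes tab-separated columns and GFF3-style attributes (ID=... in column 9), and the function name find_shared_gene_ids is made up for illustration.

```shell
# Sketch: report gene IDs that are used on more than one sequence (column 1).
# Assumption: GFF3-style attributes, i.e. "ID=value" entries separated by ";".
find_shared_gene_ids() {
    awk -F'\t' '
        $0 !~ /^#/ && $3 == "gene" {
            id = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+/, "", attrs[i])          # drop leading blanks
                if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
            }
            if (id == "") next                          # gene without an ID
            key = id SUBSEP $1
            if (!(key in seen)) {                       # count each (ID, sequence) pair once
                seen[key] = 1
                count[id]++
                seqs[id] = (seqs[id] == "" ? $1 : seqs[id] "," $1)
            }
        }
        END {
            for (id in count)
                if (count[id] > 1) print id "\t" seqs[id]
        }
    ' "$1"
}
```

Calling find_shared_gene_ids myfile.gff would print one line per reused ID, followed by the sequences it appears on. An empty result would suggest the IDs are already unique per sequence.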

Juke34 commented 5 months ago

Split your file by sequence ID, e.g.
awk '{ file=$1; if( $0 !~ /^#/)print $0 > file}' infile.gff
Then process each file with agat_sp_manage_IDs.pl to provide new IDs (using e.g. the sequence ID as prefix). Finally, merge all the files together with a simple cat command.
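For an assembly with many contigs, a variant of the split step that writes into its own directory and keeps only one output file open at a time may be easier to handle. This is a sketch, not part of AGAT: the function name split_by_seqid, the output directory argument and the .gff suffix are all illustrative choices.

```shell
# Sketch: write one file per sequence ID into a separate directory.
# The output directory should be empty beforehand, because lines are appended.
split_by_seqid() {
    infile=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -F'\t' -v dir="$outdir" '
        /^#/ || NF == 0 { next }            # skip comment and blank lines
        {
            file = dir "/" $1 ".gff"
            if (file != prev) {             # sequence changed: close the previous file
                if (prev != "") close(prev)
                prev = file
            }
            print >> file                   # append, so revisited sequences are not truncated
        }
    ' "$infile"
}
```

For example, split_by_seqid infile.gff split would create split/Contig1053.gff, split/Contig1072.gff and so on, one per value in the first column.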

PavelKiryanov commented 4 months ago

"Split your file by sequence ID (e.g. awk '{ file=$1; if( $0 !~ /^#/)print $0 > file}' infile.gff). Then process each file with agat_sp_manage_IDs.pl to provide new IDs (using e.g. the sequence ID as prefix). Then you can merge all the files together with a simple cat command."

I've split all the files, as you said. Now I need to run agat_sp_manage_IDs.pl. I launched it like this: agat_sp_manage_IDs.pl --gff Bpe_Chr1 --prefix [ -o Bpe_Chr1out.gf ]. Is that right? I ended up with the same errors. Sorry, I'm new to bioinformatics...

Juke34 commented 4 months ago

Yes, run
agat_sp_manage_IDs.pl --gff Bpe_Chr1 --prefix chr1 -o Bpe_Chr1out.gff
for chr1,
agat_sp_manage_IDs.pl --gff Bpe_Chr2 --prefix chr2 -o Bpe_Chr2out.gff
for chr2, etc.

Then you concatenate all the out files together and it should be fine. To check you can run agat_sq_stat_basic.pl on your original file and on your resulting file. You should get similar values.
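With many sequences, typing one command per file gets tedious, so the rename and concatenate steps could be wrapped in a loop. The sketch below assumes the per-sequence files sit in one directory with a .gff extension and uses each file's base name as the prefix. It only uses the agat_sp_manage_IDs.pl options quoted in this thread (--gff, --prefix, -o). The function name rename_and_merge and the MANAGE_IDS override variable are hypothetical helpers, not part of AGAT.

```shell
# Sketch: give every per-sequence file new IDs, then merge the results.
# MANAGE_IDS is a hypothetical hook for substituting another command;
# by default the AGAT script named in this thread is called.
rename_and_merge() {
    splitdir=$1
    merged=$2                               # keep this file outside splitdir
    tool=${MANAGE_IDS:-agat_sp_manage_IDs.pl}
    : > "$merged"                           # start from an empty merged file
    for f in "$splitdir"/*.gff; do
        [ -e "$f" ] || continue             # no matching files at all
        seqid=$(basename "$f" .gff)
        out="$splitdir/$seqid.renamed"
        "$tool" --gff "$f" --prefix "$seqid" -o "$out" || return 1
        cat "$out" >> "$merged"
    done
}
```

For example, rename_and_merge split merged.gff would process every file in split/ and write the concatenation to merged.gff, which can then be compared with the original file using agat_sq_stat_basic.pl as suggested above.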

PavelKiryanov commented 4 months ago

After all this, I saw this. Could these errors be due to the fact that I have 14 chromosomes and when moving from number 1 to number 11 they repeated? Is it normal for these files to appear?

Parent "nbis-gene-5223" ; coge_fid 1235352966 ; exon "Bpev01.c0717.g0019.exon1"
2562 cases fixed where L3 features have parent feature(s) missing
------------------------------ done in 0 seconds -------------------------------

--------------------------- Check5: l1 linked to l2 ----------------------------
2700 cases fixed where L2 features have parent features missing
------------------------------ done in 1 seconds -------------------------------

--------------------------- Check6: remove orphan l1 ---------------------------
We remove only those not supposed to be orphan
2661 cases removed where L1 features do not have children (while they are suposed to have children).
------------------------------ done in 0 seconds -------------------------------

------------------------- Check7: all level3 locations -------------------------
------------------------------ done in 7 seconds -------------------------------

------------------------------ Check8: check cds -------------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------

----------------------------- Check9: check exons ------------------------------
No exons created
No exons locations modified
No supernumerary exons removed
183 level2 locations modified
------------------------------ done in 4 seconds -------------------------------

----------------------------- Check10: check utrs ------------------------------
49 UTRs created that were missing
No UTRs locations modified
No supernumerary UTRs removed
------------------------------ done in 3 seconds -------------------------------

------------------------ Check11: all level2 locations -------------------------
No problem found
------------------------------ done in 4 seconds -------------------------------

------------------------ Check12: all level1 locations -------------------------
We fixed 387 wrong level1 location cases
------------------------------ done in 1 seconds -------------------------------

---------------------- Check13: remove identical isoforms ----------------------
Lets remove isoform nbis-mrna-1490
Lets remove isoform nbis-mrna-17
Lets remove isoform nbis-mrna-1860
Lets remove isoform nbis-mrna-2169
Lets remove isoform nbis-mrna-2517
Lets remove isoform nbis-mrna-1009
Lets remove isoform nbis-mrna-864
Lets remove isoform nbis-mrna-1048
Lets remove isoform nbis-mrna-2015
Lets remove isoform nbis-mrna-2075
Lets remove isoform nbis-mrna-1428
Lets remove isoform nbis-mrna-255
Lets remove isoform nbis-mrna-525
Lets remove isoform nbis-mrna-1588
Lets remove isoform nbis-mrna-960
Lets remove isoform nbis-mrna-1342
Lets remove isoform nbis-mrna-1547
Lets remove isoform nbis-mrna-1326
Lets remove isoform nbis-mrna-872
Lets remove isoform nbis-mrna-2553
Lets remove isoform nbis-mrna-2189
Lets remove isoform nbis-mrna-1864
Lets remove isoform nbis-mrna-1070
Lets remove isoform nbis-mrna-2419
Lets remove isoform nbis-mrna-884
Lets remove isoform nbis-mrna-1050
Lets remove isoform nbis-mrna-1046
Lets remove isoform nbis-mrna-1747
Lets remove isoform nbis-mrna-2390
Lets remove isoform nbis-mrna-165
Lets remove isoform nbis-mrna-1753
Lets remove isoform nbis-mrna-2492
Lets remove isoform nbis-mrna-30
33 identical isoforms removed
------------------------------ done in 0 seconds -------------------------------
------ End checks (done in 20 second) ----

PavelKiryanov commented 4 months ago

my original file information

Type (3rd column)   Number   Size total (kb)   Size mean (bp)   /!\Results are rounding to two decimal places
cds                 129416   30405.40          234.94
exon                136591   37081.36          271.48
five_prime_utr      14165    2100.04           148.26
gene                24698    303892.67         12304.34
mrna                25768    152394.22         5914.09
three_prime_utr     13743    3908.60           284.41
Total               344381   529782.29         1538.36

and my resulting file

Type (3rd column)   Number   Size total (kb)   Size mean (bp)   /!\Results are rounding to two decimal places
cds                 129775   30472.42          234.81
exon                154059   40794.15          264.80
gene                24625    100509.67         4081.61
mrna                25775    125992.66         4888.17
Total               334234   297768.91         890.90

Apparently the numbers have changed. Is this bad?

Juke34 commented 4 months ago

I guess you mixed up the original file and the resulting file, because there were no UTRs in your original file, and AGAT added them. Anyway, I would say the result looks good now, except that you lost a few specific features, e.g. non_canonical_five_prime_splice_site, stop_codon_read_through, etc. I do not know if you really need them, but if you want to keep them you should follow these instructions: https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-the-feature-type-is-not-yet-taken-into-account

You may also write to CoGe to tell them it is not normal that unique identifiers are not unique and are used multiple times (for each chromosome the IDs are reset and re-used).
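To see the re-used identifiers for yourself, something like the sketch below could help. It assumes GFF3-style attributes in the 9th column (ID=...;Parent=...), which may not match the CoGe file exactly, and ids_reused_across_seqs is an illustrative name. It counts each ID once per sequence, so an ID shared by several CDS lines of the same transcript on one chromosome is not flagged.

```shell
# List ID values that occur on more than one sequence (column 1).
# Output: "ID<TAB>number_of_sequences", sorted by ID.
# Hypothetical helper; assumes GFF3 "ID=" attributes in column 9.
ids_reused_across_seqs() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }
        {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/) {
                    id = substr(attrs[i], 4)
                    # Count each (ID, sequence) pair only once.
                    if (!((id, $1) in seen)) { seen[id, $1] = 1; nseq[id]++ }
                }
            }
        }
        END { for (id in nseq) if (nseq[id] > 1) printf "%s\t%d\n", id, nseq[id] }
    ' "$1" | sort
}
```

An empty result on the merged file would suggest the prefixes did their job. Many hits on the original file would be consistent with the IDs being reset on every chromosome.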

PavelKiryanov commented 4 months ago

I solved my problem with this file. Thank you very much! And thank you for teaching me bioinformatics!