NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

agat_sp_manage_functional_annotation.pl empty output #390

Closed: isnmn closed this issue 8 months ago

isnmn commented 11 months ago
**$ cat report.txt** 
_usage: /export/home4/2762262d/perl5/bin/agat_sp_manage_functional_annotation.pl -f RENAMED_NoSeq.gff -b MM10db_Prot_noLabel.blastp --output P_AGAT --db ./BLAST_DB/mm10_uniprot_canonical.fasta -i RENAMED_Proteins.fasta.tsv_
**$ cat RENAMED_NoSeq.agat.log**
                            08/07/2023 at 12h00m29s                             

 ------------------------------------------------------------------------------
|   Another GFF Analysis Toolkit (AGAT) - Version: v1.2.0                      |
|   https://github.com/NBISweden/AGAT                                          |
|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
 ------------------------------------------------------------------------------

                          ------ Start parsing ------                           
-------------------------- parse options and metadata --------------------------
=> Accessing the feature_levels YAML file
Using standard /export/home4/2762262d/perl5/lib/perl5/auto/share/dist/AGAT/feature_levels.yaml file
=> Attribute used to group features when no Parent/ID relationship exists (i.e common tag):
    * locus_tag
    * gene_id
=> merge_loci option deactivated
=> Machine information:
    This script is being run by perl v5.32.1
    Bioperl location being used: /export/home4/2762262d/perl5/lib/perl5/Bio/
    Operating system being used: linux 
=> Accessing Ontology
    No ontology accessible from the gff file header!
    We use the SOFA ontology distributed with AGAT:
        /export/home4/2762262d/perl5/lib/perl5/auto/share/dist/AGAT/so.obo
    Read ontology /export/home4/2762262d/perl5/lib/perl5/auto/share/dist/AGAT/so.obo:
        4 root terms, and 2596 total terms, and 1516 leaf terms
    Filtering ontology:
The feature type (3rd column) is constrained to be either a term from the Sequence Ontology or an SO accession number. The latter alternative is distinguished using the syntax SO:000000. In either case, it must be sequence_feature (SO:0000110) or an is_a child of it.
We filter the ontology to apply this rule.      We found 1861 terms that are sequence_feature or is_a child of it.
--------------------------------- parsing file ---------------------------------
=> Number of line in file: 17107786
=> Number of comment lines: 10544
=> Fasta included: No
=> Number of features lines: 17097242
=> Number of feature lines with 8 fields (while 9 expected): 1
=> Number of feature lines with 1 fields (while 9 expected): 2
=> Number of feature lines with 4 fields (while 9 expected): 1
=> Number of feature lines with 17 fields (while 9 expected): 3
=> Number of feature lines with 2 fields (while 9 expected): 1
=> Number of feature lines with 10 fields (while 9 expected): 1
=> Number of feature lines with 7 fields (while 9 expected): 1
=> Number of feature lines with 16 fields (while 9 expected): 1
=> Number of feature lines with 14 fields (while 9 expected): 1
=> Number of feature lines with 3 fields (while 9 expected): 1
=> Number of feature type (3rd column): 16
    * Level1: 5 => match contig expressed_sequence_match protein_match gene
    * level2: 3 => match_part tRNA mRNA
    * level3: 4 => five_prime_UTR CDS three_prime_UTR exon
    * unknown: 4 => 3859.1-augustus-gene-44.109-mRNA-1:9;Parent=maker-CM043859.1-augustus-gene-44.109-mRNA-1
 30262675 maker 30475340
=> Version of the Bioperl GFF parser selected by AGAT: 3
WARNING level2: No Parent attribute found @ for the feature: CM043859.1 est_gff:est2genome  match_part  29388575    29388934    1286    +   .   ID "CM043859.1:CM043859.1" 
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
CM043859.1  est_gff:est2genome  match_part  29388575    29388934    1286    +   .   ID "CM043859.1:CM043859.1" 
WARNING level3: No Parent attribute found @ for the feature: CM043859.1 maker   CDS 13404575    13404702    .   +   0   Gap "M74 D3 M110 D1 M12 D5 M2 D1 M87 D1 M69"  ; ID "maker-CM043859.1-augustus-gene-44.86-mRNA-1:cds"  "Pare3:3.12.0.98"  ; Target "TCONS_00113512.p1 1 354 +" 
WARNING gff3 reader: Hmmm, be aware that your feature doesn't contain any Parent and locus tag. No worries, we will handle it by considering it as strictly sequential. If you disagree, please provide an ID or a comon tag by locus. @ the feature is:
CM043859.1  maker   CDS 13404575    13404702    .   +   0   Gap "M74 D3 M110 D1 M12 D5 M2 D1 M87 D1 M69"  ; ID "maker-CM043859.1-augustus-gene-44.86-mRNA-1:cds"  "Pare3:3.12.0.98"  ; Target "TCONS_00113512.p1 1 354 +" 

When I run my command (agat_sp_manage_functional_annotation.pl -f RENAMED_NoSeq.gff -b MM10db_Prot_noLabel.blastp --output P_AGAT --db ./BLAST_DB/mm10_uniprot_canonical.fasta -i RENAMEDProteins.fasta.tsv), I get an empty output GFF file. How can I fix this?

Juke34 commented 11 months ago

First, check your file: feature lines (as opposed to comment lines) are expected to have exactly 9 fields, and your log shows several that do not. For example, in the line containing "3859.1-augustus-gene-44.109-mRNA-1:9", this ID is in the 3rd column, whereas it should be in the 9th.
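For a file too large to inspect by eye, a short script can list the offending lines. The following is only a minimal sketch, not part of AGAT: it assumes columns are tab-separated, that lines starting with "#" are comments or directives, and that the file has no embedded FASTA section (the log above reports none). The names `malformed_lines` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

EXPECTED_FIELDS = 9  # a GFF3 feature line has nine tab-separated columns


def malformed_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, field_count, text) for feature lines without 9 columns."""
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        # Assumption: blank lines and lines starting with '#' are not feature lines.
        if not text.strip() or text.startswith("#"):
            continue
        count = len(text.split("\t"))
        if count != EXPECTED_FIELDS:
            yield number, count, text


def summarize(lines: Iterable[str]) -> Counter:
    """Count malformed feature lines per field count, like the AGAT log summary."""
    return Counter(count for _, count, _ in malformed_lines(lines))


# Tiny in-memory example; the third line has the ID in the 3rd column.
sample = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\tmaker\tgene1-mRNA-1:9;Parent=gene1-mRNA-1",
    "",
]

for number, count, text in malformed_lines(sample):
    print(f"line {number}: {count} fields: {text[:80]}")
```

On a real file, an open file handle can be passed instead of the list, e.g. `with open("RENAMED_NoSeq.gff") as handle: ...`, so the file is streamed line by line rather than loaded into memory.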

isnmn commented 11 months ago

The file was generated by MAKER, and honestly I don't know how to edit or fix it. I could go through it manually, but it is a whole-genome file and it is huge. Do you have any suggestions?

Juke34 commented 11 months ago

I have run into this issue with MAKER several times. You can fix the file manually in a text editor, or re-create it from the folder where you ran MAKER using gaas_maker_merge_outputs_from_datastore.pl from the GAAS toolkit: https://github.com/NBISweden/GAAS
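If the file is edited manually, it may help to first pull the malformed lines out into a small side file, so that only a handful of lines need attention rather than all 17 million. The sketch below is one possible way to do that under stated assumptions (tab-separated columns, "#" marks comments); `split_gff` is a hypothetical helper, not a GAAS or AGAT function. Note that simply discarding the set-aside lines could leave child features without a parent, so they would still need to be repaired and merged back.

```python
import io
from typing import TextIO, Tuple


def split_gff(source: TextIO, clean: TextIO, rejected: TextIO) -> Tuple[int, int]:
    """Stream `source`, writing well-formed lines to `clean` and the rest to `rejected`.

    Rejected lines are prefixed with their original line number so they can be
    repaired and put back in place. Returns (lines_kept, lines_set_aside).
    """
    kept = set_aside = 0
    for number, line in enumerate(source, start=1):
        body = line.rstrip("\r\n")
        # Assumption: comments/directives start with '#'; blank lines pass through.
        is_feature = bool(body.strip()) and not body.startswith("#")
        if is_feature and len(body.split("\t")) != 9:
            rejected.write(f"{number}\t{body}\n")
            set_aside += 1
        else:
            clean.write(line)
            kept += 1
    return kept, set_aside


# In-memory demonstration; real use would pass handles from open(...).
demo_source = io.StringIO(
    "##gff-version 3\n"
    "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1\n"
    "broken\tline\n"
)
demo_clean, demo_rejected = io.StringIO(), io.StringIO()
demo_counts = split_gff(demo_source, demo_clean, demo_rejected)
print(demo_counts)
```

Re-creating the file from the MAKER datastore, as suggested above, avoids this manual step entirely and is likely the safer route when the original run folder is still available.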