NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Can't call method "end" on an undefined value at agat_convert_sp_gff2gtf.pl line 339. #345

Closed Rafaelsoler13 closed 1 year ago

Rafaelsoler13 commented 1 year ago

I am getting this error in AGAT 1.0.0 (Linux) when I try to run agat_convert_sp_gff2gtf.pl on a GFF file generated by agat_convert_sp_gxf2gxf.pl:

Can't call method "end" on an undefined value at /usr/local/bin/agat_convert_sp_gff2gtf.pl line 339.

Same as here https://github.com/NBISweden/AGAT/issues/245#issue-1220227757

This is the output:

agat_convert_sp_gff2gtf.pl --gff HLcro_TOGA_agat.gff -o Hlcro_TOGA_agat.gtf
Using standard ~/anaconda3/envs/agat/lib/perl5/site_perl/auto/share/dist/AGAT/config.yaml file
Update config
Reading input file
********************************************************************************
*                              - Start parsing -                               *
********************************************************************************
-------------------------- parse options and metadata --------------------------
=> Accessing the feature_levels YAML file
Using standard ~/anaconda3/envs/agat/lib/perl5/site_perl/auto/share/dist/AGAT/feature_levels.yaml file
=> Attribute used to group features when no Parent/ID relationship exists (i.e common tag):
    * locus_tag
    * gene_id
=> merge_loci option deactivated
=> Machine information:
    This script is being run by perl v5.32.1
    Bioperl location being used: ~/anaconda3/envs/agat/lib/perl5/site_perl/Bio/
    Operating system being used: linux 
=> Accessing Ontology
    No ontology accessible from the gff file header!
    We use the SOFA ontology distributed with AGAT:
        ~/anaconda3/envs/agat/lib/perl5/site_perl/auto/share/dist/AGAT/so.obo
    Read ontology ~/anaconda3/envs/agat/lib/perl5/site_perl/auto/share/dist/AGAT/so.obo:
        4 root terms, and 2596 total terms, and 1516 leaf terms
    Filtering ontology:
        We found 1861 terms that are sequence_feature or is_a child of it.
--------------------------------- parsing file ---------------------------------
=> Number of line in file: 1315738
=> Number of comment lines: 1
=> Fasta included: No
=> Number of features lines: 1315737
=> Number of feature type (3rd column): 6
    * Level1: 1 => gene
    * level2: 1 => transcript
    * level3: 4 => start_codon stop_codon exon CDS
    * unknown: 0 => 
=> Version of the Bioperl GFF parser selected by AGAT: 3
WARNING l2 and l1 features not on same seq_id @ LOC120317568__rna-XM_039364246.1.16338 level2 feature is on HiC_scaffold_11 sequence while LOC120317568 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317568__rna-XM_039364246.1.216111 level2 feature is on HiC_scaffold_1 sequence while LOC120317568 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317568__rna-XM_039364246.1.121150 level2 feature is on HiC_scaffold_2 sequence while LOC120317568 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317568__rna-XM_039364246.1.176791 level2 feature is on HiC_scaffold_9 sequence while LOC120317568 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317568__rna-XM_039364246.1.329461 level2 feature is on HiC_scaffold_1 sequence while LOC120317568 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317568__rna-XM_039364246.1.208590 level2 feature is on HiC_scaffold_1 sequence while LOC120317568 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317568__rna-XM_039364246.1.106549 level2 feature is on HiC_scaffold_3 sequence while LOC120317568 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317550__rna-XM_039364225.1.8882 level2 feature is on HiC_scaffold_11 sequence while LOC120317550 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317555__rna-XM_039364233.1.21615 level2 feature is on HiC_scaffold_11 sequence while LOC120317555 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id @ LOC120317557__rna-XM_039364235.1.8882 level2 feature is on HiC_scaffold_11 sequence while LOC120317557 level1 feature is on ptg000195l_add
WARNING l2 and l1 features not on same seq_id  ************** Too much WARNING message we skip the next **************
1966 warning messages: WARNING l2 and l1 features not on same seq_id 
********************************************************************************
*                               - End parsing -                                *
*                             done in 137 seconds                              *
********************************************************************************

********************************************************************************
*                               - Start checks -                               *
********************************************************************************
---------------------------- Check1: feature types -----------------------------
----------------------------------- ontology -----------------------------------
All feature types in agreement with the Ontology.
------------------------------------- agat -------------------------------------
AGAT can deal with all the encountered feature types (3rd column)
------------------------------ done in 0 seconds -------------------------------

------------------------------ Check2: duplicates ------------------------------
None found
------------------------------ done in 0 seconds -------------------------------

-------------------------- Check3: sequential bucket ---------------------------
None found
------------------------------ done in 1 seconds -------------------------------

--------------------------- Check4: l2 linked to l3 ----------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------

--------------------------- Check5: l1 linked to l2 ----------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------

--------------------------- Check6: remove orphan l1 ---------------------------
We remove only those not supposed to be orphan
None found
------------------------------ done in 0 seconds -------------------------------

------------------------- Check7: all level3 locations -------------------------
------------------------------ done in 10 seconds ------------------------------

------------------------------ Check8: check cds -------------------------------
No problem found
------------------------------ done in 2 seconds -------------------------------

----------------------------- Check9: check exons ------------------------------
No exons created
No exons locations modified
No supernumerary exons removed
No level2 locations modified
------------------------------ done in 17 seconds ------------------------------

----------------------------- Check10: check utrs ------------------------------
No UTRs created
No UTRs locations modified
No supernumerary UTRs removed
------------------------------ done in 8 seconds -------------------------------

------------------------ Check11: all level2 locations -------------------------
No problem found
------------------------------ done in 10 seconds ------------------------------

------------------------ Check12: all level1 locations -------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------

---------------------- Check13: remove identical isoforms ----------------------
None found
------------------------------ done in 55 seconds ------------------------------
********************************************************************************
*                                - End checks -                                *
*                             done in 103 seconds                              *
********************************************************************************

=> OmniscientI total time: 240 seconds
converting to GTF3
Formating output to GTF3
Can't call method "end" on an undefined value at /home/victor/anaconda3/envs/agat/lib/perl5/site_perl/AGAT/OmniscientToGTF.pm line 313.

Juke34 commented 1 year ago

You have plenty of warnings like this one: WARNING l2 and l1 features not on same seq_id @ LOC120317568__rna-XM_039364246.1.208590 level2 feature is on HiC_scaffold_1 sequence while LOC120317568 level1 feature is on ptg000195l_add

AGAT does not work well on records spread over different sequences. If a gene has two isoforms, mRNA1 on sequenceA and mRNA2 on sequenceB, AGAT takes the highest end value as the stop position of the gene. But the gene can be on sequenceA while the highest value comes from mRNA2 on sequenceB. So if sequenceA and sequenceB do not have the same length, AGAT can look for positions on sequenceA that do not exist because sequenceA is shorter.
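To see which records are affected before running AGAT, something like the following Python sketch could list child features that sit on a different seq_id than their parent. It is not part of AGAT; the function names are made up for illustration, and it assumes a GFF3 file with standard `ID=`/`Parent=` attributes and IDs that are unique within the file (if an ID appears more than once, only its first occurrence is used).

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_cross_sequence_children(lines):
    """Return (child_id, child_seq, parent_id, parent_seq) tuples for every
    feature located on a different seq_id than one of its parents."""
    features = []   # (seq_id, feature_id, raw Parent value) per feature line
    seq_of_id = {}  # feature ID -> seq_id of its first occurrence
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column feature line
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get("ID")
        if feature_id is not None:
            seq_of_id.setdefault(feature_id, cols[0])
        features.append((cols[0], feature_id, attrs.get("Parent")))

    # Second pass, so that parents declared after their children are still found.
    mismatches = []
    for seq_id, feature_id, parents in features:
        if not parents:
            continue
        for parent_id in parents.split(","):
            parent_seq = seq_of_id.get(parent_id)
            if parent_seq is not None and parent_seq != seq_id:
                mismatches.append((feature_id, seq_id, parent_id, parent_seq))
    return mismatches
```

Calling `find_cross_sequence_children(open("input.gff"))` returns an empty list when every child is on the same sequence as its parent.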

Basically this case is not really supposed to happen... Check how the file has been created.

If you are really sure you want to process this file, I would suggest creating one file per seq_id using an awk command:

awk '{if($1 !~ /^#/) { file=$1".gff"; print $0 >> file }}' input.gff

Then process each file independently with AGAT. Make sure that no ID is shared between the resulting files (AGAT can do this), then concatenate the files into one.
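If awk is not convenient, the same split can be sketched in Python. This is only an illustration: `split_by_seq_id` is a hypothetical helper, not an AGAT function. Unlike the awk one-liner it splits on tabs only, skips blank lines, holds the whole file in memory, and overwrites existing output files instead of appending to them. It also assumes that seq_ids are valid file names.

```python
from collections import defaultdict
from pathlib import Path


def split_by_seq_id(input_path, out_dir):
    """Write one <seq_id>.gff file per sequence found in column 1.

    Returns a dict mapping each seq_id to the path of the file written.
    """
    groups = defaultdict(list)
    with open(input_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # comment lines and blank lines are dropped
            seq_id = line.split("\t", 1)[0]
            groups[seq_id].append(line)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for seq_id, lines in groups.items():
        path = out_dir / f"{seq_id}.gff"
        with open(path, "w") as out:
            out.writelines(lines)  # original line order is kept per sequence
        written[seq_id] = path
    return written
```

For example, `split_by_seq_id("input.gff", "by_seq")` would create `by_seq/HiC_scaffold_1.gff`, `by_seq/HiC_scaffold_2.gff` and so on, each of which can then be given to AGAT separately.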