NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

agat removes certain isoforms. Should I be worried if working on differential transcript usage? #79

Closed scseekers closed 4 years ago

scseekers commented 4 years ago

Hi AGAT creators!

I would like to understand on what basis AGAT's gff2gtf tool decides to remove isoforms from a GFF3 file. I recently processed the Oryza sativa GFF3 file and saw that some isoforms had been removed. Should I be concerned about the 28 isoforms it removes if I need to perform a differential transcript usage analysis?

agat_convert_sp_gff2gtf.pl --gff all.gff3 --relax -o all_relax.gtf
converting to GTF3
********************************************************************************
*                              - Start parsing -                               *
********************************************************************************
-------------------------- parse options and metadata --------------------------
=> Accessing the feature level json files
        Using standard /home/surabhi.rathore/miniconda3/envs/agat/lib/site_perl/5.26.2/auto/share/dist/AGAT/features_level1.json file
        Using standard /home/surabhi.rathore/miniconda3/envs/agat/lib/site_perl/5.26.2/auto/share/dist/AGAT/features_level2.json file
        Using standard /home/surabhi.rathore/miniconda3/envs/agat/lib/site_perl/5.26.2/auto/share/dist/AGAT/features_level3.json file
        Using standard /home/surabhi.rathore/miniconda3/envs/agat/lib/site_perl/5.26.2/auto/share/dist/AGAT/features_spread.json file
=> Attribute used to group features when no Parent/ID relationship exists:
        * locus_tag
        * gene_id
=> merge_loci option deactivated
=> Accessing Ontology
        No ontology accessible from the gff file header!
        We use the SOFA ontology distributed with AGAT:
                /home/surabhi.rathore/miniconda3/envs/agat/lib/site_perl/5.26.2/auto/share/dist/AGAT/so.obo
        Read ontology /home/surabhi.rathore/miniconda3/envs/agat/lib/site_perl/5.26.2/auto/share/dist/AGAT/so.obo:
                4 root terms, and 2472 total terms, and 1436 leaf terms
        Filtering ontology:
                We found 1757 terms that are sequence_feature or is_a child of it.
-------------------------------- parse features --------------------------------
=> GFF version parser used: 3
********************************************************************************
*                               - End parsing -                                *
*                             done in 230 seconds                              *
********************************************************************************

********************************************************************************
*                               - Start checks -                               *
********************************************************************************
---------------------------- Check1: feature types -----------------------------
----------------------------------- ontology -----------------------------------
All feature types in agreement with the Ontology.
------------------------------------- agat -------------------------------------
AGAT can deal with all the encountered feature types (3rd column)
------------------------------ done in 0 seconds -------------------------------

------------------------------ Check2: duplicates ------------------------------
None found
------------------------------ done in 0 seconds -------------------------------

-------------------------- Check3: sequential bucket ---------------------------
Nothing to check as sequential bucket!
------------------------------ done in 0 seconds -------------------------------

--------------------------- Check4: l2 linked to l3 ----------------------------
No problem found
------------------------------ done in 1 seconds -------------------------------

--------------------------- Check5: l1 linked to l2 ----------------------------
No problem found
------------------------------ done in 0 seconds -------------------------------

--------------------------- Check6: remove orphan l1 ---------------------------
We remove only those not supposed to be orphan
None found
------------------------------ done in 0 seconds -------------------------------

----------------------------- Check7: check exons ------------------------------
2 exons created that were missing
61 exons locations modified that were wrong
No supernumerary exons removed
No level2 locations modified
------------------------------ done in 20 seconds ------------------------------

------------------------------ Check8: check utrs ------------------------------
3 UTRs created that were missing
2 UTRs locations modified that were wrong
No supernumerary UTRs removed
------------------------------ done in 11 seconds ------------------------------

------------------------- Check9: all level2 locations -------------------------
No problem found
------------------------------ done in 12 seconds ------------------------------

------------------------ Check10: all level1 locations -------------------------
We fixed 36 wrong level1 location cases
------------------------------ done in 2 seconds -------------------------------

---------------------- Check11: remove identical isoforms ----------------------
Lets remove isoform LOC_Os06g45184.4
Lets remove isoform LOC_Os07g07030.3
Lets remove isoform LOC_Os08g44450.2
Lets remove isoform LOC_Os11g20330.2
Lets remove isoform LOC_Os06g46770.5
Lets remove isoform LOC_Os03g55150.6
Lets remove isoform LOC_Os04g58160.3
Lets remove isoform LOC_Os11g02660.2
Lets remove isoform LOC_Os04g14204.3
Lets remove isoform LOC_Os04g14204.2
Lets remove isoform LOC_Os09g20830.4
Lets remove isoform LOC_Os03g22890.2
Lets remove isoform LOC_Os06g45510.2
Lets remove isoform LOC_Os06g41390.2
Lets remove isoform LOC_Os12g41430.2
Lets remove isoform LOC_Os01g27020.4
Lets remove isoform LOC_Os06g41384.2
Lets remove isoform LOC_Os08g44380.2
Lets remove isoform LOC_Os01g08814.3
Lets remove isoform LOC_Os11g02480.2
Lets remove isoform LOC_Os11g16924.2
Lets remove isoform LOC_Os10g19919.4
Lets remove isoform LOC_Os06g07580.2
Lets remove isoform LOC_Os10g42730.3
Lets remove isoform LOC_Os02g33450.2
Lets remove isoform LOC_Os01g34620.9
Lets remove isoform LOC_Os08g16914.2
Lets remove isoform LOC_Os03g08300.2
28 identical isoforms removed
------------------------------ done in 2 seconds -------------------------------
********************************************************************************
*                                - End checks -                                *
*                              done in 48 seconds                              *
********************************************************************************

=> OmniscientI total time: 278 seconds
Bye Bye

Juke34 commented 4 years ago

It should not be a problem for your analysis. The real question is why the annotation contains genes with isoforms that are 100% identical; that is probably not normal. You might contact the provider of this file and point out the different problems encountered. Apparently there are plenty of issues in that file...
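To see for yourself which isoforms are duplicates before contacting the provider, something like the following could help. This is only a minimal sketch, not AGAT's actual implementation: it assumes "identical" means two transcripts of the same gene sharing exactly the same set of exon and CDS coordinates, and the function name `find_identical_isoforms` is made up for illustration.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    pairs = (kv.strip().split("=", 1) for kv in field.strip().strip(";").split(";") if "=" in kv)
    return {key: value for key, value in pairs}


def find_identical_isoforms(gff3_lines):
    """Return groups of transcript IDs that share a gene and the same exon/CDS set.

    Assumption (not taken from AGAT): two isoforms are 'identical' when their
    parent gene matches and their exon and CDS features have the same
    seqid, start, end and strand.
    """
    parent_of = {}                 # transcript ID -> parent gene ID
    structure = defaultdict(set)   # transcript ID -> {(type, seqid, start, end, strand)}
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        attr = parse_attributes(attrs)
        if ftype in ("mRNA", "transcript"):
            if "ID" in attr:
                parent_of[attr["ID"]] = attr.get("Parent", "")
        elif ftype in ("exon", "CDS"):
            # A feature may list several parents separated by commas.
            for parent in attr.get("Parent", "").split(","):
                if parent:
                    structure[parent].add((ftype, seqid, int(start), int(end), strand))

    groups = defaultdict(list)
    for transcript_id, gene_id in parent_of.items():
        features = structure.get(transcript_id)
        if not features:
            continue  # transcripts without exon/CDS children cannot be compared
        groups[(gene_id, frozenset(features))].append(transcript_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Usage would be along the lines of `with open("all.gff3") as fh: print(find_identical_isoforms(fh))`. The groups it reports may not match AGAT's Check11 list exactly, since AGAT fixes exons and UTRs before comparing isoforms.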

scseekers commented 4 years ago

@Juke34 Also, for some reason the GTF file produced this way fails for all genes when running STAR. The GFF3 file was obtained from http://rice.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/

Oct 10 01:11:23 ..... processing annotations GTF
WARNING: while processing sjdbGTFfile=/scratch/Asad/E.Coli-Resistant/multiqc_data/tmp/rice_paper_springer/raw/trimmomatic/index/temp.gtf: chromosome 'Chr1' not found in Genome fasta files for line:
Chr1    MSU_osa1r7      exon    2903    3268    .       +       .       gene_id "LOC_Os01g01010"; transcript_id "LOC_Os01g01010.1"; ID "LOC_Os01g01010.1:exon_1"; Parent "LOC_Os01g01010.1";
WARNING: while processing sjdbGTFfile=/scratch/Asad/E.Coli-Resistant/multiqc_data/tmp/rice_paper_springer/raw/trimmomatic/index/temp.gtf: chromosome 'Chr1' not found in Genome fasta files for line:

scseekers commented 4 years ago

Oh, my bad! I was loading indexes from an older directory with problematic GTFs. Thanks, @Juke34, for helping me out.
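
For anyone who hits the same STAR warning ("chromosome 'Chr1' not found in Genome fasta files"), a quick sanity check is to compare the sequence names in the GTF against the FASTA headers of the genome that was actually indexed. The snippet below is a rough sketch under that assumption; the helper names are hypothetical and it does not inspect a STAR index directly.

```python
def fasta_seq_names(fasta_lines):
    """Collect sequence names: the first word after '>' on each FASTA header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def gtf_seq_names(gtf_lines):
    """Collect the first column (seqname) of every non-comment GTF line."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


def missing_from_genome(gtf_lines, fasta_lines):
    """Return GTF sequence names that have no matching FASTA header (case-sensitive)."""
    return sorted(gtf_seq_names(gtf_lines) - fasta_seq_names(fasta_lines))
```

If `missing_from_genome(open("temp.gtf"), open("genome.fa"))` returns an empty list but STAR still complains, the index being loaded was probably built from a different FASTA, which is what happened here. The file name `genome.fa` is a placeholder.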