NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Creating and deleting the same entries? #385

Closed: SheepwormJM closed this issue 11 months ago

SheepwormJM commented 11 months ago

Hi,

Thanks for creating this software.

I'm not sure whether this is a bug or expected behaviour.

I used AGAT v1.2.0:

agat_convert_sp_gff2gtf.pl --gff haemonchus_contortus.PRJEB506.WBPS18.annotations.gff3 --o haemonchus_contortus.PRJEB506.WBPS18.annotations.gtf

You can find the gff3 file here: https://parasite.wormbase.org/Haemonchus_contortus_prjeb506/Info/Index/

It created 133 new L2 features to link L3 features to their L1 feature, with log entries such as:

L3 was directly linked to L1. Corrected by creating the intermediate L2 feature from L1 feature:
hcontortus_chr1_Celeg_TT_arrow_pilon    WormBase_imported       RNA     5834    6930    .       +       .       ID "transcript:HCON_00000010-00001"  ; Name "HCON_00000010-00001"  ; Parent "nbis-pseudogene-1"

It then removed 133 features that it considered orphan L1 features:

We remove only those not supposed to be orphan
removing hcontortus_chr5_Celeg_TT_arrow_pilon   WormBase_imported       gene    22049440        22076704        .       +       .       ID "gene:HCON_00145520"  ; Name HCON_00145520 ; biotype pseudogene

That seemed more than a coincidence, so I searched for the first gene corrected in check 4, and found:

removing hcontortus_chr1_Celeg_TT_arrow_pilon   WormBase_imported       gene    5834    6930    .       +       .       ID "gene:HCON_00000010"  ; Name HCON_00000010 ; biotype pseudogene

The same happened for another gene I checked.

So I assume that, for all 133 genes, it first added an L2 feature and then deleted the L1 feature.
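Rather than checking genes one at a time, the two kinds of log entries can be paired by location. The Python sketch below is hypothetical, not part of AGAT: it assumes the log lines look exactly like the excerpts above (the created feature printed on the line after the "L3 was directly linked to L1" message, removed features prefixed with "removing "), and that the gene and its pseudogene share coordinates, as in the HCON_00000010 example.

```python
from collections import namedtuple

# Location of a feature printed in the AGAT log (GFF/GTF columns 1, 4, 5, 7).
Loc = namedtuple("Loc", "seqid start end strand")

# Message text copied from the log excerpts in this issue; other AGAT
# versions may word these differently (assumption).
CREATED_MSG = "L3 was directly linked to L1"
REMOVED_PREFIX = "removing "


def parse_loc(feature_line):
    """Extract the location from a whitespace-separated GFF/GTF-like line."""
    cols = feature_line.split()
    return Loc(cols[0], int(cols[3]), int(cols[4]), cols[6])


def pair_created_and_removed(log_lines):
    """Pair L2 features AGAT created with L1 features it removed at the same location.

    Returns (shared_locations, created, removed); the dicts map Loc -> log line.
    If several features share a location, only the last one is kept.
    """
    created, removed = {}, {}
    expect_created = False
    for raw in log_lines:
        line = raw.rstrip("\n")
        if CREATED_MSG in line:
            # The newly created L2 feature is printed on the next non-empty line.
            expect_created = True
        elif expect_created and line.strip():
            created[parse_loc(line)] = line
            expect_created = False
        elif line.startswith(REMOVED_PREFIX):
            removed[parse_loc(line[len(REMOVED_PREFIX):])] = line
    shared = sorted(created.keys() & removed.keys())
    return shared, created, removed
```

Usage, with a hypothetical file name for the saved AGAT output: `with open("agat_output.log") as fh: shared, created, removed = pair_created_and_removed(fh)`. If `len(shared)` equals both `len(created)` and `len(removed)`, every created L2 feature has a removed L1 counterpart at the same coordinates.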

Is this supposed to happen? Or could it be that pseudogene isn't recognised as a feature type in the config file?

This is the gene in the GFF3 file:

hcontortus_chr1_Celeg_TT_arrow_pilon    WormBase_imported       gene    5834    6930    .       +       .       ID=gene:HCON_00000010;Name=HCON_00000010;biotype=pseudogene
hcontortus_chr1_Celeg_TT_arrow_pilon    WormBase_imported       pseudogene      5834    6930    .       +       .       ID=transcript:HCON_00000010-00001;Parent=gene:HCON_00000010;Name=HCON_00000010-00001
hcontortus_chr1_Celeg_TT_arrow_pilon    WormBase_imported       exon    5834    6930    .       +       .       ID=exon:HCON_00000010-00001.1;Parent=transcript:HCON_00000010-00001

Thanks in advance! Jenni

Juke34 commented 11 months ago

Hi,

Right, pseudogene is defined as a level1 feature in AGAT (like gene), whereas in your file it is used as a level2 feature (like mRNA), because it has a gene as its parent. So you need to edit feature_levels.yaml to remove pseudogene from level1 and add it as a level2 feature (see agat --help).
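A toy Python sketch of why this change helps. It models the explanation above rather than AGAT's actual code: it assigns levels to the three records from the GFF3 excerpt, then flags children whose parent is not exactly one level up, plus level1 features left without children. The level maps, the helper names, and the rule that a level1 feature's own Parent is not followed are assumptions chosen to mirror the two log messages in the issue; the real feature_levels.yaml is not reproduced here.

```python
# Hypothetical type -> level map covering only the types in this example.
# With AGAT's default, pseudogene is level1 (like gene), as described above.
DEFAULT_LEVELS = {"gene": 1, "pseudogene": 1, "mRNA": 2, "exon": 3}
# The suggested change: pseudogene moved to level2 (like mRNA).
FIXED_LEVELS = {**DEFAULT_LEVELS, "pseudogene": 2}

# Feature type (column 3) and attributes (column 9) of the records quoted in the issue.
RECORDS = [
    ("gene", "ID=gene:HCON_00000010;Name=HCON_00000010;biotype=pseudogene"),
    ("pseudogene", "ID=transcript:HCON_00000010-00001;Parent=gene:HCON_00000010;Name=HCON_00000010-00001"),
    ("exon", "ID=exon:HCON_00000010-00001.1;Parent=transcript:HCON_00000010-00001"),
]


def parse_attributes(col9):
    """Split a GFF3 column 9 string into a dict (single values, no unescaping)."""
    return dict(field.split("=", 1) for field in col9.split(";") if field)


def check_hierarchy(records, levels):
    """List hierarchy problems under a given type -> level map."""
    features = {}
    for ftype, col9 in records:
        attrs = parse_attributes(col9)
        features[attrs["ID"]] = (ftype, attrs.get("Parent"))

    problems, has_child = [], set()
    for fid, (ftype, parent) in features.items():
        level = levels[ftype]
        # Assumption: a level1 feature is top-level, so its own Parent is not followed.
        if level == 1 or parent is None:
            continue
        if parent not in features:
            problems.append(f"level{level} {fid} has unknown parent {parent}")
            continue
        has_child.add(parent)
        parent_level = levels[features[parent][0]]
        if parent_level != level - 1:
            problems.append(f"level{level} {fid} is attached to level{parent_level} {parent}")

    # Level1 features that no child points to would be treated as orphans.
    for fid, (ftype, _) in features.items():
        if levels[ftype] == 1 and fid not in has_child:
            problems.append(f"level1 {fid} has no children")
    return problems


for name, levels in [("default", DEFAULT_LEVELS), ("fixed", FIXED_LEVELS)]:
    print(name, check_hierarchy(RECORDS, levels))
```

With the default map the sketch reports two problems, an exon attached directly to a level1 feature and a gene with no children, which correspond to the two kinds of log entries in the issue; with pseudogene moved to level2 it reports none.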

SheepwormJM commented 11 months ago

Thanks Juke! :)