NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0
465 stars 56 forks source link

agat_sp_filter_feature_from_keep_list.pl outputs an unfiltered gff #341

Closed EinarBaldvin closed 1 year ago

EinarBaldvin commented 1 year ago

Hi Jacques,

thank you for all the AGAT tools, I have been relying on them extensively to finish my annotations.

Describe the bug The tool does not output a filtered gff, but the log reports all having been performed correctly.

If I invert my ID-list and use the Kill feature tool I get the desired result, a filtered gff. One caveat is that I am removing isoforms (level 2) and not the complete level 1 feature.

General (please complete the following information):

To Reproduce agat_sp_filter_feature_from_keep_list.pl --gff anno_gff --keep_list final_keep_list.txt --output anno.repiso.gff3

Example of IDs from two tries, they work fine with the Kill tool. gene-10000.1 gene-10001.1

PB.10000.1.p2 PB.1000.11.p1

Let me know if you need any sample data.

Cheers, Einar

Juke34 commented 1 year ago

Hi, Yes please provide a tiny sample file with keep_list.txt and kill_list.txt and telling me what you expect and what you think is wrong from the corresponding output.

EinarBaldvin commented 1 year ago

Hey,

here is a complete sample set, the kill gff file is what I am expecting the keep gff to be. What is wrong is that the gene IDs that are not in the keep list are still retained in the output keep gff.

agat_keep_list_bug.tar.gz

Juke34 commented 1 year ago

I guess the problem you see is the same as https://github.com/NBISweden/AGAT/issues/370 Actually it is expected by AGAT. Keep and Kill list does not work exactly the same way.
Let's use the case below that is a single record as seen by AGAT. The record is one gene with 2 isoforms. If you use gene-1.1 with agat_sp_filter_feature_from_keep_list.pl the whole record is kept (everything that is linked directly or indirectly to gene-1.1 will be kept). It is described in the help like that:

The default behaviour is to look at the features's ID. If the feature has an ID
(case insensitive) listed among the keeplist it will be kept along with all
related features

If you use gene-1.1 with agat_sp_filter_feature_from_kill_list.pl only gene-1.1 mRNA and sub features (its CDS, exons, etc) will be removed, that will result to keep the gene-1 gene feature with the other mRNA isoform gene-1.2. It is explained in the help like that

/!\ Removing a level1 or level2 feature will automatically remove all linked subfeatures, and
removing all children of a feature will automatically remove this feature too.

example

chr1H   transdecoder    gene    14865   18619   .   -   .   ID=gene-1
chr1H   transdecoder    mRNA    14865   18573   .   -   .   ID=gene-1.1;Parent=gene-1
chr1H   transdecoder    exon    14865   15250   .   -   .   ID=gene-1.1-exon1;Parent=gene-1.1
chr1H   transdecoder    three_prime_UTR 14865   15034   .   -   .   ID=gene-1.1-three_prime_utr16;Parent=gene-1.1
chr1H   transdecoder    CDS 15035   15250   .   -   0   ID=gene-1.1-cds7;Parent=gene-1.1
chr1H   transdecoder    stop_codon  15035   15037   .   -   0   ID=gene-1.1-stop_codon15;Parent=gene-1.1
chr1H   transdecoder    mRNA    15142   18619   .   -   .   ID=gene-1.2;Parent=gene-1
chr1H   transdecoder    exon    15142   15250   .   -   .   ID=gene-1.2-exon1;Parent=gene-1.2
chr1H   transdecoder    three_prime_UTR 15142   15250   .   -   .   ID=gene-1.2-three_prime_utr15;Parent=gene-1.2
chr1H   transdecoder    CDS 15257   15415   .   -   0   ID=gene-1.1-cds8;Parent=gene-1.1
chr1H   transdecoder    exon    15257   15415   .   -   .   ID=gene-1.1-exon2;Parent=gene-1.1
chr1H   transdecoder    exon    15257   15415   .   -   .   ID=gene-1.2-exon2;Parent=gene-1.2
chr1H   transdecoder    three_prime_UTR 15257   15293   .   -   .   ID=gene-1.2-three_prime_utr16;Parent=gene-1.2