Closed EinarBaldvin closed 1 year ago
Hi, Yes please provide a tiny sample file with keep_list.txt and kill_list.txt and telling me what you expect and what you think is wrong from the corresponding output.
Hey,
here is a complete sample set, the kill gff file is what I am expecting the keep gff to be. What is wrong is that the gene IDs that are not in the keep list are still retained in the output keep gff.
I guess the problem you see is the same as https://github.com/NBISweden/AGAT/issues/370
Actually it is expected by AGAT. Keep and Kill list does not work exactly the same way.
Let's use the case below that is a single record as seen by AGAT. The record is one gene with 2 isoforms.
If you use gene-1.1
with agat_sp_filter_feature_from_keep_list.pl
the whole record is kept (everything that is linked directly or indirectly to gene-1.1
will be kept). It is described in the help like that:
The default behaviour is to look at the features's ID. If the feature has an ID
(case insensitive) listed among the keeplist it will be kept along with all
related features
If you use gene-1.1
with agat_sp_filter_feature_from_kill_list.pl
only gene-1.1
mRNA and sub features (its CDS, exons, etc) will be removed, that will result to keep the gene-1
gene feature with the other mRNA isoform gene-1.2
.
It is explained in the help like that
/!\ Removing a level1 or level2 feature will automatically remove all linked subfeatures, and
removing all children of a feature will automatically remove this feature too.
example
chr1H transdecoder gene 14865 18619 . - . ID=gene-1
chr1H transdecoder mRNA 14865 18573 . - . ID=gene-1.1;Parent=gene-1
chr1H transdecoder exon 14865 15250 . - . ID=gene-1.1-exon1;Parent=gene-1.1
chr1H transdecoder three_prime_UTR 14865 15034 . - . ID=gene-1.1-three_prime_utr16;Parent=gene-1.1
chr1H transdecoder CDS 15035 15250 . - 0 ID=gene-1.1-cds7;Parent=gene-1.1
chr1H transdecoder stop_codon 15035 15037 . - 0 ID=gene-1.1-stop_codon15;Parent=gene-1.1
chr1H transdecoder mRNA 15142 18619 . - . ID=gene-1.2;Parent=gene-1
chr1H transdecoder exon 15142 15250 . - . ID=gene-1.2-exon1;Parent=gene-1.2
chr1H transdecoder three_prime_UTR 15142 15250 . - . ID=gene-1.2-three_prime_utr15;Parent=gene-1.2
chr1H transdecoder CDS 15257 15415 . - 0 ID=gene-1.1-cds8;Parent=gene-1.1
chr1H transdecoder exon 15257 15415 . - . ID=gene-1.1-exon2;Parent=gene-1.1
chr1H transdecoder exon 15257 15415 . - . ID=gene-1.2-exon2;Parent=gene-1.2
chr1H transdecoder three_prime_UTR 15257 15293 . - . ID=gene-1.2-three_prime_utr16;Parent=gene-1.2
Hi Jacques,
thank you for all the AGAT tools, I have been relying on them extensively to finish my annotations.
Describe the bug The tool does not output a filtered gff, but the log reports all having been performed correctly.
If I invert my ID-list and use the Kill feature tool I get the desired result, a filtered gff. One caveat is that I am removing isoforms (level 2) and not the complete level 1 feature.
General (please complete the following information):
To Reproduce agat_sp_filter_feature_from_keep_list.pl --gff anno_gff --keep_list final_keep_list.txt --output anno.repiso.gff3
Example of IDs from two tries, they work fine with the Kill tool. gene-10000.1 gene-10001.1
PB.10000.1.p2 PB.1000.11.p1
Let me know if you need any sample data.
Cheers, Einar