Closed jahemker closed 4 weeks ago
Sorry, your message slipped under my radar.
Your usage is correct; it is just the IDs in your list that are wrong. If you look in your file you will see that, e.g. for FBtr0070008, the ID is defined as transcript:FBtr0070008:
###
X FlyBase gene 20170222 20171526 . + . ID=gene:FBgn0031094;Name=CG9578;biotype=protein_coding;gene_id=FBgn0031094;logic_name=flybase
X FlyBase mRNA 20170222 20171526 . + . ID=transcript:FBtr0070008;Parent=gene:FBgn0031094;Name=CG9578-RA;biotype=protein_coding;tag=Ensembl_canonical;transcript_id=FBtr0070008
X FlyBase five_prime_UTR 20170222 20170348 . + . Parent=transcript:FBtr0070008
X FlyBase exon 20170222 20170363 . + . Parent=transcript:FBtr0070008;Name=FBtr0070008-E1;constitutive=1;ensembl_end_phase=0;ensembl_phase=-1;exon_id=FBtr0070008-E1;rank=1
X FlyBase CDS 20170349 20170363 . + 0 ID=CDS:FBpp0070007;Parent=transcript:FBtr0070008;protein_id=FBpp0070007
X FlyBase exon 20170424 20170758 . + . Parent=transcript:FBtr0070008;Name=FBtr0070008-E2;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=FBtr0070008-E2;rank=2
X FlyBase CDS 20170424 20170758 . + 0 ID=CDS:FBpp0070007;Parent=transcript:FBtr0070008;protein_id=FBpp0070007
X FlyBase exon 20170846 20171065 . + . Parent=transcript:FBtr0070008;Name=FBtr0070008-E3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=FBtr0070008-E3;rank=3
X FlyBase CDS 20170846 20171065 . + 1 ID=CDS:FBpp0070007;Parent=transcript:FBtr0070008;protein_id=FBpp0070007
X FlyBase CDS 20171130 20171378 . + 0 ID=CDS:FBpp0070007;Parent=transcript:FBtr0070008;protein_id=FBpp0070007
X FlyBase exon 20171130 20171526 . + . Parent=transcript:FBtr0070008;Name=FBtr0070008-E4;constitutive=1;ensembl_end_phase=-1;ensembl_phase=0;exon_id=FBtr0070008-E4;rank=4
X FlyBase three_prime_UTR 20171379 20171526 . + . Parent=transcript:FBtr0070008
###
So you should replace FBtr0070008 with transcript:FBtr0070008 (and do the same for every other transcript ID) in your protein_coding_canonical_transcripts.txt file.
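If it helps, here is a minimal Python sketch of that fix. It assumes the keep list has one transcript ID per line and that the filter matches against the ID attribute, as in the excerpt above. The function names and the output file name are hypothetical, and none of this is part of AGAT.

```python
from pathlib import Path


def add_prefix(ids, prefix="transcript:"):
    """Return the IDs with `prefix` added, skipping blank lines and
    leaving entries that already carry the prefix unchanged."""
    out = []
    for raw in ids:
        tid = raw.strip()
        if not tid:
            continue
        out.append(tid if tid.startswith(prefix) else prefix + tid)
    return out


def gff3_ids(lines):
    """Collect the values of the ID attribute (column 9) from GFF3 lines."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives such as ###, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        for pair in cols[8].split(";"):
            key, _, value = pair.partition("=")
            if key == "ID":
                ids.add(value)
    return ids


def rewrite_keep_list(src, dst, prefix="transcript:"):
    """Write a prefixed copy of the keep list at `src` to `dst`."""
    lines = Path(src).read_text().splitlines()
    Path(dst).write_text("\n".join(add_prefix(lines, prefix)) + "\n")


# Example call (the output file name is an assumption):
# rewrite_keep_list("protein_coding_canonical_transcripts.txt",
#                   "protein_coding_canonical_transcripts.prefixed.txt")
```

Comparing the keep-list entries against `gff3_ids(...)` for your GFF3 before filtering is a quick way to confirm that the entries actually match IDs in the file.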
Hello,
I have a D. melanogaster GFF3 from Ensembl, and I've extracted the transcript IDs of the canonical transcripts for each protein-coding gene. I would like to filter (keep) all entries in the GFF3 that pertain to these canonical transcripts.
Based on my understanding, it seems that agat_sp_filter_feature_from_keep_list.pl would work for this; however, I have not been able to successfully filter the GFF3. The output is always 0 records kept. I assume I am misunderstanding how this command works. Does AGAT have this functionality? Thank you!
gff3 from here: https://ftp.ensembl.org/pub/release-110/gff3/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.46.110.chr.gff3.gz
head of protein_coding_canonical_transcripts.txt
Output from report.txt
stdout when running the command