NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Extracting Coordinates by Tag/Tags in GFF File #404

Closed MeHelmy closed 10 months ago

MeHelmy commented 1 year ago

Issue Description: I'm looking for a way to extract coordinates based on a specific tag from a GFF file. Specifically, I want to extract exons for downstream analysis. The desired output should resemble the following format:

1       havana  exon    182696  182746  .       +       .       gene_id "ENSG00000279928"; .....

What I've Tried: I attempted to use the agat_sp_extract_attributes.pl script with the following command:

agat_sp_extract_attributes.pl -gff Homo_sapiens.GRCh38.110.gtf -att ccds_id,exon_id,exon_number,exon_version,gene_biotype,gene_id,gene_name,gene_source,gene_version,tag,transcript_biotype,transcript_id,transcript_name,transcript_source,transcript_support_level,transcript_version -p exon -o exon.gtf

However, this command did not yield the expected output.

Expected Result: I'm looking for some guidance on correctly extracting exon coordinates based on the tag field and generating the desired output format as shown above.

Equivalent code in awk: awk '$3=="exon"' Homo_sapiens.GRCh38.110.gtf | less

Thanks, Medhat

Juke34 commented 1 year ago

Hi,

I guess the script you are looking for is agat_sp_filter_feature_by_attribute_presence.pl or agat_sp_filter_feature_by_attribute_value.pl.
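
For readers who want to see what "filter by feature type and by attribute" means independently of any tool, here is a minimal Python sketch. It is not how AGAT does it, and it does not reproduce AGAT's options or output. The function names `parse_attributes` and `filter_gtf` are hypothetical, the sketch assumes a tab-separated GTF whose attribute values are double-quoted, and the sample lines (including the `tag "basic"` value and most coordinates) are made up for illustration.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional

# Assumption: GTF-style attributes of the form  key "value"; key "value";
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(column9: str) -> Dict[str, List[str]]:
    """Collect attributes from column 9; a key such as `tag` may repeat."""
    attrs: Dict[str, List[str]] = {}
    for key, value in _ATTR_RE.findall(column9):
        attrs.setdefault(key, []).append(value)
    return attrs


def filter_gtf(
    lines: Iterable[str],
    feature: str = "exon",
    key: Optional[str] = None,
    value: Optional[str] = None,
) -> Iterator[str]:
    """Yield lines whose column 3 equals `feature`.

    If `key` is given, the attribute must be present.
    If `value` is also given, one of the attribute's values must equal it.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        if key is not None:
            values = parse_attributes(cols[8]).get(key)
            if values is None:
                continue  # attribute absent
            if value is not None and value not in values:
                continue  # attribute present but with another value
        yield line.rstrip("\n")


# Made-up sample records, for illustration only.
sample = [
    "#!genome-build GRCh38",
    '1\thavana\tgene\t182696\t184174\t.\t+\t.\tgene_id "ENSG00000279928";',
    '1\thavana\texon\t182696\t182746\t.\t+\t.\tgene_id "ENSG00000279928"; tag "basic";',
    '1\thavana\texon\t183132\t183216\t.\t+\t.\tgene_id "ENSG00000279928";',
]

if __name__ == "__main__":
    # All exons carrying a `tag` attribute equal to "basic".
    for record in filter_gtf(sample, feature="exon", key="tag", value="basic"):
        print(record)
```

With `key` and `value` left at `None`, `filter_gtf` behaves like the awk one-liner in the question, keeping every line whose third column is `exon`. To run it on a real file, pass an open file handle as `lines`.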

MeHelmy commented 1 year ago

Thank you, I will try them and update you. Best, Medhat