NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Specify which attributes to keep. #311

Closed by mossconfuse 1 year ago

mossconfuse commented 1 year ago

Hi

I get recurring errors from cellranger-arc regarding my AGAT-generated GTF files. I think most of them stem from the formatting of the attributes column. 10X gives recommendations, though not a detailed solution, for fixing this:

"In all of the above cases, the reasons range from either duplicate/missing features or poorly formatted entries. To troubleshoot such issues, the following steps can be implemented using custom scripts:

Recommended to retain only gene_id, transcript_ids, and gene_name attributes.
Verify for any redundancy and order genes in the annotation file.
Replace or remove the gene_ids that have empty values.
Duplicate transcript_ids for multiple gene_id must be converted as unique (eg: unknown_transcript_1 fields)"
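As an illustration only, two of the checks in the quoted recommendations (empty gene_id values, and a transcript_id shared by more than one gene_id) could be sketched in Python roughly as follows. This is not part of AGAT or cellranger-arc. The function name `check_gtf_ids` is hypothetical, and the sketch assumes tab-separated, 9-column GTF lines whose attribute values are double-quoted.

```python
import re
from collections import defaultdict

# Assumption: attributes look like  key "value";  (double-quoted values only).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def check_gtf_ids(lines):
    """Report records lacking a gene_id and transcript_ids used by several genes.

    Returns a tuple:
      - list of 1-based line numbers whose gene_id is missing or empty
      - dict mapping each transcript_id to the sorted gene_ids that share it
        (only transcript_ids seen with more than one gene_id are included)
    """
    empty_gene_id = []
    genes_by_transcript = defaultdict(set)
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a standard GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id", "")
        transcript_id = attrs.get("transcript_id", "")
        if not gene_id:
            empty_gene_id.append(number)
        if gene_id and transcript_id:
            genes_by_transcript[transcript_id].add(gene_id)
    shared = {t: sorted(g) for t, g in genes_by_transcript.items() if len(g) > 1}
    return empty_gene_id, shared
```

Calling `check_gtf_ids(open("annotation.gtf"))` (the file name is a placeholder) would give the line numbers and transcript_ids that need attention before the file is passed on.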

Could you please add a feature that would allow users to select which attributes are kept? It is easy enough to keep or discard the rows that cellranger ignores using grep, but fixing the attributes column gets harder when different GTF files are combined and the order of attributes is not consistent.

A feature like agat_keep_attr.pl "gene_id" "transcript_id" "gene_name" would be great.
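To make the request concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not an AGAT script. The function name `keep_attributes` is hypothetical, and it assumes tab-separated, 9-column GTF lines with double-quoted attribute values. It keeps only the listed attributes and writes them in one fixed order, whatever order they had in the input.

```python
import re

# Assumption: attributes look like  key "value";  (double-quoted values only).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def keep_attributes(line, keep=("gene_id", "transcript_id", "gene_name")):
    """Return a GTF line whose 9th column holds only the attributes in `keep`.

    Attributes are emitted in the order given by `keep`. Comment lines, blank
    lines, and lines without exactly 9 tab-separated columns are returned
    unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line
    attrs = {}
    for key, value in ATTR_RE.findall(fields[8]):
        attrs.setdefault(key, value)  # keep the first occurrence of a repeated key
    fields[8] = " ".join(f'{key} "{attrs[key]}";' for key in keep if key in attrs)
    return "\t".join(fields) + "\n"


def filter_gtf(in_path, out_path, keep=("gene_id", "transcript_id", "gene_name")):
    """Apply keep_attributes to every line of in_path and write to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(keep_attributes(line, keep))
```

For example, `filter_gtf("merged.gtf", "merged.filtered.gtf")` (both file names are placeholders) would rewrite every record so that the attributes column contains only gene_id, transcript_id and gene_name, in that order.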

Thanks

Juke34 commented 1 year ago

You can try agat_sq_list_attributes.pl to list the attributes, then agat_sp_manage_attributes.pl to remove the attributes you do not want. The problem is that the second script (sp prefix) must keep the ID and Parent attributes, so you will not be allowed to remove them. Give it a try and see if it works. Otherwise I will implement something like agat_sq_filter_attributes.pl.

mossconfuse commented 1 year ago

Thank you, Juke34. I was able to remove the attributes by following your advice.