frattalab / PAPA

PAPA (Pipeline-Alternative Polyadenylation) - Snakemake pipeline for analysis of APA from short-read RNA-seq data
GNU General Public License v3.0

Generate 'final' match_stats.tsv file combining filtering scripts and tracking transcript IDs across scripts #26

Closed SamBryce-Smith closed 1 year ago

SamBryce-Smith commented 2 years ago

Downstream steps operate on the reference-merged GTF, which has different transcript IDs from those used in the previous filtering steps. This means any 'match_stats' tables become uninformative for downstream analysis (e.g. is this novel tx a 3'UTR extension, a 3'UTR intron etc.?). These transcript IDs are all tracked by GFFcompare when merging, so it should be possible to:

1- Merge novel transcripts with reference GTF
2- Load in match_stats tables from each filtering step & concatenate
3- Load in '.tracking' GFFcompare file to get df of old_tx_id | ref_merged_tx_id
4- Merge by old_tx_id to get new Tx_id in merged match_stats table
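Steps 2-4 could look roughly like the pandas sketch below. It is only an illustration, not code from the pipeline: the function names, the `transcript_id` column name and the toy IDs are all made up, and it assumes the usual GFFcompare '.tracking' layout (merged transcript ID in the first column, then one `qN:gene_id|transcript_id|...` field per input GTF from the fifth column onwards).

```python
import io

import pandas as pd


def read_tracking(handle, query_col=4):
    """Build an old_tx_id | ref_merged_tx_id table from a '.tracking' file.

    Assumed layout (check against your GFFcompare output): column 0 holds the
    merged transcript ID, and `query_col` holds 'qN:gene_id|transcript_id|...'
    for one input GTF, or '-' if the transcript is absent from that input.
    """
    tracking = pd.read_csv(handle, sep="\t", header=None, dtype=str)
    # Pull the original transcript ID out of the per-input field; rows that
    # do not match the pattern (e.g. '-') come back as NaN and are dropped.
    old_ids = tracking[query_col].str.extract(
        r"^q\d+:[^|]*\|([^|]+)\|", expand=False
    )
    id_map = pd.DataFrame(
        {"old_tx_id": old_ids, "ref_merged_tx_id": tracking[0]}
    )
    return id_map.dropna(subset=["old_tx_id"]).reset_index(drop=True)


def merge_match_stats(match_stats_tables, id_map, id_col="transcript_id"):
    """Concatenate match_stats tables and attach the merged transcript ID.

    `id_col` is a hypothetical name for the transcript ID column in the
    match_stats tables.
    """
    combined = pd.concat(match_stats_tables, ignore_index=True)
    merged = combined.merge(
        id_map, how="left", left_on=id_col, right_on="old_tx_id"
    )
    return merged.drop(columns="old_tx_id")


# Toy example with made-up IDs
tracking_text = (
    "TCONS_00000001\tXLOC_000001\tGENE1|ENST0001\t=\tq1:G1|ENST0001|3|0|0|0|1200\n"
    "TCONS_00000002\tXLOC_000001\tGENE1|ENST0001\tj\tq1:STRG.1|STRG.1.1|4|0|0|0|1500\n"
)
id_map = read_tracking(io.StringIO(tracking_text))
stats_a = pd.DataFrame({"transcript_id": ["STRG.1.1"], "match_class": ["3utr_extension"]})
stats_b = pd.DataFrame({"transcript_id": ["STRG.9.9"], "match_class": ["3utr_intron"]})
final_stats = merge_match_stats([stats_a, stats_b], id_map)
```

The left merge keeps every row of the concatenated match_stats table, so transcripts that didn't make it into the merged GTF show up with a missing `ref_merged_tx_id` rather than silently disappearing.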

SamBryce-Smith commented 1 year ago

don't think this is relevant anymore