frattalab / PAPA

PAPA (Pipeline-Alternative Polyadenylation) - Snakemake pipeline for analysis of APA from short-read RNA-seq data
GNU General Public License v3.0

Support for Ensembl style GTF files #2

Closed SamBryce-Smith closed 2 years ago

SamBryce-Smith commented 3 years ago
  1. The usual chromosome naming conventions etc. (though I suspect this is not much of an issue, since the same reference is used for StringTie and for filtering)
  2. Naming conventions for attribute keys and values, e.g. 'gene_type' and its associated protein-coding/lncRNA flags (I think the Ensembl equivalent is 'gene_biotype'...)

And I'm sure I can think of other things down the line...

I suspect I'll need to add some options to the config and to the CLI of filter_by_tx_chain.py... I reckon the defaults could be set at the bottom of the config, e.g.

# gencode/ensembl
gtf_style: gencode

gencode:
    gene_type_col: "gene_type"
    pc_lnc_flags: ["protein_coding", "lncRNA"]

ensembl:
    gene_type_col: "gene_biotype"
# and same for pc_lnc_flags but not sure what they are... 
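A minimal Python sketch of how a script could read the proposed settings. The dict mirrors the config layout above; `CONFIG` and `resolve_gtf_settings` are hypothetical names, not part of PAPA, and the Ensembl `pc_lnc_flags` are left unset because they are not known here.

```python
# Plain-dict mirror of the proposed config (hypothetical layout).
# The ensembl pc_lnc_flags are deliberately absent: the thread does not state them.
CONFIG = {
    "gtf_style": "gencode",
    "gencode": {
        "gene_type_col": "gene_type",
        "pc_lnc_flags": ["protein_coding", "lncRNA"],
    },
    "ensembl": {
        "gene_type_col": "gene_biotype",
    },
}


def resolve_gtf_settings(config, style=None):
    """Return (gene_type_col, pc_lnc_flags) for the chosen GTF style.

    Falls back to config["gtf_style"] when no style is passed explicitly.
    """
    style = style or config["gtf_style"]
    if style not in ("gencode", "ensembl"):
        raise ValueError(f"Unknown gtf_style: {style!r}")
    section = config[style]
    # pc_lnc_flags is None when the section does not define it yet.
    return section["gene_type_col"], section.get("pc_lnc_flags")
```

Keeping the style-specific values in one section per style means a script only needs the single `gtf_style` switch to pick the right attribute name.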

SamBryce-Smith commented 2 years ago

Closed with b970bb05. From manual checks, the only difference is the gene_type column. There is now an extra command-line option to specify whether the reference source is gencode or ensembl.
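For illustration only, a sketch of what such an option might look like. The flag name `--gtf-style` and the helpers `build_parser` and `gene_type_from_attributes` are assumptions, not the interface added in b970bb05; only the two attribute names come from this thread.

```python
import argparse
import re

# Attribute key holding the gene type, per reference source (names from the thread).
GENE_TYPE_KEY = {"gencode": "gene_type", "ensembl": "gene_biotype"}


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative only")
    # Hypothetical flag name; the real option may be named differently.
    parser.add_argument(
        "--gtf-style",
        choices=sorted(GENE_TYPE_KEY),
        default="gencode",
        help="Source of the reference GTF",
    )
    return parser


def gene_type_from_attributes(attributes, style):
    """Extract the gene type value from a GTF attribute string (column 9).

    Returns None when the style's attribute key is not present.
    """
    key = GENE_TYPE_KEY[style]
    match = re.search(rf'(?:^|;\s*){re.escape(key)} "([^"]*)"', attributes)
    return match.group(1) if match else None
```

For example, parsing `["--gtf-style", "ensembl"]` selects `gene_biotype` as the key to look up in each record's attribute string.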