Xinglab / rmats-turbo


Using custom GTF #186

Open · ebonner303 opened this issue 2 years ago

ebonner303 commented 2 years ago

So I am trying to analyze some data for aberrant splicing following a previously published protocol. In it, the authors describe building a custom GTF file from both the downloadable UCSC GTF and their own sequencing data for use with rMATS. I aligned my reads with STAR against GRCh38 and generated GTF files from the BAM files with StringTie, but I am unsure how to combine the Ensembl GTF with my StringTie GTFs for use with rMATS. I am also unsure what rMATS needs from the GTF input to perform its analysis. Any advice on assembling this file for use with rMATS would be greatly appreciated.

EricKutschera commented 2 years ago

If rMATS is run with FASTQ files (--s1, --s2), it passes --gtf on to STAR. If you have already run STAR on your reads and are running rMATS with BAM files (--b1, --b2), then --gtf only needs to use coordinates consistent with your BAM files. Your --gtf should be fine in that respect, because it combines the GTF used with STAR to create the BAMs with additional GTFs generated from those same BAMs.
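For the BAM route, a minimal sketch of assembling the rMATS command in Python. It assumes the rMATS-turbo convention that --b1 and --b2 each point to a text file listing one group's BAM paths separated by commas; the helper names, the output layout, and the defaults for -t and --readLength are placeholders to replace with your own values, not recommendations.

```python
import os
from typing import List


def write_bam_list(path: str, bams: List[str]) -> None:
    """Write one sample-group file: all BAM paths on a single line, comma-separated."""
    with open(path, "w") as handle:
        handle.write(",".join(bams) + "\n")


def build_rmats_bam_command(b1: List[str], b2: List[str], gtf: str,
                            out_dir: str, tmp_dir: str,
                            read_type: str = "paired",   # placeholder: "single" or "paired"
                            read_length: int = 101) -> List[str]:  # placeholder read length
    """Hypothetical helper: return an argv list for running rMATS on existing BAMs.

    No FASTQ inputs (--s1/--s2) are used, so rMATS does not run STAR and
    --gtf only has to match the coordinates/chromosome names of the BAMs.
    """
    os.makedirs(out_dir, exist_ok=True)
    b1_txt = os.path.join(out_dir, "b1.txt")
    b2_txt = os.path.join(out_dir, "b2.txt")
    write_bam_list(b1_txt, b1)
    write_bam_list(b2_txt, b2)
    return [
        "python", "rmats.py",
        "--b1", b1_txt,
        "--b2", b2_txt,
        "--gtf", gtf,  # the combined GTF; must use the same reference as the BAMs
        "-t", read_type,
        "--readLength", str(read_length),
        "--od", out_dir,
        "--tmp", tmp_dir,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` from the directory containing `rmats.py`; building it as a list rather than a shell string avoids quoting problems with paths.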

Here's the rMATS code for parsing --gtf: https://github.com/Xinglab/rmats-turbo/blob/v4.1.2/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L96

In terms of formatting, rMATS looks at lines from --gtf with "exon" in the "feature" column (the third of the nine tab-separated columns), and in the attribute column it looks for these tags: gene_id, transcript_id, gene_name. Here's an example line that rMATS can use:
chr1 ensGene exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "ENSG00000223972";
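To spot-check a combined file before running rMATS, here is a small Python sketch that applies the rules above to one line: skip anything that is not an exon record, then report which of gene_id, transcript_id and gene_name are missing. This is a hypothetical pre-flight check, not rMATS's own parser (that is the code linked above), and it does not claim how rMATS reacts to a missing tag.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')
TAGS = ("gene_id", "transcript_id", "gene_name")


def check_gtf_line(line: str) -> Optional[Tuple[Dict[str, str], List[str]]]:
    """Hypothetical check. Returns None for non-exon lines (comments, gene,
    transcript, CDS, ...); for exon lines returns (fields, missing_tags)."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "exon":
        return None
    attrs = dict(ATTR_RE.findall(cols[8]))
    fields = {"chrom": cols[0], "start": cols[3], "end": cols[4], "strand": cols[6]}
    fields.update({tag: attrs[tag] for tag in TAGS if tag in attrs})
    missing = [tag for tag in TAGS if tag not in attrs]
    return fields, missing
```

Running it over every line of the combined GTF and collecting non-empty `missing` lists is a quick way to find records (for example, novel StringTie transcripts) that lack one of the attributes.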

In terms of combining your GTF files, rMATS does not require --gtf to be sorted in any way, so you can combine the lines from the GTF files in any order. rMATS does its analysis one gene at a time, so make sure the "gene_id" attributes are consistent among the GTF files; that way rMATS can use transcripts from all of the original GTFs when detecting splicing events for a gene.
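One way to make gene_id consistent is sketched below in Python. It assumes the StringTie runs used the reference annotation (-G), so transcripts matching a reference gene carry a ref_gene_id attribute; that attribute, the file-index prefix, and the gene_name fallback are assumptions of this sketch rather than anything rMATS requires. Each StringTie gene_id that maps to exactly one reference gene is renamed to it. Unmapped STRG.* IDs get a hypothetical per-file prefix, because StringTie numbers genes per run and the same ID in two files need not be the same locus. A gene_name is added where missing, following the example line above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

GENE_ID_RE = re.compile(r'\bgene_id "([^"]*)"')        # \b skips ref_gene_id
REF_GENE_RE = re.compile(r'\bref_gene_id "([^"]*)"')
GENE_NAME_RE = re.compile(r'\bgene_name "')            # \b skips ref_gene_name
REF_NAME_RE = re.compile(r'\bref_gene_name "([^"]*)"')


def stringtie_gene_map(lines: Iterable[str]) -> Dict[str, str]:
    """Map a StringTie gene_id to a reference gene_id when exactly one is reported."""
    refs: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        gene, ref = GENE_ID_RE.search(line), REF_GENE_RE.search(line)
        if gene and ref:
            refs[gene.group(1)].add(ref.group(1))
    return {gid: next(iter(r)) for gid, r in refs.items() if len(r) == 1}


def harmonize_line(line: str, gene_map: Dict[str, str], prefix: str) -> str:
    """Rewrite gene_id (mapped, or prefixed if unmapped) and add gene_name if absent."""
    body = line.rstrip("\n")
    m = GENE_ID_RE.search(body)
    if m is None:
        return body + "\n"
    new_id = gene_map.get(m.group(1), prefix + m.group(1))
    body = body[:m.start()] + f'gene_id "{new_id}"' + body[m.end():]
    if not GENE_NAME_RE.search(body):
        ref_name = REF_NAME_RE.search(body)
        name = ref_name.group(1) if ref_name else new_id
        body = body.rstrip().rstrip(";") + f'; gene_name "{name}";'
    return body + "\n"


def merge_gtfs(reference_path: str, stringtie_paths: List[str], out_path: str) -> None:
    """Concatenate the reference GTF and StringTie GTFs (order does not matter to rMATS)."""
    inputs = [(reference_path, "", False)]
    inputs += [(p, f"st{i}_", True) for i, p in enumerate(stringtie_paths, start=1)]
    with open(out_path, "w") as out:
        for path, prefix, is_stringtie in inputs:
            with open(path) as handle:
                records = [l for l in handle if l.strip() and not l.startswith("#")]
            gene_map = stringtie_gene_map(records) if is_stringtie else {}
            for record in records:
                out.write(harmonize_line(record, gene_map, prefix))
```

Reference lines pass through unchanged (empty prefix, no mapping), so only the StringTie records are rewritten. A StringTie gene overlapping several reference genes is deliberately left unmapped rather than assigned to one of them arbitrarily.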