DiogoVeiga / maser

Mapping Alternative Splicing Events to pRoteins

Preparing GTF file for running rMATS #3

Closed DiogoVeiga closed 6 years ago

DiogoVeiga commented 6 years ago

R code for the following:
1. Download the appropriate GTF from AnnotationHub (as a GRanges object).
2. Filter exons and transcripts. rMATS uses only these features of the GTF to detect alternative splicing.
3. Filter protein-coding transcripts.
4. Export the GTF to a file using rtracklayer.
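The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that ended up in the package: `filter_for_rmats` and `prepare_rmats_gtf` are hypothetical helper names, and the biotype column is assumed to be `transcript_biotype` as in Ensembl GTFs (Gencode GTFs call it `transcript_type`).

```r
library(GenomicRanges)

# Hypothetical helper: keep exon/transcript features of protein-coding transcripts.
# biotype_col is an assumption: "transcript_biotype" (Ensembl) or "transcript_type" (Gencode).
filter_for_rmats <- function(gtf, biotype_col = "transcript_biotype") {
  stopifnot(biotype_col %in% colnames(mcols(gtf)))
  keep_type <- gtf$type %in% c("exon", "transcript")
  biotype <- mcols(gtf)[[biotype_col]]
  keep_coding <- !is.na(biotype) & biotype == "protein_coding"
  gtf[keep_type & keep_coding]
}

# Hypothetical wrapper: fetch a GTF record from AnnotationHub, filter it, write it out.
# record_id is the AnnotationHub record ID of the GTF you chose (not filled in here).
prepare_rmats_gtf <- function(record_id, out_file) {
  ah <- AnnotationHub::AnnotationHub()
  gtf <- ah[[record_id]]                       # GRanges for GTF records
  filtered <- filter_for_rmats(gtf)
  rtracklayer::export(filtered, out_file, format = "gtf")
  invisible(filtered)
}
```

To find a record ID, something like `AnnotationHub::query(ah, c("Homo sapiens", "Ensembl", "GRCh38", "gtf"))` should list candidates (the search terms are a guess); pick the release that matches the genome used for alignment and pass its ID to `prepare_rmats_gtf()`.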

DiogoVeiga commented 6 years ago

By design, the GTF accepted by maser should be a GRanges object containing all the exons in the transcriptome. The GRanges must contain the following fields:

"seqnames", "start", "end", "strand", "exon_id", "transcript_name"

Exons will be grouped into transcripts using "transcript_name". GRanges objects retrieved from AnnotationHub automatically have this information.
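As a rough illustration of that requirement, the sketch below checks a GRanges for the listed fields and groups exons by "transcript_name". The helper `check_maser_gtf` and the toy exon and transcript IDs are made up for the example; maser's own validation may differ.

```r
library(GenomicRanges)

required_cols <- c("seqnames", "start", "end", "strand",
                   "exon_id", "transcript_name")

# Hypothetical check: stop if any required field is absent.
check_maser_gtf <- function(gtf) {
  stopifnot(is(gtf, "GRanges"))
  # as.data.frame() exposes both the core fields and the metadata columns
  missing <- setdiff(required_cols, colnames(as.data.frame(gtf)))
  if (length(missing) > 0) {
    stop("GTF is missing required fields: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy annotation with made-up IDs: two transcripts sharing one exon
toy <- GRanges(seqnames = "chr1",
               ranges = IRanges(start = c(100, 300, 100), end = c(200, 400, 200)),
               strand = "+",
               exon_id = c("E1", "E2", "E1"),
               transcript_name = c("TX-201", "TX-201", "TX-202"))

check_maser_gtf(toy)

# Group exons into transcripts, giving one GRanges per transcript_name
exons_by_tx <- split(toy, toy$transcript_name)
lengths(exons_by_tx)
```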

DiogoVeiga commented 6 years ago

The package will accept rMATS results based on any GTF, i.e. the analysis can start from any annotation. However, mapping of events to transcripts will use GTFs from the Ensembl or Gencode hg38 assembly. These can be downloaded and read with readGFF, or retrieved from AnnotationHub.

We therefore recommend also running rMATS with an Ensembl or Gencode hg38 GTF to keep the analysis consistent.