Open Motoufiq opened 2 weeks ago
If you have 151bp reads then ideally you would create the STAR reference with --sjdbOverhang 150 and then run rMATS with --readLength 151. From the STAR manual, the exact value of --sjdbOverhang doesn't seem to matter too much: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
--sjdbOverhang specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value
For rMATS, --readLength needs to be the actual read length or the reads will get filtered out. However if you use --variable-read-length then the exact value of --readLength doesn't matter too much. This post has details about --readLength in rMATS: https://github.com/Xinglab/rmats-turbo/issues/83#issuecomment-766845772
@EricKutschera thank you for the response.
Hi,
I’m currently working on the
RNA-Seq
splicing analysis and have a question about parameter used to build referenceSTAR
(https://github.com/alexdobin/STAR/tree/master) usingGENCODE
genome (https://www.gencodegenes.org/human/) which then would be related to input parameter for rMATS analysis. My paired end fastq files are 151 bp read length. As this is short read RNA-Seq splicing analysis with 151 bp, I was wondering if I should input--sjdboverhang 150
parameter was to build this STAR reference. NOTE:--sjdboverhang
in the STAR specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database (in my case the PE read length is 151 bp), then my --sjdboverhang 150 (ReadLength-1: 151-1 = 150)I want to set the below paramater in
rMATS
:--readLength READLENGTH
The length of each read. Required parameter, with the value set according to the RNA-seq read lengthIt seems It’s quite common for Illumina sequencers to produce paired-end reads with a read length of 151 bp instead of the expected 150 bp. Is it OK to build STAR index
--sjdboverhang 150
, and does rMATS take input of read length of 151 base pair? or to build STAR index--sjdboverhang 149
instead and rMATS take input of read length of 150 bp ?Additionally, if I don't input --sjdboverhang paramter, while building STAR INDEX, can I run RNA-Seq splicing analysis, and splicing analysis makes sense?
Thank you, Toufiq