foerstner-lab / READemption

A pipeline for the computational evaluation of RNA-Seq data
https://reademption.readthedocs.io

coverage -d option for unstranded RNA-Seq library #43

Closed EthanKhew closed 1 year ago

EthanKhew commented 1 year ago

I am using READemption to create wiggle files from unstranded paired-end RNA-Seq data for downstream sORF analysis with ANNOgesic.

My RNA-Seq data consists of 2 replicates for each condition, and each replicate has a forward and a reverse read file, indicated by _1.fq and _2.fq respectively. I checked the library strandedness with STAR, which is how I concluded that my RNA-Seq library is unstranded.

To run ANNOgesic, I need wiggle files, which are created with the "align" and "coverage" commands in READemption. For "align", I ran everything with the default settings. However, I am unsure which setting to use for the "coverage" argument [--non_strand_specific, -d = Do not distinguish between the coverage of the forward and reverse strand but sum them to a single value for each base]. Since the library is unstranded but each replicate has forward and reverse reads, I am not sure which setting to choose in this situation.

I hope you can take some time to explain the [-d] argument further and tell me which setting I should choose for my analysis. Thanks in advance.

Tillsa commented 1 year ago

Some brief explanations about paired-end sequencing: Every RNA is sequenced once from its 5' end and once from its 3' end, which results in the two mates. Read 1 comes from the 5' end and read 2 from the 3' end of the RNA. Every read 1 is saved in your _1.fq file and every read 2 in your _2.fq file. That means you know the direction of the reads, but it doesn't tell you which DNA strand (forward or reverse) the RNA comes from. To find out where an RNA comes from, you need to align the reads to the reference genome. After the alignment, the alignment files (SAM/BAM) contain the information whether your RNA template aligns to the forward or reverse strand. That means you can (and most of the time should) create strand-specific coverage files. If you don't use the option [--non_strand_specific], you will get coverage files for both strands (forward and reverse), which I guess would make sense when using ANNOgesic.
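
To make the difference concrete, here is a small toy sketch in Python. It is not READemption's actual code. It only illustrates what "per-strand coverage" versus "summed coverage" means for a handful of made-up alignments. The function names, the example coordinates, and the rule that mate 2 is counted on the strand opposite to where it aligns are assumptions made for illustration. Only the SAM flag bits (0x10 for reverse strand, 0x80 for second mate) come from the SAM specification.

```python
from collections import Counter

# SAM flag bits (from the SAM specification)
FLAG_REVERSE = 0x10  # read is aligned to the reverse strand
FLAG_MATE_2 = 0x80   # read is the second mate of a pair


def fragment_strand(flag):
    """Return '+' or '-' for the strand a read is counted on.

    Assumption for this sketch: mate 2 is counted on the strand opposite
    to where it aligns, so both mates of a pair land on the same strand.
    """
    reverse = bool(flag & FLAG_REVERSE)
    if flag & FLAG_MATE_2:
        reverse = not reverse
    return "-" if reverse else "+"


def coverage(alignments, strand_specific=True):
    """Per-base coverage from (start, end, flag) tuples (1-based, inclusive)."""
    cov = {"+": Counter(), "-": Counter()}
    for start, end, flag in alignments:
        strand = fragment_strand(flag)
        for pos in range(start, end + 1):
            cov[strand][pos] += 1
    if strand_specific:
        return cov
    # Same idea as --non_strand_specific: one summed value per base
    return {"both": cov["+"] + cov["-"]}


# Hypothetical alignments: (start, end, SAM flag)
alignments = [
    (1, 5, 99),   # mate 1, aligned forward
    (4, 8, 147),  # mate 2, aligned reverse (same fragment as above)
    (6, 9, 83),   # mate 1, aligned reverse (a different fragment)
]

stranded = coverage(alignments)
summed = coverage(alignments, strand_specific=False)
print("forward:", dict(stranded["+"]))
print("reverse:", dict(stranded["-"]))
print("summed: ", dict(summed["both"]))
```

In the strand-specific case, the third read shows up only in the reverse-strand coverage. In the summed case, it is merged with the forward-strand signal at the same positions, so the information about which strand it aligned to is lost.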

EthanKhew commented 1 year ago

Dear @Tillsa, thank you very much for the swift response.