Dear Dario,
I'll break this issue into a few parts:
1) no mention about paired-end RNA sequencing
This is now a recurrent issue brought up by multiple users. The current implementation of TeXP only supports single-end data. As I mentioned in previous issues and now in the README, if you have paired-end data you can run TeXP independently on each read of the pair and average the two estimates. Empirically, if the library quality is good, read 1 (P1) and read 2 (P2) give very similar estimates.
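A minimal sketch of the averaging step, assuming each TeXP run's per-subfamily estimates have already been loaded into a Python dictionary. The dictionary format, the subfamily keys and the helper names below are hypothetical and are not TeXP's actual output format or API. The relative-difference helper is an optional sanity check for the observation that P1 and P2 should agree when library quality is good.

```python
from typing import Dict


def average_paired_estimates(read1: Dict[str, float],
                             read2: Dict[str, float]) -> Dict[str, float]:
    """Average per-subfamily estimates from independent read-1 and read-2 runs.

    Hypothetical helper: a subfamily missing from one run is treated as 0.0
    in that run (an assumption, not TeXP behaviour).
    """
    names = set(read1) | set(read2)
    return {
        name: (read1.get(name, 0.0) + read2.get(name, 0.0)) / 2.0
        for name in sorted(names)
    }


def relative_difference(a: float, b: float) -> float:
    """Relative disagreement between the two runs for one subfamily (0.0 = identical)."""
    largest = max(abs(a), abs(b))
    return abs(a - b) / largest if largest > 0 else 0.0


if __name__ == "__main__":
    # Illustrative numbers only, not real TeXP estimates.
    p1 = {"L1HS": 120.0, "L1PA2": 40.0}
    p2 = {"L1HS": 110.0, "L1PA2": 44.0}
    print(average_paired_estimates(p1, p2))
    for name in p1:
        print(name, round(relative_difference(p1[name], p2[name]), 3))
```

A large relative difference for a subfamily would be a hint to look at library quality before trusting the averaged value.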
2) what about stranded and unstranded protocols?
TeXP is completely compatible with unstranded RNA-seq. However, I have not tested the effect of stranded RNA-seq on TeXP estimates; I'll follow up on this the next time I work on updating TeXP.
3) How does the performance compare for poly-A RNA versus total RNA which is ribosome-depleted?
I recommend using polyA+ libraries, since those libraries should be enriched for mRNA. If only ribo-depleted (ribo-) libraries are available, I expect the estimates to show a higher amount of pervasive transcription. TeXP should be able to handle both types of library.
4) Bowtie2 mandatory when it's not designed for RNA-seq data, unlike HISAT2 or STAR
I'm not really sure I agree that Bowtie2 is not designed for RNA-seq. Nonetheless, LINE-1 does not contain introns, so the advantage of a splice-aware aligner such as HISAT2 or STAR for estimating LINE-1 transcription is questionable.
Figure S15 in "TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements" also addresses this question. The choice of aligner does not seem to have a large effect on TeXP estimates. We compared STAR, Bowtie2 and BWA. I'm not sure about HISAT2.
Therefore, as long as you allow multi-mapping and do not prevent the aligner from reporting multi-mapping reads, TeXP should work seamlessly.
Attachment: S15 Figure.pdf
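One way to check that an alignment file still contains multi-mapping reads is sketched below. It assumes the alignments are available as SAM text (for example, after converting a BAM) and counts reads with at least one secondary alignment, i.e. records whose FLAG has bit 0x100 set. If the count is zero, the aligner was probably run in a mode that reports a single alignment per read; with Bowtie2, the `-k <int>` and `-a` options make it report multiple alignments per read. The helper name is hypothetical and is not part of TeXP, and which Bowtie2 setting TeXP's own pipeline uses is not stated here.

```python
from typing import Iterable

SECONDARY_FLAG = 0x100  # SAM FLAG bit 256: secondary alignment


def count_multimapped_reads(sam_lines: Iterable[str]) -> int:
    """Return the number of distinct read names that have a secondary alignment.

    Hypothetical helper: expects SAM text lines (header lines start with '@').
    """
    multimapped = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])  # QNAME and FLAG columns
        if flag & SECONDARY_FLAG:
            multimapped.add(qname)
    return len(multimapped)


if __name__ == "__main__":
    # Toy SAM records: r1 aligns to two loci, r2 to one.
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t1\t50M\t*\t0\t0\t*\t*",
        "r1\t256\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*",
        "r2\t0\tchr1\t300\t42\t50M\t*\t0\t0\t*\t*",
    ]
    print(count_multimapped_reads(sam))
```

Counting distinct read names rather than secondary records avoids inflating the number when a read has many alternative loci, which is common for young LINE-1 copies.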
This clears up all my confusion. It's a good point that LINE-1 is unspliced.
For example, I find no mention about paired-end RNA sequencing which is the most common kind these days. Also, what about stranded and unstranded protocols? I have BAM files from TCGA and they happen to be an unstranded protocol, sadly. How does the performance compare for poly-A RNA versus total RNA which is ribosome-depleted? Why is Bowtie2 mandatory when it's not designed for RNA-seq data, unlike HISAT2 or STAR?