gersteinlab / texp

TeXP is a pipeline to gauge the autonomous transcription level of L1 subfamilies using short read RNA-seq data
Apache License 2.0

More Details in User Guide #5

Closed DarioS closed 4 years ago

DarioS commented 4 years ago

For example, I find no mention of paired-end RNA sequencing, which is the most common kind these days. Also, what about stranded and unstranded protocols? I have BAM files from TCGA and they happen to come from an unstranded protocol, sadly. How does the performance compare for poly-A RNA versus total RNA which is ribosome-depleted? Why is Bowtie2 mandatory when it's not designed for RNA-seq data, unlike HISAT2 or STAR?

fabiocpn commented 4 years ago

Dear Dario,

I'll break this issue into a few parts:

1) no mention of paired-end RNA sequencing

This is a recurring issue raised by multiple users. The current implementation of TeXP supports only single-end data. As I mentioned in previous issues and now in the README, if you have paired-end data you can run TeXP independently on each end (P1 and P2) and average the two estimates. Empirically, if the library quality is good, P1 and P2 give very similar estimates.
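
The averaging workaround described above can be sketched as follows. This is a minimal illustration, not part of TeXP: it assumes the per-subfamily estimates from each run have already been loaded into plain dictionaries, and the function name, subfamily labels and numbers are all hypothetical.

```python
def average_mate_estimates(mate1, mate2):
    """Average per-subfamily estimates from two independent single-end runs.

    mate1, mate2: dicts mapping a subfamily name to its estimate.
    This in-memory layout is an assumption; TeXP's output files are not parsed here.
    """
    if mate1.keys() != mate2.keys():
        raise ValueError("both runs must report the same subfamilies")
    return {name: (mate1[name] + mate2[name]) / 2 for name in mate1}


# Made-up numbers, for illustration only.
run_p1 = {"L1Hs": 120.0, "L1PA2": 40.0}
run_p2 = {"L1Hs": 110.0, "L1PA2": 44.0}
averaged = average_mate_estimates(run_p1, run_p2)
```

If the two runs disagree strongly for a subfamily, it may be worth inspecting the library before averaging, since similar P1 and P2 estimates are only expected when library quality is good.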

2) what about stranded and unstranded protocols?

TeXP is completely compatible with unstranded RNA-seq. However, I have not tested the effect of stranded RNA-seq on TeXP estimates; I'll follow up on this next time I work on updating TeXP.

3) How does the performance compare for poly-A RNA versus total RNA which is ribosome-depleted?

I recommend using polyA+ libraries, since those libraries should enrich for mRNA. If only ribo- libraries are available, I expect to see a higher amount of pervasive transcription in the estimates. TeXP should be able to handle both kinds of library.

4) Bowtie2 mandatory when it's not designed for RNA-seq data, unlike HISAT2 or STAR

I'm not sure I agree that Bowtie2 is not designed for RNA-seq. Regardless, LINE-1 does not contain introns, so the advantage of HISAT2 or STAR for LINE-1 transcription estimates is questionable.

Figure S15 of the paper "TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements" also addresses this question. The choice of aligner does not seem to have a large effect on TeXP estimates: we compared STAR, Bowtie2 and BWA. I'm not sure about HISAT2.

Therefore, as long as you allow multiple mapping and do not prevent the aligner from reporting multi-mapping reads, TeXP should work seamlessly.

S15 Figure.pdf
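
One way to sanity-check the multi-mapping condition above is to count how many reads have more than one reported alignment. The sketch below is not part of TeXP; it assumes single-end alignments given as SAM-format text lines, and the function name and example records are hypothetical. It relies only on the standard SAM FLAG bits for unmapped (0x4) and supplementary (0x800) records.

```python
from collections import Counter

UNMAPPED = 0x4         # SAM FLAG bit: read is unmapped
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary (chimeric) alignment


def count_multimapped_reads(sam_lines):
    """Return how many read names have more than one reported alignment.

    Assumes single-end data, so a read name that appears on several
    mapped records was reported at several locations.
    """
    hits = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (UNMAPPED | SUPPLEMENTARY):
            continue
        hits[name] += 1
    return sum(1 for n in hits.values() if n > 1)


# Hypothetical records: r1 is reported twice, r2 once, r3 is unmapped.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t1\t50M\t*\t0\t0\t*\t*",
    "r1\t256\tchr2\t200\t1\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr3\t300\t42\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
multi = count_multimapped_reads(example)
```

A count of zero on a real alignment file would suggest that the aligner was configured to report only one location per read.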

DarioS commented 4 years ago

This answers all my questions well. It's a good point that LINE-1 is unspliced.