OliveiraDS-hub / ChimeraTE

A pipeline to detect chimeric transcripts derived from genes and transposable elements.
GNU General Public License v3.0

Analysis with unstranded data #9

Closed Ifengel closed 1 year ago

Ifengel commented 1 year ago

Hello!

I am interested in using ChimeraTE to analyze a large amount of unstranded sequencing data, but it seems that the software is not designed for this kind of data.

Is there a simple way to adapt it to work with paired-end, unstranded data? If not, what alternatives do I have?

Thanks in advance!

OliveiraDS-hub commented 1 year ago

Hello @Ifengel

Unfortunately, it is not a matter of adaptation. There is a biological reason why ChimeraTE was designed to work specifically with stranded RNA-seq.

[Image: strand]

You can have many cases of TEs embedded in or overlapping exons (CDS and UTRs) in which the TE insertion and the genes are on opposite strands. With stranded RNA-seq, you can recognize that the TE is generating a chimera with the gene on the minus strand, and the result will report a chimera between the TE and the minus-strand gene. With unstranded RNA-seq reads, however, you cannot distinguish which exon is being transcribed alongside the TE insertion. In that case, you will either detect both genes as generating chimeras (the most common outcome), or lose power because your coverage is split between the strands.

That's why ChimeraTE only detects chimeric reads when the gene strand is the same as the strand of the aligned reads (gene on + with reads aligned on +, OR gene on - with reads aligned on -). If your reads are unstranded, the two strands will be mixed, resulting in lower precision.
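To make the rule above concrete, here is a minimal Python sketch of a strand-concordance filter. It is only an illustration of the idea, not ChimeraTE's actual code: the `Alignment` and `Gene` types, their fields, and the read and gene names are all hypothetical.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    read_id: str
    strand: str  # "+" or "-": strand assigned to the read (hypothetical field)


class Gene(NamedTuple):
    gene_id: str
    strand: str  # "+" or "-": annotated strand of the gene


def supports_gene(aln: Alignment, gene: Gene) -> bool:
    """A read supports a gene only if both are on the same strand."""
    return aln.strand == gene.strand


def concordant_reads(alignments: List[Alignment], gene: Gene) -> List[str]:
    """Return the IDs of reads whose strand matches the gene strand."""
    return [a.read_id for a in alignments if supports_gene(a, gene)]


# Toy example: two genes on opposite strands sharing the same TE region.
alignments = [Alignment("r1", "+"), Alignment("r2", "-"), Alignment("r3", "-")]
plus_gene = Gene("geneA", "+")
minus_gene = Gene("geneB", "-")

plus_support = concordant_reads(alignments, plus_gene)
minus_support = concordant_reads(alignments, minus_gene)
```

With stranded reads, each read is counted for one gene only. With unstranded reads the `strand` field carries no information, so the same reads could not be split between `geneA` and `geneB` this way.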

If it's crucial for you to use unstranded data, you could simply run ChimeraTE once with rf-stranded and once with fwd-stranded, and then merge the results. But be aware that you should treat your results for TE-exonized embedded and overlapped chimeras with special care. Looking at the alignments in IGV can help you a lot to filter out cases in which you don't know where the reads are coming from.
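A rough sketch of that merge step in Python could look like the following. It assumes you have already parsed each run's output into a list of dictionaries. The keys `gene_id`, `te_id` and `chimera_type`, and the category labels in `AMBIGUOUS_TYPES`, are hypothetical placeholders rather than ChimeraTE's real output columns, so adapt them to the files you actually get.

```python
from collections import defaultdict

# Hypothetical labels for the categories that need manual inspection.
AMBIGUOUS_TYPES = {"TE-exonized-embedded", "TE-exonized-overlapped"}


def merge_runs(rf_rows, fwd_rows):
    """Concatenate both runs, tagging each row with the run it came from."""
    merged = []
    for run_label, rows in (("rf-stranded", rf_rows), ("fwd-stranded", fwd_rows)):
        for row in rows:
            merged.append({**row, "run": run_label})
    return merged


def flag_for_review(merged):
    """Return TE IDs reported with more than one gene in ambiguous categories."""
    genes_by_te = defaultdict(set)
    for row in merged:
        if row["chimera_type"] in AMBIGUOUS_TYPES:
            genes_by_te[row["te_id"]].add(row["gene_id"])
    return sorted(te for te, genes in genes_by_te.items() if len(genes) > 1)


# Toy input: TE_1 is reported with a different gene in each run.
rf_rows = [
    {"gene_id": "geneA", "te_id": "TE_1", "chimera_type": "TE-exonized-embedded"},
    {"gene_id": "geneC", "te_id": "TE_2", "chimera_type": "TE-initiated"},
]
fwd_rows = [
    {"gene_id": "geneB", "te_id": "TE_1", "chimera_type": "TE-exonized-embedded"},
]

merged = merge_runs(rf_rows, fwd_rows)
to_review = flag_for_review(merged)
```

The TEs in `to_review` are the ones worth opening in IGV, since both runs point to a different gene for the same insertion.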

I'm closing the issue, feel free to reopen it.