Closed: hchetia closed this issue 3 years ago
Figure 3F in our Bioinformatics paper, https://academic.oup.com/bioinformatics/article/36/4/1167/5581349, gives some sense of how L1EM performs without strand-specific data. It still catches the most expressed locus correctly, but there are a few false positives in the 1 to 4 read pairs per million range. If you're looking at something like a cancer where LINE-1 is known to be expressed, I think you'll probably get away without strand-specific data. However, you'll definitely want strand-specific data if you're looking for LINE-1 in a context where its expression is low.
Hi @wmckerrow, would you have any advice on how to separate the false positives from the true positives? I have a large unstranded RNA-seq dataset from control and diseased neurons, and I would like to use L1EM to quantify LINE-1 expression in it. Thanks.
In a large dataset you might look at how LINE-1 expression, both at specific loci and overall, correlates across samples with the fraction of reads that align to exons. If you see a strong anti-correlation between exon mapping and LINE-1, that probably means you're looking at intron retention and/or DNA contamination rather than LINE-1 expression.
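A minimal sketch of that check, assuming you have already computed a per-sample exon-mapping fraction and per-locus LINE-1 expression values yourself. The function names, the locus labels, the numbers, and the -0.5 cutoff are all hypothetical and only illustrate the idea; none of this is part of L1EM.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def flag_suspect_loci(exon_fraction, locus_expression, threshold=-0.5):
    """Return loci whose expression is strongly anti-correlated with the
    exon-mapping fraction across samples.

    exon_fraction: one value per sample (fraction of reads aligning to exons).
    locus_expression: dict mapping locus name -> per-sample expression values,
        in the same sample order as exon_fraction.
    threshold: arbitrary illustrative cutoff; choose one suited to your data.
    """
    return [
        locus
        for locus, values in locus_expression.items()
        if pearson(exon_fraction, values) < threshold
    ]


# Made-up illustrative numbers for five samples, not real results.
exon_fraction = [0.80, 0.75, 0.60, 0.50, 0.40]
locus_expression = {
    "locus_A": [1.0, 1.5, 3.0, 4.0, 5.0],  # rises as exon mapping falls
    "locus_B": [2.0, 2.1, 1.9, 2.2, 2.0],  # roughly flat across samples
}

suspect = flag_suspect_loci(exon_fraction, locus_expression)
print(suspect)
```

Loci that get flagged this way are candidates for intron retention or DNA contamination rather than confirmed false positives, so they are worth inspecting by hand. The same comparison can be run on total LINE-1 expression per sample instead of individual loci.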
Thanks @wmckerrow.
Can L1EM be used to reliably predict LINE-1 elements from non-strand-specific RNA-seq datasets?