MrKevinDC closed this issue 4 months ago.
That's odd. In paired mode, this is actually the number of read1s that are output. Is it possible that you have some unpaired reads (and have set unpaired reads to "use"), or reads whose mates can't be found? The log would tell you if this were the case.
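A minimal sketch of the arithmetic behind this kind of mismatch. It assumes, as described above, that the paired-mode dedup figure counts read1s, while a total that counts every mate separately (such as the overall samtools flagstat total) includes both read1s and read2s. If some read1s have no matching read2, halving that total no longer equals the read1 count. The helper name and the numbers are hypothetical.

```python
def pair_count_discrepancy(n_read1: int, n_read2: int) -> dict:
    """Compare a read1-based count with half of a total that counts both mates.

    Hypothetical helper: assumes the paired-mode dedup figure counts read1s,
    while the total being halved counts read1s and read2s together.
    """
    total_records = n_read1 + n_read2      # both mates counted separately
    naive_fragments = total_records / 2    # the "halve the total" assumption
    return {
        "read1_count": n_read1,
        "naive_fragments": naive_fragments,
        # Non-zero whenever read1s and read2s are unbalanced
        "discrepancy": n_read1 - naive_fragments,
    }

# Hypothetical example: 1,000 read1s but only 900 read2s
print(pair_count_discrepancy(1000, 900))
# {'read1_count': 1000, 'naive_fragments': 950.0, 'discrepancy': 50.0}
```

Comparing the dedup figure with the read1 line that samtools flagstat also reports sidesteps the halving step entirely.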
In terms of UMIs per position, it is entirely experiment-dependent. There isn't really a "usual" range.
That was indeed the case: there were more read1s than read2s. Thank you.
For the second question: the experiment is total RNA-seq. A mean of around 1 UMI per position would suggest that the read coverage isn't high enough for UMIs to provide any benefit over not using them, correct? From what I have been reading, more than 10 UMIs per position is desirable?
I wouldn't say that it's really about read coverage:
It depends on the number of reads per position. If you have a large number of reads at a position but a small number of UMIs, deduplication with UMIs is similar to deduplication without them. But I'm pretty sure that no one would recommend deduplicating RNA-seq without UMIs.
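To illustrate why few UMIs at a busy position makes UMI-aware deduplication behave like position-only deduplication, here is a minimal sketch. It assumes a hypothetical simplified read represented as a (position, UMI) pair and groups by exact matches only; it ignores strand, mate position and UMI sequencing errors, which real UMI-tools methods (for example the default directional method) take into account.

```python
from typing import Iterable, Tuple

Read = Tuple[int, str]  # hypothetical simplified read: (mapping position, UMI)

def dedup_counts(reads: Iterable[Read]) -> Tuple[int, int]:
    """Return (reads kept by position only, reads kept by position + UMI).

    Exact-match grouping only; no UMI error correction.
    """
    reads = list(reads)
    by_position = {pos for pos, _ in reads}
    by_position_and_umi = set(reads)
    return len(by_position), len(by_position_and_umi)

# 50 reads at one position, all with the same UMI: both approaches keep 1 read
few_umis = [(100, "ACGT")] * 50
# 50 reads at one position with 10 distinct UMIs: UMI-aware dedup keeps 10
many_umis = [(100, f"UMI{i % 10}") for i in range(50)]

print(dedup_counts(few_umis))   # (1, 1)
print(dedup_counts(many_umis))  # (1, 10)
```

When only one UMI is seen at a position, the two counts coincide, which is the situation described above.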
If deduplication is not reducing the number of reads by very much, but you also have few UMIs per position, then you have low levels of PCR duplication, and deduplication was probably not necessary.
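The two signals mentioned above (how much deduplication removes, and how many distinct UMIs are seen per position) can be computed together. This is a minimal sketch using the same hypothetical (position, UMI) representation and exact matching; the mean it reports is only roughly analogous to the "Mean number of unique UMIs per position" in the dedup log, whose exact definition may differ.

```python
from collections import defaultdict
from typing import List, Tuple

def dedup_summary(reads: List[Tuple[int, str]]) -> Tuple[float, float]:
    """Return (fraction of reads removed, mean unique UMIs per position).

    Hypothetical helper using exact (position, UMI) matching.
    """
    umis_at = defaultdict(set)
    for pos, umi in reads:
        umis_at[pos].add(umi)
    kept = sum(len(umis) for umis in umis_at.values())
    removed_fraction = 1 - kept / len(reads)
    mean_umis = kept / len(umis_at)  # average over positions that have reads
    return removed_fraction, mean_umis

# Low duplication: one duplicate read, about 1 UMI per position
low_dup = [(1, "AA"), (2, "AC"), (3, "AG"), (3, "AG")]
print(dedup_summary(low_dup))  # (0.25, 1.0)
```

A small removed fraction together with a mean close to 1 is the pattern that points to low PCR duplication.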
One important thing to bear in mind is that in RNA-seq it's not really the "average" gene you should be worried about, but rather the most highly expressed ones. As expression levels are generally log-normally distributed, the most highly expressed genes will have orders of magnitude more reads than the "average" one. These are the genes where you will see the most benefit from UMIs, as many reads will look like PCR duplicates purely by chance.
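As a rough illustration of "duplicates by chance", a minimal sketch based on the classic occupancy formula: if N reads land uniformly at random in M slots, the expected number of distinct occupied slots is M(1 - (1 - 1/M)^N). It assumes a hypothetical gene with 2,000 possible start positions and 10-nt UMIs with all 4^10 sequences equally likely. Real coverage and UMI usage are far from uniform, so the numbers are qualitative only.

```python
import math

def expected_distinct(n_reads: int, n_slots: int) -> float:
    """Expected occupied slots for n_reads placed uniformly into n_slots.

    Computes n_slots * (1 - (1 - 1/n_slots) ** n_reads) with log1p/expm1
    for numerical stability when n_slots is very large.
    """
    return -n_slots * math.expm1(n_reads * math.log1p(-1 / n_slots))

def chance_duplicate_fraction(n_reads: int, n_slots: int) -> float:
    """Fraction of reads that would be collapsed purely by chance."""
    return 1 - expected_distinct(n_reads, n_slots) / n_reads

positions = 2_000  # hypothetical number of start positions in one gene
umis = 4 ** 10     # hypothetical 10-nt UMIs, assumed uniformly used
for n_reads in (100, 10_000, 1_000_000):
    no_umi = chance_duplicate_fraction(n_reads, positions)
    with_umi = chance_duplicate_fraction(n_reads, positions * umis)
    print(f"{n_reads:>9} reads: position only {no_umi:.3f}, "
          f"position+UMI {with_umi:.6f}")
```

For a lowly expressed gene both fractions stay small, but as read counts climb, position-only deduplication would discard most reads while the position+UMI space stays largely unsaturated.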
The dedup logfile indicates:
However, samtools flagstat reports:
So there seems to be a 400k read discrepancy between the two, assuming that UMI-tools is reporting paired-end read fragments. What explains this discrepancy? We couldn't figure it out.
In addition, we have observed that the "Mean number of unique UMIs per position" normally ranges between 1 and 2 for our samples. Is that the usual range? Thank you in advance.