Closed by skudashev 3 months ago
Hi Sofia
The UMIs are collapsed only when the expression matrix is created. This is done by grouping reads by cell barcode and gene/transcript and counting the unique UMIs in each group.
The tagged BAMs are not deduplicated. All reads that were assigned a barcode and UMI are written to them.
Your last comment seems to say that there is an issue when the corrected UMI sequence is shorter than the expected 12 nt. How frequently do you see this happening?
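As a rough illustration of the counting step described above, here is a minimal Python sketch. It is not the pipeline's actual code: the function name `count_unique_umis` and the input layout (tuples of cell barcode, gene, UMI) are assumptions made for this example only.

```python
from collections import defaultdict


def count_unique_umis(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples.
    This layout is hypothetical, chosen only for illustration.
    """
    umis_per_group = defaultdict(set)
    for cell_barcode, gene, umi in reads:
        # A set keeps each UMI once, so duplicate reads collapse here.
        umis_per_group[(cell_barcode, gene)].add(umi)
    # The expression value is the number of distinct UMIs in each group.
    return {group: len(umis) for group, umis in umis_per_group.items()}


reads = [
    ("AAAC", "GeneA", "TTTTGGGGCCCC"),
    ("AAAC", "GeneA", "TTTTGGGGCCCC"),  # duplicate of the read above
    ("AAAC", "GeneA", "ACGTACGTACGT"),
    ("AAAC", "GeneB", "TTTTGGGGCCCC"),
    ("CCCG", "GeneA", "TTTTGGGGCCCC"),
]
counts = count_unique_umis(reads)
print(counts)
```

Because the count is taken per group, no single representative read has to be chosen, which is why this approach does not need a rule such as picking the read with the best mapping score.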
Hello,
Thank you for your explanation. That makes sense: the quantification uses the transcript and UMI assignments, rather than collapsing UMIs before mapping to transcripts. UMIs shorter than 12 nt are not a big issue, as these reads make up only 0.005% of the total UMI-tagged reads.
Best, Sofia
Ask away!
Hello, I see that you use UMI-tools for UMI correction, but how do you collapse UMIs for quantification? Do you just randomly pick 1 read per UMI or do you use a process similar to UMI-tools which selects the read with best mapping score? I have been trying to use the tagged BAM generated by your pipeline to do transcript quantification with a different tool but the only way to use
umi_tools dedup
is if I remove all the reads where the corrected UMI (UB) is shorter than 12 nt. Kind regards, Sofia
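For reference, the filtering workaround described here could look roughly like the sketch below. It assumes the BAM has first been converted to SAM text (for example with `samtools view -h`), that the corrected UMI is stored in a `UB:Z:` tag, and that the expected UMI length is 12 nt. The function name `filter_sam_by_umi_length` is hypothetical.

```python
def filter_sam_by_umi_length(lines, tag="UB", expected_len=12):
    """Keep header lines and alignments whose UMI tag has the expected length.

    `lines` is an iterable of SAM text lines. Alignments without the tag
    are dropped as well (an assumption made for this sketch).
    """
    prefix = tag + ":Z:"
    kept = []
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        umi = None
        # Optional TAG:TYPE:VALUE fields start after the 11 mandatory columns.
        for field in fields[11:]:
            if field.startswith(prefix):
                umi = field[len(prefix):]
                break
        if umi is not None and len(umi) == expected_len:
            kept.append(line)
    return kept


sam_lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC\tUB:Z:TTTTGGGGCCCC",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC\tUB:Z:TTTTGGGG",
    "r3\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC",
]
filtered = filter_sam_by_umi_length(sam_lines)
print(len(filtered))
```

In this toy input only the header and `r1` survive: `r2` carries an 8 nt UMI and `r3` has no `UB` tag at all.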