Closed itslittman closed 4 months ago
Hi @itslittman
The cell barcodes are assigned first, and these are then used, along with gene name, to partition the reads. This reduces the search space for UMI correction and reduces UMI collisions. I guess there could be a rescue step after UMI assignment where reads rejected because no valid barcode was found could be fished out by UMI/gene ID. It's an interesting idea.
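To make the rescue idea concrete, here is a rough sketch of what such a step might look like. It is not part of the workflow: the function name `rescue_barcodes`, the dict keys (`umi`, `gene`, `barcode`) and the rule "only rescue when exactly one barcode shares the UMI/gene pair" are all assumptions made for illustration. It also assumes the UMI of a rejected read matches exactly, which real data would not guarantee.

```python
from collections import Counter, defaultdict


def rescue_barcodes(reads):
    """Fill in missing barcodes from reads that share the same UMI and gene.

    `reads` is a list of dicts with the hypothetical keys 'umi', 'gene' and
    'barcode' ('barcode' is None when no valid barcode was found).
    Returns a new list; the input dicts are not modified.
    """
    # Count the barcodes seen for each (UMI, gene) pair among accepted reads.
    donors = defaultdict(Counter)
    for read in reads:
        if read["barcode"] is not None:
            donors[(read["umi"], read["gene"])][read["barcode"]] += 1

    result = []
    for read in reads:
        if read["barcode"] is None:
            candidates = donors.get((read["umi"], read["gene"]), {})
            # Rescue only when a single barcode carries this UMI/gene pair.
            # Zero candidates: nothing to borrow. Several: the UMI collides
            # across cells, so the read stays rejected.
            if len(candidates) == 1:
                read = {**read, "barcode": next(iter(candidates)), "rescued": True}
        result.append(read)
    return result
```

Requiring a single candidate barcode is a conservative choice: it avoids assigning a read to the wrong cell when the same UMI/gene pair appears in more than one cell.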
You have a second question about correcting internal nucleotides. This could be much more easily done by generating consensus sequences for reads with the same barcode/UMI/gene and might be something that will be added to the workflow.
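As a minimal illustration of the consensus idea, the sketch below groups reads by barcode/UMI/gene and takes a per-position majority vote. It is a toy under strong assumptions, not how the workflow does or would do it: the function name `consensus_by_group`, the dict keys and the `min_reads` threshold are hypothetical, and the sequences in each group are assumed to be already aligned to equal length. Real nanopore reads contain indels, so they would need a proper alignment step first.

```python
from collections import Counter, defaultdict


def consensus_by_group(reads, min_reads=3):
    """Majority-vote consensus for reads sharing barcode, UMI and gene.

    `reads` is a list of dicts with the hypothetical keys 'barcode', 'umi',
    'gene' and 'seq'. Groups with fewer than `min_reads` reads are skipped.
    Returns a dict mapping (barcode, umi, gene) to the consensus string.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["barcode"], read["umi"], read["gene"])].append(read["seq"])

    consensus = {}
    for key, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few duplicates to out-vote an error
        # Assumption: sequences are pre-aligned, so columns line up.
        if len({len(s) for s in seqs}) != 1:
            raise ValueError(f"sequences for {key} differ in length")
        # zip(*seqs) yields one column of bases per position.
        consensus[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus
```

With three or more reads per group, a single basecalling error at a given position is out-voted by the other reads. Ties are broken arbitrarily here (first base seen); a quality-weighted vote would be a natural refinement.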
Closing due to lack of response
Hi @nrhorner, are you planning to implement consensus sequence generation for reads with the same cell barcode/UMI/gene?
This feature could drastically improve my data, as >60% of my UMIs of interest have >= 3 reads and I'm interested in SNV calling.
I'm currently trying to reproduce the implementation from sicelore https://github.com/ucagenomix/sicelore/tree/793db90c3d16fef31d8ad3f34792c595beff938a?tab=readme-ov-file#6-generate-consensus-sequences .
Please let me know if you have other suggestions for error-correcting UMI-tagged transcripts in a way that is compatible with the epi2me workflow.
Thanks!
Is your feature related to a problem?
Some reads are inevitably thrown out because no barcode is found, because of basecalling errors, because they fall below the Q-score threshold, etc.
Describe the solution you'd like
If you have multiple reads with the same UMI, and some reads have low-quality bases in the cell barcode sequence, could you use the higher-quality barcode sequences from the PCR duplicates to correct the other barcode and retain the read? And could this likewise be used to correct internal nucleotide sequences?
Describe alternatives you've considered
-
Additional context
No response