Closed by TomSmithCGAT 1 year ago
Extract can only 'correct' cell barcode errors if you supply them within the whitelist file and use the --error-correct-cell option. You either need to use umi_tools whitelist to generate the whitelist, which will by default identify cell barcodes with 1 substitution as errors to be corrected, or else manually generate the file with the list of all possible errors to be corrected.
From docs for extract: https://umi-tools.readthedocs.io/en/latest/reference/extract.html#whitelist
--whitelist
Whitelist of accepted cell barcodes. The whitelist should be in the following format (tab-separated):
AAAAAA AGAAAA
AAAATC
AAACAT
AAACTA AAACTN,GAACTA
AAATAC
AAATCA GAATCA
AAATGT AAAGGT,CAATGT
Where column 1 is the whitelisted cell barcodes and column 2 is the list (comma-separated) of other cell barcodes which should be corrected to the barcode in column 1.
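If you go the manual route, a small script can write the two-column file for you. The following is only a rough sketch, not part of umi_tools: the function names, the ACGTN alphabet and the output filename are my own assumptions. It lists every 1-substitution variant of each barcode and drops any variant that is one mismatch away from more than one whitelisted barcode, which can happen when two barcodes differ by exactly 2 bp.

```python
from collections import defaultdict


def one_substitution_variants(barcode, alphabet="ACGTN"):
    """Return every sequence that differs from `barcode` at exactly one position."""
    variants = set()
    for i, original in enumerate(barcode):
        for base in alphabet:
            if base != original:
                variants.add(barcode[:i] + base + barcode[i + 1:])
    return variants


def build_whitelist_rows(barcodes, alphabet="ACGTN"):
    """Build tab-separated rows: barcode, then comma-separated variants to correct.

    Hypothetical helper (not a umi_tools function). Variants that are themselves
    whitelisted, or that sit one mismatch from two or more whitelisted barcodes,
    are left out because they cannot be assigned unambiguously.
    """
    barcodes = list(dict.fromkeys(barcodes))  # de-duplicate, keep input order
    accepted = set(barcodes)

    # Map each variant to the set of whitelisted barcodes it could come from.
    owners = defaultdict(set)
    for bc in barcodes:
        for variant in one_substitution_variants(bc, alphabet):
            if variant not in accepted:
                owners[variant].add(bc)

    # Keep only variants with a single possible parent barcode.
    corrections = defaultdict(list)
    for variant, parents in owners.items():
        if len(parents) == 1:
            corrections[next(iter(parents))].append(variant)

    rows = []
    for bc in barcodes:
        variants = sorted(corrections[bc])
        rows.append(bc + "\t" + ",".join(variants) if variants else bc)
    return rows


if __name__ == "__main__":
    # Example barcodes only; replace with your own designed set.
    my_barcodes = ["AAAATC", "AAACAT", "AAATGT"]
    with open("whitelist.tsv", "w") as handle:
        handle.write("\n".join(build_whitelist_rows(my_barcodes)) + "\n")
```

The resulting file can then be passed to extract with --whitelist=whitelist.tsv together with --error-correct-cell. If your barcodes are all at least 3 bp apart, no variant is ever ambiguous and nothing gets dropped.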
Another related question: I can now use my customized whitelist. My barcodes are specifically designed so that any two of them differ by at least 2 bp, so I want to allow a 1 bp mismatch against the barcodes in the whitelist. How can I do this?
Originally posted by @wangjiawen2013 in https://github.com/CGATOxford/UMI-tools/issues/525#issuecomment-1099806090