Closed wangjiawen2013 closed 2 years ago
regex matches read1 = The number of reads whose sequence matches the regex supplied to identify the cell barcode and UMI.
In your case, the regex matches every single input read, which suggests to me you might not need to use a regex at all and a quicker string pattern may suffice. What was the regex you used?
Reads output = The number of reads output from extract. For a read to be output, it needs to have a cell barcode in the whitelist, hence reads output is lower than input.
Filtered cell barcode = The number of reads which were filtered (i.e. not output) because they did not match the cell barcode whitelist. Reads output + Filtered cell barcode = regex matches read1.
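To make the relationship between the three counters concrete, here is a minimal Python sketch of the bookkeeping. It is not UMI-tools code: the pattern, the whitelist, the reads and the `tally` helper are all made up for illustration, and the real log labels may be worded differently.

```python
import re

# Hypothetical toy pattern and whitelist, NOT the ones from this issue.
PATTERN = re.compile(r"^(?P<cell_1>.{4})(?P<umi_1>.{2})")
WHITELIST = {"AAAA", "CCCC"}


def tally(reads):
    """Count reads the way the three log lines are described above."""
    counts = {
        "input": 0,
        "regex_matches": 0,
        "output": 0,
        "filtered_cell_barcode": 0,
    }
    for seq in reads:
        counts["input"] += 1
        match = PATTERN.match(seq)
        if match is None:
            continue  # the read does not match the regex at all
        counts["regex_matches"] += 1
        if match.group("cell_1") in WHITELIST:
            counts["output"] += 1  # barcode is whitelisted, read is written
        else:
            counts["filtered_cell_barcode"] += 1  # barcode not in whitelist
    return counts


# Five made-up reads; every one is long enough to match the pattern.
reads = ["AAAAGTACGT", "CCCCTTACGT", "GGGGAAACGT", "AAAACCACGT", "TTTTGGACGT"]
counts = tally(reads)
print(counts)
```

In this toy run every read matches the regex, so input equals regex matches, and the output and filtered counts add up to the regex matches.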
umi_tools extract --stdin in.fq --stdout out.fq --extract-method=regex \
--bc-pattern='^(?P
My fastq structure: 8bp(barcode1)+8bp(barcode2)+4bp(umi1)+40bp(target sequence)+4bp(umi2)+others
Ah, if you need to discard bases after umi2, you will need to use a regex after all. Currently the string extraction method doesn't support discarding bases.
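For illustration, the read layout described above (8bp + 8bp + 4bp + 40bp + 4bp + others) could be written as a named-group regex along the following lines. This is a sketch in plain Python `re`, not the pattern actually used in this issue (that one is cut off above). The `cell_`, `umi_` and `discard_` prefixes follow the UMI-tools group naming convention; the `target` group is an assumption added here only so the sketch can show which bases would remain in the read, and the example read is made up.

```python
import re

# Assumed layout: barcode1(8) + barcode2(8) + umi1(4) + target(40) + umi2(4) + rest.
# "target" is a hypothetical group name used only in this sketch.
LAYOUT = re.compile(
    r"^(?P<cell_1>.{8})(?P<cell_2>.{8})(?P<umi_1>.{4})"
    r"(?P<target>.{40})"
    r"(?P<umi_2>.{4})(?P<discard_1>.*)$"
)

# A made-up read that follows the layout.
read = "ACGTACGT" + "TTGGCCAA" + "AACC" + "ACGT" * 10 + "GGTT" + "NNNNNN"

match = LAYOUT.match(read)
cell_barcode = match.group("cell_1") + match.group("cell_2")  # 16 bp combined barcode
umi = match.group("umi_1") + match.group("umi_2")             # 8 bp combined UMI
kept = match.group("target")                                  # 40 bp kept as the read
dropped = match.group("discard_1")                            # trailing bases to discard

print(cell_barcode, umi, len(kept), dropped)
```

The trailing `discard_1` group is what the string extraction method cannot express, which is why the regex method is needed for this layout.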
Did the above explanations all make sense?
Yes, thank you!
Hi, this is my output of umi_tools extract:
The input reads equal "regex matches read1", while "Reads output" is lower than both, and "Filtered cell barcode" is lower than "Reads output". Could you explain the difference between "regex matches read1", "Reads output" and "Filtered cell barcode"?