MEFarhadieh closed this issue 2 years ago.
I have the same question!
You can just provide your own file to the --whitelist parameter of extract. The format of the file is described here:
https://umi-tools.readthedocs.io/en/latest/reference/extract.html#whitelist
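A minimal sketch for building such a file from a plain Python list instead of a spreadsheet. It assumes the layout described at that link as we read it: the barcode in column 1, with any further tab-separated columns optional. The helper names and the `counts_column` flag are hypothetical. The flag reproduces the 4-column layout (empty columns 2 and 3, a "1" in column 4) that is reported later in this thread as working.

```python
def format_whitelist(barcodes, counts_column=False):
    """Return whitelist text: one barcode per line, Unix line endings.

    Hypothetical helper. If counts_column is True, each line becomes
    barcode<TAB><TAB><TAB>1 (empty columns 2 and 3, "1" in column 4).
    """
    # Strip surrounding whitespace (including stray \r), upper-case, drop blanks
    cleaned = (raw.strip().upper() for raw in barcodes)
    # dict.fromkeys de-duplicates while keeping the original order
    unique = list(dict.fromkeys(bc for bc in cleaned if bc))
    suffix = "\t\t\t1" if counts_column else ""
    return "".join(f"{bc}{suffix}\n" for bc in unique)


def write_whitelist(barcodes, path, counts_column=False):
    # newline="\n" stops Python translating "\n" into "\r\n" on Windows
    with open(path, "w", newline="\n") as fh:
        fh.write(format_whitelist(barcodes, counts_column))
```

For example, `write_whitelist(my_barcodes, "my_whitelist.txt")` produces a file that can then be passed as `--whitelist=my_whitelist.txt`. Writing the file programmatically avoids whatever hidden formatting a spreadsheet export might add.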
Thank you so much!
I thought it had to be in a 4-column, tab-separated format.
I used a one-column Excel file: the barcodes were in the first column and there were no other columns. Then umi_tools didn't run correctly: the output was redirected to the whitelist and the resulting FASTQ file was empty. Then I filled the fourth column with "1" in every row and left the second and third columns empty, and it worked.
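The thread does not establish why the one-column file failed. Spreadsheet exports often introduce Windows (`\r\n`) line endings, stray whitespace or extra delimiters. Whether any of these caused the problem here is only an assumption, but they are cheap to rule out. The sketch below uses hypothetical helper names and reports such issues in a whitelist file before it is used.

```python
import re

# Column 1 is expected to be a plain upper-case barcode (assumption: A/C/G/T/N only)
DNA = re.compile(r"[ACGTN]+")


def check_whitelist_text(text):
    """Return a list of human-readable problems found in whitelist text."""
    problems = []
    if "\r" in text:
        problems.append("carriage returns found (Windows/Excel line endings)")
    lengths = set()
    for n, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        if not line:
            continue  # ignore blank lines, e.g. after the final newline
        barcode = line.split("\t")[0]
        if not DNA.fullmatch(barcode):
            problems.append(f"line {n}: column 1 is not an A/C/G/T/N barcode: {barcode!r}")
        else:
            lengths.add(len(barcode))
    if len(lengths) > 1:
        problems.append(f"barcodes have differing lengths: {sorted(lengths)}")
    return problems


def check_whitelist(path):
    # newline="" disables newline translation so any \r characters are kept and reported
    with open(path, newline="") as fh:
        return check_whitelist_text(fh.read())
```

An empty list means none of these particular problems were found. It does not guarantee that extract will accept the file.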
Another related question: I can now use my customized whitelist. My barcodes are specifically designed so that any two differ by at least 2 bp, so I want to allow a 1 bp mismatch against the barcodes in the whitelist. How can I do that?
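One possible approach, sketched under an assumption: as we read the --whitelist documentation linked above, column 2 may hold a comma-separated list of barcodes to be corrected to the column-1 barcode when extract is run with --error-correct-cell. Check this against the docs for your UMI-tools version. The sketch below generates that column from all 1-mismatch variants (the helper names are hypothetical).

With a minimum distance of only 2 bp, some 1-mismatch reads sit at distance 1 from two whitelisted barcodes (e.g. `AC` is one mismatch from both `AA` and `CC`). The sketch drops such ambiguous variants rather than guessing.

```python
from collections import defaultdict

BASES = "ACGTN"


def one_mismatch_variants(barcode):
    """Yield every sequence exactly one substitution away from barcode."""
    for i, base in enumerate(barcode):
        for alt in BASES:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def build_correction_map(whitelist):
    """Map each whitelisted barcode to its unambiguous 1-mismatch variants."""
    wl = set(whitelist)
    owners = defaultdict(set)  # variant -> whitelisted barcodes it is 1 bp from
    for bc in wl:
        for var in one_mismatch_variants(bc):
            if var not in wl:  # never "correct" one real barcode into another
                owners[var].add(bc)
    corrections = defaultdict(list)
    for var, bcs in owners.items():
        if len(bcs) == 1:  # keep only variants with a single possible origin
            corrections[next(iter(bcs))].append(var)
    return {bc: sorted(corrections.get(bc, [])) for bc in whitelist}


def format_whitelist_lines(whitelist):
    """Two-column lines: barcode<TAB>comma-separated variants to correct to it."""
    corr = build_correction_map(whitelist)
    return [f"{bc}\t{','.join(corr[bc])}" for bc in whitelist]
```

Reads whose barcode matches an ambiguous variant would, under this scheme, simply not be corrected.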
@wangjiawen2013 - I've addressed the question above in a separate issue, #529.
@wangjiawen2013
I am a bit confused about what is happening here:
Then umi-tools didn't ran correctly, and the outputs are redirect to the whitelist, the resulting fastq file was empty.
When you say "the outputs are redirect to the whitelist", do you mean that your whitelist file was overwritten?
Can you show me the complete command you used?
umi_tools extract --stdin in.fq --stdout out.fq.gz --extract-method=regex \
--bc-pattern='^(?P
Besides, UMI-tools does not work if there is a "/" (or "\", I forget which one) in the FASTQ read names, for example:
@abcdejfkalfjl:sjfkd \c 123456
ATGCATGCATGCATGC.......
!
IC?CCCIIIIIIIII................
In this case, the barcode and UMI are extracted successfully, but the reads are discarded when counting, so all the counts are zero.
Your problem with the whitelist is very peculiar, and I can't work out how that would be possible. I'm guessing the problem with / (or \) in the read name is because this is often used to denote read 1 or read 2 in a read name:
@abcdejfkalfjl:sjfkd /1
ATGCATGCATGCATGC.......
!
IC?CCCIIIIIIIII................
@abcdejfkalfjl:sjfkd /2
ATGCATGCATGCATGC.......
!
IC?CCCIIIIIIIII................
Although I can't quite see how that would lead to the reads being dropped.
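The thread does not pin down the cause. One experiment to test whether the read-name suffix is involved is to normalise the headers before running extract and compare the counts. The minimal sketch below does this; it is a diagnostic, not a confirmed fix, and the helper names are hypothetical. It keeps only the first whitespace-delimited field of each header and strips a trailing `/1`, `/2`, `\1` or `\2`. It assumes plain 4-line FASTQ records.

```python
import re

# Trailing mate suffix such as /1, /2, \1 or \2 (assumption: only these forms occur)
MATE_SUFFIX = re.compile(r"[/\\][12]$")


def sanitize_header(header):
    """Keep the first whitespace-delimited field and drop a mate suffix."""
    fields = header.rstrip("\r\n").split()
    if not fields:
        return header.rstrip("\r\n")
    return MATE_SUFFIX.sub("", fields[0])


def sanitize_fastq(lines):
    """Yield FASTQ lines (without newlines), rewriting every header line.

    Assumes strict 4-line records, so line 0 of each record is the header.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        yield sanitize_header(line) if i % 4 == 0 else line
```

Usage: `with open("in.fq") as src, open("clean.fq", "w") as dst: dst.writelines(l + "\n" for l in sanitize_fastq(src))`. Because both mates lose the same suffix, paired files keep matching read names.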
Yeah, it doesn't make sense to me either. Extract drops the read pairs that don't match, which can be a problem (as per https://github.com/CGATOxford/UMI-tools/issues/325) but is easily solved. However, @wangjiawen2013 says the extract step is OK, so I'm not sure why the issue is cropping up.
Thanks for this great tool!
I have the list of final barcodes for my cell types of interest. How can I use this list instead of UMI-whitelist.txt as the input to the extraction step, and how should this .txt file be formatted?