Closed robinycfang closed 8 months ago
There are two things going on here. First, you are passing a whitelist of cell barcodes, not of UMIs to be filtered, and this causes an error because your barcode pattern does not contain any cell barcode groups. Second, the code that catches this error unfortunately has an error of its own! (This has now been fixed on the master branch.)
If you wish to use a predetermined list of UMIs, then you should use the options --filter-umi --filter-umi-whitelist=umi_list.txt instead of --whitelist.
Hi Folks,
I have a similar problem, but when I try @IanSudbery's method, it seems that --filter-umi-whitelist does not exist as an option.
(scStarrseq) [tmorova@linuxsrv006 use-alevin]$ umi_tools extract -I NL2_CKDL210021281-1a-SI_GA_A2_HMHFJDSX2_S3_L004_R1_001.fastq.gz --read2-in=NL2_CKDL210021281-1a-SI_GA_A2_HMHFJDSX2_S3_L004_R2_001.fastq.gz --stdout=umitools/processed.1.fastq.gz --read2-out=umitools/processed.2.fastq.gz --log2stderr --filter-umi --filter-umi-whitelist=umitools/10x-whitelist.txt --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNNN
extract - Extract UMI from fastq
Usage:
Single-end:
umi_tools extract [OPTIONS] -p PATTERN [-I IN_FASTQ[.gz]] [-S OUT_FASTQ[.gz]]
Paired end:
umi_tools extract [OPTIONS] -p PATTERN [-I IN_FASTQ[.gz]] [-S OUT_FASTQ[.gz]] --read2-in=IN2_FASTQ[.gz] --read2-out=OUT2_FASTQ[.gz]
note: If -I/-S are ommited standard in and standard out are used
for input and output. To generate a valid BAM file on
standard out, please redirect log with --log=LOGFILE or
--log2stderr. Input/Output will be (de)compressed if a
filename provided to -S/-I/--read2-in/read2-out ends in .gz
For full UMI-tools documentation, see https://umi-tools.readthedocs.io/en/latest/
extract: error: no such option: --filter-umi-whitelist
Here is my umi_tools version:
(scStarrseq) [tmorova@linuxsrv006 use-alevin]$ umi_tools --version
UMI-tools version: 1.1.1
Thank you for the help,
Best regards,
Tunc.
Hi, sorry, my bad, the option is --umi-whitelist, not --filter-umi-whitelist. For some reason these options have been hidden from the help. I'm not sure why, but it probably means this function has not been thoroughly tested and should be regarded as experimental.
@TomSmithCGAT do you remember why these options are hidden?
Yes, exactly that. I added it for a project where I was working with a library prep kit that included 96 pre-determined UMIs (I can't remember the kit name now). While it should work absolutely fine, it has not been thoroughly tested.
Hi guys, following up on this, a quick question that you may be able to address:
I am running extract with the --whitelist option and a list of 300 whitelisted cell barcodes (only one column, no error correction). When I compare the number of lines before and after running extract, they are unchanged. I expected this option to retain only reads whose barcode is in the whitelist file. Is there an easy explanation, did I misunderstand something, or should I check further?
@chrarnold, that's correct. Only reads with whitelisted cells should be retained. Without error correction, they would need to have the exactly correct cell barcode. Given sequencing errors, this is unlikely to be the case, so I would expect some reads to be filtered.
Could you please post an example read and the umi_tools command used?
Can you also do a quick sanity check: provide a random whitelist and confirm that all reads are filtered out?
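One way to run this kind of sanity check outside umi_tools is to count how many reads start with a barcode that appears verbatim in the whitelist. The sketch below is a minimal, hypothetical helper and not part of UMI-tools: the function names are made up for illustration, and it assumes the cell barcode occupies the first 16 bases of read 1 (as in the CCCCCCCCCCCCCCCCNNNNNNNNNNNN pattern quoted earlier in this thread) and that the whitelist holds one barcode per line in its first tab-separated column. Adjust barcode_len to your own pattern.

```python
import gzip
from itertools import islice


def read_whitelist(lines):
    """Return the set of barcodes found in the first tab-separated column."""
    return {line.split("\t")[0].strip() for line in lines if line.strip()}


def count_whitelisted(fastq_lines, whitelist, barcode_len=16):
    """Return (total reads, reads whose leading bases are in the whitelist)."""
    total = kept = 0
    # A FASTQ record spans 4 lines; the sequence is the 2nd line of each record.
    for seq in islice(fastq_lines, 1, None, 4):
        total += 1
        if seq.strip()[:barcode_len] in whitelist:
            kept += 1
    return total, kept


def count_in_files(fastq_gz, whitelist_txt, barcode_len=16):
    """Convenience wrapper for a gzipped FASTQ and a plain-text whitelist."""
    with open(whitelist_txt) as wl, gzip.open(fastq_gz, "rt") as fq:
        return count_whitelisted(fq, read_whitelist(wl), barcode_len)
```

With a random whitelist the second number should be close to zero. Comparing these counts with the number of reads that extract actually writes out indicates whether the whitelist filter is being applied at all.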
I'm closing this due to inactivity.
Hi,
I have paired-end bulk sequencing data with UMIs (3- or 4-mers with a T). I have a list of true UMI sequences. When I tried to extract UMIs from the reads against the whitelist, I got the following error. When I removed the whitelist parameter, the error went away, but then the extraction wouldn't be accurate. Any help would be really appreciated, thanks!
umi_tools extract --extract-method=regex --bc-pattern="(?P<umi_1>^[ACGT]{3}[ACG])(?P<discard_1>T)|(?P<umi_2>^[ACGT]{3})(?P<discard_2>T)" --bc-pattern2="(?P<umi_1>^[ACGT]{3}[ACG])(?P<discard_1>T)|(?P<umi_2>^[ACGT]{3})(?P<discard_2>T)" --whitelist=umi_list.txt -I sample_1.fq.gz --read2-in=sample_2.fq.gz --stdout=processed.1.fastq.gz --read2-out=processed.2.fastq.gz --log=processed.log
Error with umi-tools 1.1.1:
Traceback (most recent call last):
  File "/centos7/umi_tools/1.1.1/bin/umi_tools", line 10, in <module>
    sys.exit(main())
  File "/centos7/umi_tools/1.1.1/lib/python3.7/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/centos7/umi_tools/1.1.1/lib/python3.7/site-packages/umi_tools/extract.py", line 369, in main
    options.pattern, options.pattern2))
TypeError: 'str' object is not callable
The same with 1.1.2:
Traceback (most recent call last):
  File "/miniconda3/bin/umi_tools", line 8, in <module>
    sys.exit(main())
  File "/miniconda3/lib/python3.9/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/miniconda3/lib/python3.9/site-packages/umi_tools/extract.py", line 367, in main
    U.error("barcode regex(es) do not include any cell groups "
TypeError: 'str' object is not callable
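The 1.1.2 traceback shows that extract was trying to report "barcode regex(es) do not include any cell groups" when it crashed. A quick way to see which kinds of groups a pattern defines is to list its named groups. The sketch below uses Python's standard re module purely for illustration; it is not UMI-tools' own validation code, and the helper name group_kinds is hypothetical.

```python
import re

# The --bc-pattern from the command above.
BC_PATTERN = (
    r"(?P<umi_1>^[ACGT]{3}[ACG])(?P<discard_1>T)"
    r"|(?P<umi_2>^[ACGT]{3})(?P<discard_2>T)"
)


def group_kinds(pattern):
    """Return the set of named-group prefixes, e.g. 'umi', 'cell', 'discard'."""
    names = re.compile(pattern).groupindex
    return {name.split("_")[0] for name in names}


kinds = group_kinds(BC_PATTERN)
print(sorted(kinds))  # ['discard', 'umi']
if "cell" not in kinds:
    print("No cell groups: a cell barcode whitelist has nothing to match.")
```

Because the pattern defines only umi and discard groups, a cell barcode list passed via --whitelist cannot apply to it.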