CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

whitelist is not defined error when running umi_tools extract #593

Closed prmunn closed 9 months ago

prmunn commented 1 year ago

I have an issue similar to #509, where I need to use the regex option together with a whitelist. However, my BC pattern is XXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXXXXXXXXXXXXXXXNNNNNNNN and I'm not sure what the regex for this would be (previously, I've only seen regex patterns for Ns and Cs). Is there a regex pattern that can also include Xs, or alternatively, is there a way to pass in the pattern as a string? (Currently the string option does not appear to work with a whitelist.)

IanSudbery commented 1 year ago

This should have been fixed several releases ago. Can you give the exact error that you are seeing?

The regex pattern can include Xs, but how that is achieved depends on what you want to do with the Xs. Do you want to keep or discard the bases that match the Xs?

prmunn commented 1 year ago

I'm running version 1.1.2 - is it fixed in that version? I would like to keep the bases that match the X's

Here is the command I'm running and the resulting error:

umi_tools extract --extract-method=string \
  -p XXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXXXXXXXXXXXXXXXNNNNNNNN \
  --filtered-out=sciRNA-10K_extract_filtered_out.txt \
  --filtered-out2=sciRNA-10K_extract_filtered_out2.txt \
  --error-correct-cell \
  --quality-filter-mask=20 \
  --quality-encoding=phred33 \
  --whitelist=sciRNA-10K_predictedBCwhitelist.txt \
  -I sciRNA-10K_whitelist_out_R2.fastq \
  -S sciRNA-10K_hBC_UMI_R2.fastq.gz \
  --read2-in=sciRNA-10K_whitelist_out_R1.fastq \
  --read2-out=sciRNA-10K_hBC_UMI_R1.fastq.gz \
  -L sciRNA-10K_extractBC.log

Traceback (most recent call last):
  File "/programs/UMI-tools/bin/umi_tools", line 8, in <module>
    sys.exit(main())
  File "/programs/UMI-tools/lib64/python3.9/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/programs/UMI-tools/lib64/python3.9/site-packages/umi_tools/extract.py", line 314, in main
    whitelist is None):
NameError: name 'whitelist' is not defined

IanSudbery commented 1 year ago

This particular problem was fixed in 1.1.3. I recommend you update.

Anyway, to specify the barcode as a regex so as to keep the Xs, you could use:

XXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXCCCCCCCCCCCCXXXXXXXXXXXXXXXXXNNNNNNNN

'^...(?P<cell_1>.{12})...(?P<cell_2>.{12})...(?P<cell_3>.{12}).{17}(?P<umi_1>.{8})'
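
To see what this regex captures, here is a minimal sketch that applies it to a made-up read. It assumes Python's built-in `re` module as a stand-in for the regex engine UMI-tools uses internally, and the sequences, the `PIECES` layout and the trailing cDNA bases are all hypothetical, chosen only so that each segment is easy to recognise. Removing the captured spans by hand only approximates what `umi_tools extract` does with the read.

```python
import re

# Regex from the reply above. Bases matched outside the named groups
# (the X positions) are not captured, so they stay in the read.
BARCODE_REGEX = re.compile(
    r"^...(?P<cell_1>.{12})...(?P<cell_2>.{12})...(?P<cell_3>.{12})"
    r".{17}(?P<umi_1>.{8})"
)

# Synthetic read assembled from labelled pieces (not real data).
PIECES = [
    ("X", "AAA"),
    ("cell_1", "ACGTACGTACGT"),
    ("X", "CCC"),
    ("cell_2", "TTGGTTGGTTGG"),
    ("X", "GGG"),
    ("cell_3", "CATGCATGCATG"),
    ("X", "T" * 17),
    ("umi_1", "GATTACAG"),
    ("cDNA", "ACGTTGCA"),  # hypothetical sequence after the barcode region
]
read = "".join(seq for _, seq in PIECES)

match = BARCODE_REGEX.match(read)
if match is None:
    raise ValueError("read does not match the barcode regex")

# The three cell_N groups together form the cell barcode.
cell_barcode = "".join(match.group(n) for n in ("cell_1", "cell_2", "cell_3"))
umi = match.group("umi_1")

# Cut out the captured spans, right to left so earlier offsets stay valid,
# to show which bases would remain in the read.
retained = read
for name in ("umi_1", "cell_3", "cell_2", "cell_1"):
    start, end = match.span(name)
    retained = retained[:start] + retained[end:]

print("cell barcode:", cell_barcode)
print("UMI:", umi)
print("retained read:", retained)
```

In this sketch the X positions (the three 3-base spacers and the 17-base stretch) remain in the retained read, while the 36 cell barcode bases and the 8 UMI bases are pulled out.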

prmunn commented 1 year ago

Thanks for your quick reply. I'll upgrade to the latest version and try the regex you suggested.