CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

Is a whitelist necessary for 'umi_tools extract --extract-method=regex' #532

Closed: wanxinw closed this issue 6 months ago

wanxinw commented 2 years ago

Hi UMI-tools team,

I hope to run the following command:

umi_tools extract --stdin in.fastq.gz --filtered-out --extract-method=regex --bc-pattern='(?P<cell_1>.{8,12})(?P<discard_1>GAGTGATTGCTTGTGACGCCTT)(?P<cell_2>.{8})(?P<umi_1>.{6})T{3}.*'

However, I got the following error:

Traceback (most recent call last):
  File "/home/atlasbio/.local/bin/umi_tools", line 33, in <module>
    sys.exit(load_entry_point('umi-tools==1.1.2', 'console_scripts', 'umi_tools')())
  File "/home/atlasbio/.local/lib/python3.9/site-packages/umi_tools-1.1.2-py3.9-linux-x86_64.egg/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/atlasbio/.local/lib/python3.9/site-packages/umi_tools-1.1.2-py3.9-linux-x86_64.egg/umi_tools/extract.py", line 314, in main
    whitelist is None):
NameError: name 'whitelist' is not defined

Our hope is that we can extract the barcodes without having to specify a whitelist. Is it possible to run the command without a whitelist?

Thanks!

IanSudbery commented 2 years ago

Duplicate of #509

IanSudbery commented 2 years ago

Hi, Wanxinw,

No, you don't need a whitelist! We've seen this problem before. It has been fixed on the master branch, but the fix is not yet in the release version.

You can fix this by downloading and installing the latest commit on the master branch:

$ wget https://github.com/CGATOxford/UMI-tools/archive/refs/heads/master.zip
$ unzip master.zip
$ cd UMI-tools-master
$ python setup.py install
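
For anyone checking a barcode pattern before re-running the extraction, here is a minimal sketch of how a regex of this shape splits a read. It is not the UMI-tools implementation: it uses Python's standard `re` module, whereas UMI-tools may handle matching differently (for example, fuzzy matching). The group names `cell_1`, `discard_1`, `cell_2` and `umi_1` are assumptions that follow the `cell_`/`discard_`/`umi_` naming convention for regex patterns, the helper `split_read` is hypothetical, and the read is made up for illustration.

```python
import re

# Assumed group names (cell_*, discard_*, umi_*); not taken from a real run.
PATTERN = re.compile(
    r"(?P<cell_1>.{8,12})"                    # first part of the cell barcode
    r"(?P<discard_1>GAGTGATTGCTTGTGACGCCTT)"  # fixed adapter, to be discarded
    r"(?P<cell_2>.{8})"                       # second part of the cell barcode
    r"(?P<umi_1>.{6})"                        # the UMI
    r"T{3}.*"                                 # start of the poly-T, then the rest
)


def split_read(seq):
    """Hypothetical helper: return (cell_barcode, umi), or None if no match."""
    match = PATTERN.match(seq)
    if match is None:
        return None
    cell_barcode = match.group("cell_1") + match.group("cell_2")
    return cell_barcode, match.group("umi_1")


# Made-up read: 8 nt cell part, adapter, 8 nt cell part, 6 nt UMI, poly-T, remainder.
read = "AAAACCCC" + "GAGTGATTGCTTGTGACGCCTT" + "GGGGTTTT" + "ACGTAC" + "TTT" + "GATTACA"
print(split_read(read))
```

Nothing in this sketch consults a whitelist: the pattern alone decides which bases are treated as cell barcode, UMI, or discarded adapter.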