CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

filtered-out option error #509

Closed camelest closed 2 years ago

camelest commented 2 years ago

Hi, thank you for this wonderful tool.

I'm trying to use the --filtered-out option of umi_tools but get the error below (in both v1.1.1 and v1.1.2).

umi_tools extract -I R1.fastq.gz \
  --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
  --read2-in=R2.fastq.gz \
  --stdout=process.R1.fastq.gz \
  --read2-out=process.R2.fastq.gz \
  --filter-cell-barcode --whitelist=CB.txt \
  -L umi_tools.log \
  --filtered-out=filtered-out.fastq.gz \
  --filtered-out2=filtered-out.2.fastq.gz

Traceback (most recent call last):
  File "/local/home/ubuntu/anaconda3/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/local/home/ubuntu/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/local/home/ubuntu/anaconda3/lib/python3.8/site-packages/umi_tools/extract.py", line 314, in main
    whitelist is None):
NameError: name 'whitelist' is not defined

I'm including the whitelist (of cell barcodes) but the error still says it's not defined. Also the script works fine if I don't add the filtered-out and filtered-out2 options. Did I miss something?

Thank you so much for your help.

Best,

IanSudbery commented 2 years ago

This is an error in the option-checking code, sorry. I've submitted a fix, but in the meantime you can avoid it by specifying:

--extract-method=regex --bc-pattern='(?:P<cell_1>.{16})(?:P<umi_1>.{10})'

camelest commented 2 years ago

Thank you so much for your kind and quick response.

I just tried your suggestion, but it still gives an error (it seems to be a different one).

umi_tools extract -I R1.fastq.gz \
  --extract-method=regex \
  --bc-pattern='(?:P<cell_1>.{16})(?:P<umi_1>.{10})' \
  --read2-in=R2.fastq.gz \
  --stdout=process.R1.fastq.gz \
  --read2-out=process.R2.fastq.gz \
  --whitelist=CB.txt \
  -L umi_tools.log \
  --filtered-out=filtered-out.fastq.gz \
  --filtered-out2=filtered-out.2.fastq.gz

In v1.1.1:

Traceback (most recent call last):
  File "/local/home/ubuntu/anaconda3/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/local/home/ubuntu/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/local/home/ubuntu/anaconda3/lib/python3.8/site-packages/umi_tools/extract.py", line 326, in main
    extract_cell, extract_umi = U.validateExtractOptions(options)
  File "/local/home/ubuntu/anaconda3/lib/python3.8/site-packages/umi_tools/Utilities.py", line 1177, in validateExtractOptions
    raise ValueError("barcode regex(es) do not include any umi groups "
TypeError: 'str' object is not callable

In v1.1.2, the last line of the error changes to:

ValueError: barcode regex(es) do not include any umi groups (starting with 'umi_') regex.Regex('(?:P<cell_1>.{16})(?:P<umi_1>.{10})', flags=regex.V0), None
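The thread does not explain why v1.1.1 reports a TypeError where v1.1.2 reports a ValueError. One plausible reading, offered as an assumption and not confirmed by the maintainers, is that the `%` formatting operator was missing between the message string and its arguments in v1.1.1, so Python tried to call the string. The sketch below reproduces that behaviour in isolation; the message text is copied from the tracebacks above, while `template` and `patterns` are hypothetical names used only for illustration.

```python
# Message text taken from the tracebacks above; everything else is illustrative.
template = ("barcode regex(es) do not include any umi groups "
            "(starting with 'umi_') %s, %s")
patterns = ("PATTERN_1", None)  # stand-ins for the two barcode patterns

# Without "%": the string itself is called like a function, which raises
# TypeError before the ValueError can ever be constructed.
try:
    raise ValueError(template(*patterns))
except TypeError as err:
    without_operator = str(err)

# With "%": the values are substituted and the intended ValueError is raised.
try:
    raise ValueError(template % patterns)
except ValueError as err:
    with_operator = str(err)

print(without_operator)
print(with_operator)
```

Either way, the underlying complaint is the same in both versions: the supplied regex contains no named group starting with `umi_`.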

And this time, errors persisted even if I removed the filtered-out and filtered-out2 options. So sorry for bothering you.

Best,

IanSudbery commented 2 years ago

Sorry, that regex should be:

--bc-pattern='(?P<cell_1>.{16})(?P<umi_1>.{10})'
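The difference between the two patterns is only the position of the colon: `(?:...)` is a non-capturing group, so `P<cell_1>` is read as literal text, whereas `(?P<name>...)` defines a named group. A minimal sketch using Python's standard `re` module (UMI-tools itself uses the third-party `regex` module, as the error output shows, but this syntax behaves the same way in both); the helper `umi_group_names` and the example read are made up for illustration.

```python
import re

# Pattern from the first suggestion: "(?:" opens a NON-capturing group,
# so "P<cell_1>" is treated as literal text and no named group is created.
broken = re.compile(r"(?:P<cell_1>.{16})(?:P<umi_1>.{10})")

# Corrected pattern: "(?P<name>...)" is the named-group syntax.
fixed = re.compile(r"(?P<cell_1>.{16})(?P<umi_1>.{10})")


def umi_group_names(pattern):
    """Hypothetical helper: named groups whose name starts with 'umi_'."""
    return sorted(name for name in pattern.groupindex if name.startswith("umi_"))


# Made-up 26-base read: 16 bases of cell barcode followed by 10 bases of UMI.
read = "ACGTACGTACGTACGT" + "TTGGCCAATT"
match = fixed.match(read)

print(umi_group_names(broken))   # empty: the pattern defines no named groups
print(umi_group_names(fixed))    # contains 'umi_1'
print(match.group("cell_1"), match.group("umi_1"))
```

Listing a compiled pattern's `groupindex` like this is a quick way to check a barcode regex before passing it to `--bc-pattern`.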

camelest commented 2 years ago

Oh, sorry, now it worked perfectly. Thank you so much for your help!

Best,

Ermela1 commented 2 years ago

Hello,

I am having the same issue (error message). I do not have a whitelist, and I did use a regex. The script runs fine without the --filtered-out option. I was wondering if there is a fix for it or a workaround. This is my command:

umi_tools extract --stdin=Data/reads/test2.fastq \
  --log=analysis/Logs/umi/test2.log \
  --filtered-out=analysis/Logs/umi/test2.filteredout.fastq \
  --stdout=analysis/UMI_reads/test2.fastq \
  --extract-method=regex \
  --bc-pattern='.+(?PAACTGTAGGCACCATCAAT|GTTCAGAGTTCTACAGTCCGACGATC){s<=3}(?P.{12})(?P.+)'

This is the error I am getting:

Traceback (most recent call last):
  File "/home/ermela/anaconda3/bin/umi_tools", line 33, in <module>
    sys.exit(load_entry_point('umi-tools==1.1.2', 'console_scripts', 'umi_tools')())
  File "/home/ermela/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/ermela/anaconda3/lib/python3.8/site-packages/umi_tools/extract.py", line 446, in main
    filtered_out.write(str(read1) + "\n")
UnboundLocalError: local variable 'read1' referenced before assignment

Thank you for your time, Ermela

IanSudbery commented 2 years ago

Hi Ermela,

This is actually a different issue, the same as was seen in #453. It should be fixed in the latest version, so do you know which version you are using?

Ermela1 commented 2 years ago

Thank you for the quick response. I think I have the latest version? Commit 5c2dd0fd208df3a8f93399c99b1d164aef8094be

Thank you, Ermela

IanSudbery commented 2 years ago

The error message in your post above says that line 446 is: filtered_out.write(str(read1) + "\n")

However, line 446 in the latest commit is: https://github.com/CGATOxford/UMI-tools/blob/5c2dd0fd208df3a8f93399c99b1d164aef8094be/umi_tools/extract.py#L446

Note that your error says read1, whereas the current code says read.

Is it possible that you have downloaded the latest commit, but that's not the one actually being run? What does umi_tools --version return?
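Besides `umi_tools --version`, it can help to ask the Python interpreter itself which copy of the package it would import, since a downloaded checkout and the installed package can differ. The following is a rough sketch, not part of UMI-tools: `describe_install` is a hypothetical helper, and the distribution and module names passed to it (`umi_tools`) are assumptions based on the paths in the tracebacks above. It needs Python 3.8 or later for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name, module_name):
    """Hypothetical helper: report the installed distribution version and
    the file that `import module_name` would resolve to."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # no installed distribution with that name
    spec = importlib.util.find_spec(module_name)
    module_path = spec.origin if spec is not None else None
    return {"version": version, "module_path": module_path}


if __name__ == "__main__":
    # Assumed names; adjust if the package is registered differently.
    print(describe_install("umi_tools", "umi_tools"))
```

Run it with the same interpreter that the `umi_tools` script uses (here, the anaconda3 Python shown in the traceback paths). If the reported path or version is not the one you expect, the environment is picking up an older installation.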

Ermela1 commented 2 years ago

You're absolutely right! umi_tools --version gives me UMI-tools version: 1.1.1. I will try to figure out how to change this. Thank you so much for your help!

IanSudbery commented 2 years ago

I'm going to close this for now. Feel free to reopen if necessary.