CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

bc-pattern2: TypeError: object of type 'NoneType' has no len() #522

Closed by m3hdad 6 months ago

m3hdad commented 2 years ago

Hi there,

In the check shown in the traceback below, shouldn't if(len(read1.seq) < len(self.pattern)) be something like if(len(read1.seq) < len(self.pattern2)), to avoid this error when only --bc-pattern2 is given?

The error goes away if I also pass --bc-pattern=X.

UMI on read2:

read1:

@A1000 1:N
CGCTGGTGGCTGGCCGCTTTGGCCTGGCACCCACCTCCACCCCCCACACCAACCCCGGCCAGAAGCTGCTGCCAACTGACAAGTCTGCTGGCCTGTACAGCGGCGACCCTGCTGGCTTCAACGCCGTCGATGTGCTGGCACTTGGCGCCC
+
FFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFF:FF,FF::FF,FFFF:FF,FFFFF,FFF,FFF,F:FFFFFFFFFFFFFF,,FF:FFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFF:FF:FF,FFFFFF,FFFFFF:F:FFFFFFFF
@A1047 1:N
GGCTGCACTGCAACAAAAGGGCTGGGCATACGAAGAAGATGTGGGTGGCGGAGCATTTTATGGTCCCAAGATTGACATCAAGATTTGCGATGCCATAGGCAGGAAATGGCAGTGCTTAACAGTGCAGCTGGATTTCAACCTGCCAGAACG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFF:FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF,FFF:FFFFF:FF::F::FFF::FFFFFFFFF::F

read2:

@A1000 2:N
AACAACAAGCGGCAGGTGGCAGGGTTGACTGAGGATGTTCTTCTCGGGCAGTACATGGGTCAAGAGCGACCGACAGGGGCTGACGGACGCTAAGATCAGGCCAGGTTGCCGGTGGCCTTGAGGCCCAGCACCAGGCCGACGCCGATGATGTGTCCAAG
+
FFFFFFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFF:F:FFFFFFFF:FFFFFFFFF:FFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFF,FFFFFFFFFFFF:
@A1047 2:N
CCCGCATCACGGGGGCCAGCCATAGGGGGAAGGCACCTGCGTAGTTTTCTATCAAAATGCCCATAAACCTCTCCAAAGAACCAAGAATTGCACGATGGATCATGATGGGTCGCTCTCTGACATTGGCATCGCTGATGTAAAACATGTCGAAACGTTCT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFF:FF:::FFFFFFFFFFF:FFFFFFFFFFFFFFFFF:F:FF:F:FF

I get the following TypeError:

# UMI-tools version: 1.1.2
# output generated by extract -I read1.fastq.gz --read2-in=read2.fastq.gz --bc-pattern2=NNNNNNNN -S r1UMI.fastq.gz --read2-out=r2UMI.fastq.gz

# blacklist                               : None
# compresslevel                           : 6
# correct_umi_threshold                   : 0
# either_read                             : False
# either_read_resolve                     : discard
# error_correct_cell                      : False
# extract_method                          : string
# filter_cell_barcode                     : None
# filter_cell_barcodes                    : False
# filter_umi                              : None
# filtered_out                            : None
# filtered_out2                           : None
# ignore_suffix                           : False
# log2stderr                              : False
# loglevel                                : 1
# pattern                                 : None
# pattern2                                : NNNNNNNN
# prime3                                  : None
# quality_encoding                        : None
# quality_filter_mask                     : None
# quality_filter_threshold                : None
# random_seed                             : None
# read2_in                                : read2.fastq.gz
# read2_out                               : r2UMI.fastq.gz
# read2_stdout                            : False
# reads_subset                            : None
# reconcile                               : False
# retain_umi                              : None
# short_help                              : None
# stderr                                  : <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# stdin                                   : <_io.TextIOWrapper name='read1.fastq.gz' encoding='ascii'>
# stdlog                                  : <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
# stdout                                  : <_io.TextIOWrapper name='r1UMI.fastq.gz' encoding='ascii'>
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# tmpdir                                  : None
# umi_correct_log                         : None
# umi_whitelist                           : None
# umi_whitelist_paired                    : None
# whitelist                               : None
2022-03-28 11:57:57,858 INFO Starting barcode extraction
Traceback (most recent call last):
  File "/home/xxx/progs/miniconda3/envs/xxx/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/home/xxx/progs/miniconda3/envs/xxx/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/xxx/progs/miniconda3/envs/xxx/lib/python3.8/site-packages/umi_tools/extract.py", line 477, in main
    reads = ReadExtractor(read1, read2)
  File "/home/xxx/progs/miniconda3/envs/xxx/lib/python3.8/site-packages/umi_tools/extract_methods.py", line 543, in __call__
    if(len(read1.seq) < len(self.pattern)):
TypeError: object of type 'NoneType' has no len()
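
The failure in the last frame can be reproduced outside UMI-tools. The following is a minimal sketch, not the UMI-tools implementation: it only assumes what the log above shows, namely that pattern is None when just --bc-pattern2 is given, so calling len() on it raises. The helper names pattern_length and read_too_short are hypothetical and exist only for this illustration.

```python
from typing import Optional


def pattern_length(pattern: Optional[str]) -> int:
    """Hypothetical helper: treat an unset (None) pattern as zero-length."""
    return len(pattern) if pattern is not None else 0


def read_too_short(seq: str, pattern: Optional[str]) -> bool:
    """Hypothetical None-safe version of the length check in the traceback."""
    return len(seq) < pattern_length(pattern)


# Reproduce the reported error: len() is called on a pattern that is None.
try:
    len(None)
except TypeError as err:
    message = str(err)

print(message)
# With no read1 pattern there is nothing to extract, so no read is too short.
print(read_too_short("ACGT", None))
# A read shorter than an 8-base pattern is flagged.
print(read_too_short("ACGT", "NNNNNNNN"))
```

This is also consistent with --bc-pattern=X avoiding the error: the pattern is then a one-character string rather than None, so len() succeeds.
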

TomSmithCGAT commented 2 years ago

Just to check, you're trying to extract a UMI from read2 only? extract is designed to extract from read1, or from read1 and read2, but it is still possible to extract from read2 alone. Just swap the files around in the input/output like so:

umi_tools extract -I read2.fastq.gz --read2-in=read1.fastq.gz --bc-pattern=NNNNNNNN -S r2UMI.fastq.gz --read2-out=r1UMI.fastq.gz

m3hdad commented 2 years ago

Yes, I was trying to extract from read2 only. The reason is that umi-tools is part of a pipeline, and I do not want to change the order of the reads because the pipeline's logic would break.

Anyway, I know that swapping works fine. I had misunderstood --bc-pattern2 as a way to extract from read2 only. That said, the documentation does say that --bc-pattern and/or --bc-pattern2 are required.

You might consider this a feature request, if it's not too much work to cater for. As I mentioned, passing --bc-pattern=X avoids the TypeError. Otherwise, feel free to close the issue if this is how it is supposed to work. Thanks for your reply.

TomSmithCGAT commented 2 years ago

Ah, the docs are wrong in that case!

I'd have to take a look back into the code to see whether supporting extraction from read2 only is a PITA. I suspect it should be OK, and I've got no objection in principle.

You OK if we add this @IanSudbery?

IanSudbery commented 2 years ago

Fine by me, as long as it's not too big a surgery.

c-guzman commented 1 year ago

Just posting here as another instance where this isn't clear from the documentation; it caused issues for me until I found this post. It would be a great feature to add.

Thanks!

TomSmithCGAT commented 6 months ago

The next release will include an option to extract barcodes from read 2 only (see #630).