CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

umitools dedup error: '<' not supported between instances of 'str' and 'bytes' #533

Closed: camelest closed this issue 2 years ago

camelest commented 2 years ago

Hi, thank you so much for offering this wonderful tool.

I'm trying to deduplicate single-cell RNA-seq BAM files mapped by STAR. After reading this thread, I want to compare the results of --umi-tag=UR --cell-tag=CR with those of --umi-tag=UB --cell-tag=CB --method=unique.

The former works fine but the latter somehow gives the error as:

    Traceback (most recent call last):
      File "/home/anaconda3/bin/umi_tools", line 11, in <module>
        sys.exit(main())
      File "/home/anaconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
        module.main(sys.argv)
      File "/home/anaconda3/lib/python3.8/site-packages/umi_tools/dedup.py", line 309, in main
        for bundle, key, status in bundle_iterator(inreads):
      File "/home/anaconda3/lib/python3.8/site-packages/umi_tools/sam_methods.py", line 488, in __call__
        for k in sorted(self.reads_dict[p].keys()):
    TypeError: '<' not supported between instances of 'str' and 'bytes'

Apart from these tag options, the two commands are identical. I'm using umi_tools v1.1.2, and the exact commands are as follows:

umi_tools dedup --per-cell -I Aligned.sortedByCoord.out.bam --extract-umi-method=tag --umi-tag=UB --cell-tag=CB --method=unique -S Aligned.sortedByCoord.out_deduplicated_UB_CB.bam

umi_tools dedup --per-cell -I Aligned.sortedByCoord.out.bam --extract-umi-method=tag --umi-tag=UR --cell-tag=CR -S Aligned.sortedByCoord.out_deduplicated_UR_CR.bam

Do you have any idea what I missed? Thank you so much for your help.

Best, Raku
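For readers landing here from the same traceback: the TypeError itself is ordinary Python 3 behaviour whenever `sorted()` is given a mix of `str` and `bytes` values. The sketch below only reproduces that message with made-up keys; it is not taken from the umi_tools code and does not show why the keys end up mixed there.

```python
# Made-up keys, one str and one bytes, purely to reproduce the error message.
keys = ["ACGT", b"TTGA"]

try:
    sorted(keys)
    mixed_sort_failed = False
except TypeError as err:
    # Python 3 refuses to order str against bytes.
    mixed_sort_failed = True
    print(f"TypeError: {err}")

# Decoding every bytes key to str first makes the ordering well defined.
normalised = sorted(k.decode() if isinstance(k, bytes) else k for k in keys)
print(normalised)
```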

vivekbhr commented 2 years ago

I've been getting the same error lately on my STAR-mapped output with both the current (1.1.2) and an older (1.0.0) umi_tools version.

TomSmithCGAT commented 2 years ago

Sorry, I missed this message originally. I suspect the CB tags may be formatted in an unexpected way. Could you please post a few lines from the BAM file, or even just a list of the first few CB tags?
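One way to get such a list is to tally the CB values in SAM-format text, for example the lines printed by `samtools view`. This is a minimal sketch using only the standard library; the helper names `tag_value` and `tally_tag` and the three demo records are made up for illustration.

```python
from collections import Counter

def tag_value(sam_line, tag="CB"):
    """Return the value of a Z-type optional tag in one SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    prefix = tag + ":Z:"
    for field in fields[11:]:  # optional tags start at the 12th column
        if field.startswith(prefix):
            return field[len(prefix):]
    return None

def tally_tag(sam_lines, tag="CB"):
    """Count how often each value of `tag` occurs, skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        counts[tag_value(line, tag)] += 1
    return counts

# Hypothetical records, not taken from the reporter's BAM file.
demo = [
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC\tUB:Z:GGTT",
    "r2\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:-\tUB:Z:-",
    "r3\t0\tchr1\t101\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC\tUB:Z:CCTT",
]
cb_counts = tally_tag(demo)
for value, n in cb_counts.most_common():
    print(value, n)
```

An unusual value such as `-`, or a count under `None` for reads that lack the tag, would stand out in this tally.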

vivekbhr commented 2 years ago

Yes, indeed. In my case it turned out to be due to STAR attaching CB:Z:- to the reads with no/mismatched barcodes. I filtered those reads out of the files and it seems to be working now :+1: Thanks
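The thread does not say how the filtering was done. As one possible approach, the sketch below drops records tagged `CB:Z:-` from SAM-format text using only the standard library. The function names are hypothetical, the demo records are invented, and the option of also checking `UB` is an assumption rather than something confirmed above.

```python
def has_placeholder_tag(sam_line, tags=("CB",), placeholder="-"):
    """True if any of `tags` carries the placeholder value in this SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    unwanted = {f"{tag}:Z:{placeholder}" for tag in tags}
    return any(field in unwanted for field in fields[11:])  # optional tags only

def drop_placeholder_reads(sam_lines, tags=("CB",)):
    """Yield header lines and every record whose tags are not the placeholder."""
    for line in sam_lines:
        if line.startswith("@") or not has_placeholder_tag(line, tags):
            yield line

# Hypothetical input: one header line and three records.
demo = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC\tUB:Z:GGTT",
    "r2\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:-\tUB:Z:-",
    "r3\t0\tchr1\t101\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC\tUB:Z:CCTT",
]
kept = list(drop_placeholder_reads(demo))
print(len(kept), "of", len(demo), "lines kept")
```

Passing `tags=("CB", "UB")` would also drop reads whose UMI tag is `-`, should that turn out to be needed. To work on a real file, the same generator could be wrapped around standard input and output and placed between a BAM-to-SAM and a SAM-to-BAM conversion step.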

TomSmithCGAT commented 2 years ago

Ah, brilliant. @camelest, does that solve your error too?

camelest commented 2 years ago

@vivekbhr @TomSmithCGAT I'm so sorry for my late response. Somehow I did not receive the notification. It worked perfectly once I filtered out the reads with CB:Z:-! Thank you so much for your help. I'm closing the issue.