CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

Error with sam input for dedup #483

Closed: bernt-matthias closed this issue 1 year ago

bernt-matthias commented 3 years ago
Traceback (most recent call last):
  File "/home/berntm/miniconda3/envs/mulled-v1-2a7f10a44f18642cf5002ac84c4fb15b0fb54641efde75615e7365dedf6dd7b3/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/home/berntm/miniconda3/envs/mulled-v1-2a7f10a44f18642cf5002ac84c4fb15b0fb54641efde75615e7365dedf6dd7b3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/berntm/miniconda3/envs/mulled-v1-2a7f10a44f18642cf5002ac84c4fb15b0fb54641efde75615e7365dedf6dd7b3/lib/python3.8/site-packages/umi_tools/dedup.py", line 338, in main
    outfile.write(read)
  File "/home/berntm/miniconda3/envs/mulled-v1-2a7f10a44f18642cf5002ac84c4fb15b0fb54641efde75615e7365dedf6dd7b3/lib/python3.8/site-packages/umi_tools/sam_methods.py", line 589, in write
    self.write_mates()
  File "/home/berntm/miniconda3/envs/mulled-v1-2a7f10a44f18642cf5002ac84c4fb15b0fb54641efde75615e7365dedf6dd7b3/lib/python3.8/site-packages/umi_tools/sam_methods.py", line 604, in write_mates
    for read in self.infile.fetch(reference=self.chrom, multiple_iterators=True):
  File "pysam/libcalignmentfile.pyx", line 1117, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: multiple iterators not implemented for SAM files

I set the --in-sam flag, and the tool appears to recognize it: I see in_sam : True in the output.

I'm also wondering whether the SAM/BAM input needs to be sorted for count, dedup and group.

IanSudbery commented 3 years ago

Sorry, we don't support SAM input for paired data, because retrieving the mates requires random access to the file and SAM files can't be read that way. We need to add a check for this!
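For illustration only, a check of this kind might look like the sketch below. It is not the UMI-tools implementation: the function names `looks_like_bam` and `check_paired_input` and the error message are hypothetical, and the sketch assumes it is enough to test for the BAM magic bytes (`BAM\1` at the start of the gzip-compatible BGZF stream) using only the standard library.

```python
import gzip
import zlib


def looks_like_bam(path):
    """Hypothetical helper: True if the file starts with the BAM magic bytes."""
    # BAM is BGZF-compressed, so the raw file must begin with the gzip magic.
    with open(path, "rb") as handle:
        if handle.read(2) != b"\x1f\x8b":
            return False
    # The decompressed stream of a BAM file begins with b"BAM\x01".
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError, zlib.error):
        return False


def check_paired_input(path, paired):
    """Hypothetical pre-flight check: reject non-BAM input for paired data."""
    if paired and not looks_like_bam(path):
        raise ValueError(
            "paired-end processing needs random access to the input; "
            "please supply a sorted, indexed BAM file instead of SAM"
        )
```

Failing early like this would replace the pysam `ValueError` in the traceback with a message that tells the user what to change.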

Also, yes, the input needs to be coordinate-sorted for count, dedup and group.
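As a rough way to check the sort order before running the tool, one could inspect the `SO` tag on the `@HD` line of a SAM header. The sketch below assumes the header lines have already been read as text; `sam_sort_order` is a hypothetical helper, not part of UMI-tools, and the tag only records what the header declares, so it does not prove the records are actually in order.

```python
def sam_sort_order(header_lines):
    """Hypothetical helper: return the SO value from the @HD line, or None."""
    for line in header_lines:
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


header = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:1000\n"]
if sam_sort_order(header) != "coordinate":
    raise SystemExit("input does not declare coordinate sorting")
```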