CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License
491 stars 190 forks source link

umi_tools count Pysam Error #535

Closed ethanqiu04 closed 1 year ago

ethanqiu04 commented 2 years ago

Hello, I have been attempting to use the umi_tools count function to perform UMI counting per gene for a dataset with only UMIs, and no barcodes. I have been referencing the Single_cell_tutorial. Our current workflow is: add UMIs to fastqs, map with STAR, run featureCounts, then use umi_tools count.

Our error originates here:

umi_tools count --per-gene --gene-tag=XT --assigned-status-tag=XS --per-cell -I assigned_sorted.bam -S counts.tsv.gz When I get the error

Traceback (most recent call last):  
  File "/Users/ethan/opt/anaconda3/envs/umi_env/bin/umi_tools", line 8, in <module>
    sys.exit(main()). 
  File "/Users/ethan/opt/anaconda3/envs/umi_env/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv). 
  File "/Users/ethan/opt/anaconda3/envs/umi_env/lib/python3.8/site-packages/umi_tools/count.py", line 143, in main
    for bundle, key, status in bundle_iterator(inreads):  
  File "/Users/ethan/opt/anaconda3/envs/umi_env/lib/python3.8/site-packages/umi_tools/sam_methods.py", line 375, in __call__. 
    read.reference_name != read.next_reference_name):  
  File "pysam/calignedsegment.pyx", line 941, in pysam.calignedsegment.AlignedSegment.next_reference_name.__get__  
  File "pysam/calignmentfile.pyx", line 1653, in pysam.calignmentfile.AlignmentFile.getrname. 
  File "pysam/calignmentfile.pyx", line 678, in pysam.calignmentfile.AlignmentFile.get_reference_name
ValueError: reference_id -1 out of range 0<=tid<255

It seems to be an unmapped read issue, however, even if I attempt to remove unmapped reads from the star outputs via 'samtools view -b -F 4', the issue remains.

Any help on this error would be greatly appreciated. Our upstream processing is different from the tutorial in that we are manually adding the UMIs into our starting fastq files prior to mapping. Also, we have not discarded multi maps. Could any issues originate from this? Many thanks, Ethan Qiu

IanSudbery commented 2 years ago

I think we see this if there is are reads that are marked as mapped, but don't have a contig set. See if you can find reads that do not have the unmapped flag, but do have contig set to "*"