bxlab / bx-python

Tools for manipulating biological data, particularly multiple sequence alignments
MIT License

binned_bitsets_from_list errors when chromosome size is larger than set MAX (512M) #67

Open zxl124 opened 4 years ago

zxl124 commented 4 years ago

I was running RSeQC inner_distance.py on a plant species with large chromosomes (many are > 512 Mb) and encountered the following error:

inner_distance.py -i in2252_13Aligned.sortedByCoord.out.bam -o in2252_13.rseqc -r ../genes.bed
Get exon regions from ../genes.bed ...
Traceback (most recent call last):
  File "/opt/conda/envs/nf-core-rnaseq-1.4.2/bin/inner_distance.py", line 95, in <module>
    main()
  File "/opt/conda/envs/nf-core-rnaseq-1.4.2/bin/inner_distance.py", line 87, in main
    obj.mRNA_inner_distance(outfile=options.output_prefix,low_bound=options.lower_bound_size,up_bound=options.upper_bound_size,step=options.step_size,refbed=options.ref_gene,sample_size=options.sampleSize, q_cut = options.map_qual)
  File "/opt/conda/envs/nf-core-rnaseq-1.4.2/lib/python3.6/site-packages/qcmodule/SAM.py", line 3582, in mRNA_inner_distance
    exon_bitsets = binned_bitsets_from_list(ref_exons)
  File "/opt/conda/envs/nf-core-rnaseq-1.4.2/lib/python3.6/site-packages/bx/bitset_builders.py", line 143, in binned_bitsets_from_list
    last_bitset.set_range( start, end - start )
  File "lib/bx/bitset.pyx", line 216, in bx.bitset.BinnedBitSet.set_range
  File "lib/bx/bitset.pyx", line 184, in bx.bitset.bb_check_range_count
  File "lib/bx/bitset.pyx", line 180, in bx.bitset.bb_check_index
IndexError: 536882486 is larger than the size of this BitSet (536870912).

After examining bitset.pyx and bitset_builders.py, I think the error is caused by the following hard-coded limit: https://github.com/bxlab/bx-python/blob/1731099ac7e358eb2eced5a02bd4c96ee3c366f0/lib/bx/bitset.pyx#L195

I don't know the reason behind this setting. Would it be possible to increase the limit so that genomes with large chromosomes can be handled?
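To make the numbers in the traceback concrete, here is a minimal arithmetic sketch. It uses only the two values printed in the IndexError; the names `BITSET_LIMIT` and `overflow` are hypothetical and are not part of bx-python.

```python
# Size reported in the IndexError: 536870912 == 512 * 1024 * 1024 == 2**29.
BITSET_LIMIT = 512 * 1024 * 1024


def overflow(index, limit=BITSET_LIMIT):
    """Return how far `index` lies past `limit` (0 if it is not larger)."""
    return max(0, index - limit)


print(BITSET_LIMIT == 2**29)   # True
print(overflow(536882486))     # 11574, the index from the traceback
```

So the failing coordinate is only about 11.5 kb past the 512M boundary, which is consistent with a chromosome just slightly longer than the limit.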

brettChapman commented 3 years ago

Hi, I'm also getting the same error as part of the nf-core/rnaseq pipeline (https://github.com/nf-core/rnaseq). I'm also working with plant genomes.

inner_distance.py \
      -i Root_1.markdup.sorted.bam \
      -r RGT_Planet.bed \
      -o Root_1 \
       \
      > stdout.txt
  head -n 2 stdout.txt > Root_1.inner_distance_mean.txt

  inner_distance.py --version | sed -e "s/inner_distance.py //g" > rseqc.version.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  Get exon regions from RGT_Planet.bed ...
  Traceback (most recent call last):
    File "/usr/local/bin/inner_distance.py", line 95, in <module>
      main()
    File "/usr/local/bin/inner_distance.py", line 87, in main
      obj.mRNA_inner_distance(outfile=options.output_prefix,low_bound=options.lower_bound_size,up_bound=options.upper_bound_size,step=options.step_size,refbed=options.ref_gene,sample_size=options.sampleSize, q_cut = options.map_qual)
    File "/usr/local/lib/python3.7/site-packages/qcmodule/SAM.py", line 3582, in mRNA_inner_distance
      exon_bitsets = binned_bitsets_from_list(ref_exons)
    File "/usr/local/lib/python3.7/site-packages/bx/bitset_builders.py", line 143, in binned_bitsets_from_list
      last_bitset.set_range( start, end - start )
    File "lib/bx/bitset.pyx", line 216, in bx.bitset.BinnedBitSet.set_range
    File "lib/bx/bitset.pyx", line 184, in bx.bitset.bb_check_range_count
    File "lib/bx/bitset.pyx", line 180, in bx.bitset.bb_check_index
  IndexError: 537013395 is larger than the size of this BitSet (536870912).
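Until the limit is changed, it may help to check in advance which chromosomes in a BED file would trigger this error. The following is a rough sketch in plain Python that does not call bx-python at all; the function name `find_oversized_intervals` is hypothetical, and the limit is simply the size printed in the IndexError above. It assumes standard BED columns (chrom, start, end) and compares only the end coordinate, which may not be exactly the value the bitset checks.

```python
BITSET_LIMIT = 536870912  # size reported in the IndexError above


def find_oversized_intervals(bed_lines, limit=BITSET_LIMIT):
    """Map chromosome name -> largest end coordinate greater than `limit`."""
    worst = {}
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, end = fields[0], int(fields[2])
        if end > limit and end > worst.get(chrom, 0):
            worst[chrom] = end
    return worst


# Made-up example records, not taken from the files in this issue.
example = [
    "chr1\t1000\t2000\tgeneA",
    "chr2\t536880000\t536882486\tgeneB",
    "chr2\t537000000\t537013395\tgeneC",
]
print(find_oversized_intervals(example))  # {'chr2': 537013395}
```

For a real file, the same function can be given an open file handle, for example `find_oversized_intervals(open("genes.bed"))`.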