broadinstitute / StrainGE

strain-level analysis tools
BSD 3-Clause "New" or "Revised" License
33 stars 10 forks source link

Error due to five indices for possible alleles, rather than six #33

Closed gavinmdouglas closed 1 year ago

gavinmdouglas commented 1 year ago

Hi there,

I am running into the below error for this straingr call command, with StrainGE version 1.3.7.

straingr call refs_concat.fasta sample1.bam --hdf5-out sample1.hdf5 --summary sample1.tsv --tracks all

Error:

Traceback (most recent call last):
  File "/home/gdouglas/local/miniconda3/envs/strainge/bin/straingr", line 8, in <module>
    sys.exit(straingr_cli())
             ^^^^^^^^^^^^^^
  File "/home/gdouglas/local/miniconda3/envs/strainge/lib/python3.11/site-packages/strainge/cli/main.py", line 110, in __call__
    self.run(args)
  File "/home/gdouglas/local/miniconda3/envs/strainge/lib/python3.11/site-packages/strainge/cli/registry.py", line 83, in run
    rc = subcommand_func(**args_dict)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gdouglas/local/miniconda3/envs/strainge/lib/python3.11/site-packages/strainge/cli/straingr.py", line 468, in __call__
    call_data = caller.process(reference, sample_bam)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gdouglas/local/miniconda3/envs/strainge/lib/python3.11/site-packages/strainge/variant_caller.py", line 992, in process
    self._assess_allele(call_data, scaffold, refpos, read)
  File "/home/gdouglas/local/miniconda3/envs/strainge/lib/python3.11/site-packages/strainge/variant_caller.py", line 1102, in _assess_allele
    call_data.good_read(scaffold, refpos, base, qual, mq, False)
  File "/home/gdouglas/local/miniconda3/envs/strainge/lib/python3.11/site-packages/strainge/variant_caller.py", line 542, in good_read
    scaffold_data.alleles[pos, 0, ix] += 1
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
IndexError: index 5 is out of bounds for axis 2 with size 5

After some troubleshooting, I realized that this is because DEL mutations should be given an index of 5 when incrementing scaffold_data.alleles, and that this can be fixed by adjusting L765 of variant_caller.py.

Original line:

self.alleles = numpy.zeros((self.length, 2, len(Allele)-1),
                                   dtype=numpy.uint32)

Changed line (len(Allele)-1 replaced by 6):

self.alleles = numpy.zeros((self.length, 2, 6),
                                   dtype=numpy.uint32)

I'm not sure if this is due to issues with certain scaffolds (e.g., if a scaffold doesn't contain any Ns maybe it would have a smaller length of Allele? I'm not sure, as it seems like this is a characteristic of the class definition. But in any case, this change appears to fix the problem and produce the correct output, from what I can tell, but it would be great to have it confirmed that this should not negatively affect the results.

Thanks!

Gavin

liatbi commented 1 year ago

Dear Gavin I ran into the same problem. Thank you for this troubleshooting tip Liat

lrvdijk commented 1 year ago

Fixed! New release should be on PyPI soon.