Microbial-Ecology-Group / AMRplusplus

AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome.
https://www.meglab.org/
GNU General Public License v3.0

TypeError with certain samples #34

Open nsharp2 opened 4 months ago

nsharp2 commented 4 months ago

Hello,

I have been encountering an issue recently running AMR++ SNP verification with metagenomic data. A sample stopped execution at the runsnp process with the following message:

Command error:
  concurrent.futures.process._RemoteTraceback:
  """
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
      r = call_item.fn(*call_item.args, **call_item.kwargs)
    File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
      return [fn(*args) for args in chunk]
    File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
      return [fn(*args) for args in chunk]
    File "/opt/analysis/nextflow-bin/SNP_Verification.py", line 229, in iterate
      verify(read, gene_variant, config)
    File "/opt/analysis/nextflow-bin/SNP_Verification_Processes/__init__.py", line 49, in verify
      nTupleCheck(read, gene, mapOfInterest, seqOfInterest, config)
    File "/opt/analysis/nextflow-bin/SNP_Verification_Processes/nTupleCheck.py", line 171, in nTupleCheck
      if mt == seqOfInterest[queryIndex]:
  TypeError: string indices must be integers
  """
  The above exception was the direct cause of the following exception:
  Traceback (most recent call last):
    File "/opt/analysis/nextflow-bin/SNP_Verification.py", line 353, in <module>
      main()
    File "/opt/analysis/nextflow-bin/SNP_Verification.py", line 348, in main
      [gene_variant_dict.update(r) for r in results]
    File "/opt/analysis/nextflow-bin/SNP_Verification.py", line 348, in <listcomp>
      [gene_variant_dict.update(r) for r in results]
    File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
      for element in iterable:
    File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
      yield fs.pop().result()
    File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
      return self.__get_result()
    File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
      raise self._exception
  TypeError: string indices must be integers

Upon closer inspection of SNP_Verification.py, I found that this error seems to occur when queryIndex is None at the line shown in the traceback (nTupleCheck.py, line 171: if mt == seqOfInterest[queryIndex]:). I added print statements for mapOfInterest, seqOfInterest, and queryIndex inside the loop that contains this line and took a screenshot of the output. For one specific sequence, the loop runs more than 265 times before the value 11 for key 2251 in mapOfInterest (shown below) is set to None:

[Screenshot: amr++_behavior]
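
The failure mode itself can be reproduced outside the pipeline. The sketch below is only an illustration, not the AMR++ code: the sequence, the function names (`reproduces_type_error`, `lookup_base`), and the idea of returning None for an unmapped position are all assumptions made for the example. It shows that indexing a Python string with None raises a TypeError, and one possible way to guard against it.

```python
# Illustrative only: these names and values are hypothetical stand-ins,
# not the actual variables or data from SNP_Verification.py.

def reproduces_type_error(seq, query_index):
    """Return True if seq[query_index] raises TypeError (e.g. index is None)."""
    try:
        seq[query_index]
    except TypeError:
        # On Python 3.8 the message is "string indices must be integers".
        return True
    return False


def lookup_base(seq, query_index):
    """Return the base at query_index, or None when the index is None.

    Treating None as "no aligned query position" is an assumption here;
    whether skipping is the right behavior depends on the pipeline's intent.
    """
    if query_index is None:
        return None
    return seq[query_index]


if __name__ == "__main__":
    seq_of_interest = "ACGTACGTACGT"  # made-up sequence
    print(reproduces_type_error(seq_of_interest, None))  # index is None
    print(reproduces_type_error(seq_of_interest, 11))    # valid integer index
    print(lookup_base(seq_of_interest, None))            # guarded lookup
```

A guard like this would only hide the symptom. It does not explain why the mapped index becomes None in the first place.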

I am not quite sure why this behavior is happening, and would greatly appreciate any help with the matter.

Thank you!