broadinstitute / StrainGE

strain-level analysis tools
BSD 3-Clause "New" or "Revised" License

cannot convert float NaN to integer #31

Open nick-youngblut opened 1 year ago

nick-youngblut commented 1 year ago

Command:

straingst run --iterations 5 \
    --minfrac 0.01 --separate-output \
    -o NTC_taxid-1472416 \
    1472416.hdf5 NTC.hdf5

Error:

Command error:
  2023-04-10 23:45:34,474 - INFO:root:Running StrainGST on sample NTC.hdf5 with database 1472416.hdf5
  /opt/conda/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3464: RuntimeWarning: Mean of empty slice.
    return _methods._mean(a, axis=axis, dtype=dtype,
  /opt/conda/lib/python3.10/site-packages/numpy/core/_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
    ret = ret.dtype.type(ret / rcount)
  Traceback (most recent call last):
    File "/opt/conda/bin/straingst", line 11, in <module>
      sys.exit(straingst_cli())
    File "/opt/conda/lib/python3.10/site-packages/strainge/cli/main.py", line 110, in __call__
      self.run(args)
    File "/opt/conda/lib/python3.10/site-packages/strainge/cli/registry.py", line 83, in run
      rc = subcommand_func(**args_dict)
    File "/opt/conda/lib/python3.10/site-packages/strainge/cli/straingst.py", line 160, in __call__
      results = straingst.find_close_references(sample_kmerset,
    File "/opt/conda/lib/python3.10/site-packages/strainge/search_tool.py", line 216, in find_close_references
      universal_limit = int(np.mean(sample.counts) * self.universal)
  ValueError: cannot convert float NaN to integer

Conda env: bioconda::strainge=1.3.3

This appears to be an edge case that is not accounted for. I'm running hundreds of `straingst run` jobs, and this is the only one that failed.

lrvdijk commented 1 year ago

Hmm, could it be that the k-mer set of that sample is empty? A better error message would indeed be nicer.
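The traceback is consistent with an empty k-mer set: `np.mean` of an empty array returns NaN (with the "Mean of empty slice" warning shown above), and `int()` on NaN raises this exact `ValueError`. The snippet below is a minimal sketch of that behaviour together with one possible guard. The helper `universal_limit` and its `universal` argument are hypothetical stand-ins for the expression in `search_tool.py`, not StrainGE's actual code.

```python
import warnings

import numpy as np


def universal_limit(counts, universal):
    """Hypothetical guard around int(np.mean(counts) * universal).

    Raises a descriptive error when the sample has no k-mers,
    instead of letting NaN reach int().
    """
    counts = np.asarray(counts)
    if counts.size == 0:
        raise ValueError(
            "Sample k-mer set is empty; cannot compute the mean k-mer count."
        )
    return int(np.mean(counts) * universal)


# Reproduce the underlying NumPy behaviour with an empty count array.
empty_counts = np.array([], dtype=np.int64)
with warnings.catch_warnings():
    # Silence "Mean of empty slice" / "invalid value" warnings for the demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    mean_of_empty = np.mean(empty_counts)

print(np.isnan(mean_of_empty))  # the mean of an empty array is NaN

try:
    int(mean_of_empty)
except ValueError as err:
    print(err)  # same message as in the traceback
```

If this is the cause, the k-mer set in `NTC.hdf5` would contain no k-mers, which seems plausible if NTC is a no-template control with very few reads.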

cusoiv commented 4 months ago

I seem to have run into exactly the same issue -- I am running 1.3.9 and I have hit this problem with both mock metagenomes consisting of two mixed strains and real metagenomes. I am sure that there is nothing wrong with the samples, because I was able to run the same samples/database successfully with an earlier version of strainge 3 years ago...