Cue / scales

scales - Metrics for Python
Apache License 2.0

ZeroDivisionError in PmfStatDict.addValue() #32

Closed. golfdish closed this issue 9 years ago.

golfdish commented 10 years ago

Reading self.__sample.stddev on a PmfStatDict after calling addValue() raises a ZeroDivisionError when the sample list holds 1 element but the sample count is 2 or greater. This happens when an operation takes zero time (e.g. in unit tests with time.time() patched out). The cause is these lines in ExponentiallyDecayingReservoir.update (samplestats.py:151):

    priority = self.__weight(timestamp - self.startTime) / random.random()

    self.count += 1
    if (self.count <= self.size):
      self.values[priority] = value

priority is 0 when timestamp - self.startTime is 0. Because self.values is keyed by priority, each new sample overwrites the previous one, so self.samples() (which returns self.values.values()) has length 1 while self.count is 2 or greater. Since self.count determines len(self) for a Sampler, the test at the top of

  @property
  def stddev(self):
    """Return the sample standard deviation."""
    if len(self) < 2:
      return float('NaN')
    # The stupidest algorithm, but it works fine.
    arr = self.samples()
    mean = sum(arr) / len (arr)
    bigsum = 0.0
    for x in arr:
      bigsum += (x - mean)**2
    return sqrt(bigsum / (len(arr) - 1))

in Sampler (samplestats.py:54) evaluates to False, so the rest of the method runs and raises a ZeroDivisionError when it divides by len(arr) - 1, which is 0.
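The failure can be reproduced without scales. The sketch below uses a hypothetical stand-in, ToyReservoir (not part of the library), that copies only the behaviour quoted above: a dict keyed by priority, a count incremented on every update, and a len() that reports the count. To trigger the collision it simply forces the weight to 0. It also divides by 1.0 - random.random() instead of random.random() so the toy itself can never divide by zero. The stddev_guarded method shows one possible guard that checks the number of stored samples rather than the count; it is an illustration only, not necessarily what the fix in e1f9118 does.

```python
import math
import random


class ToyReservoir:
    """Hypothetical stand-in for ExponentiallyDecayingReservoir + Sampler.

    Mirrors only the quoted excerpts: samples are stored in a dict keyed by
    priority, count grows on every update, and len() reports count.
    """

    def __init__(self, size, weight):
        self.size = size
        self.weight = weight  # callable: elapsed time -> weight
        self.start_time = 0.0
        self.count = 0
        self.values = {}

    def update(self, value, timestamp):
        # 1.0 - random.random() is in (0, 1], so this toy never divides by zero.
        priority = self.weight(timestamp - self.start_time) / (1.0 - random.random())
        self.count += 1
        if self.count <= self.size:
            # Equal priorities share one key, so a later value overwrites an earlier one.
            self.values[priority] = value

    def __len__(self):
        return self.count

    def samples(self):
        return list(self.values.values())

    def stddev(self):
        """Same logic as the quoted Sampler.stddev."""
        if len(self) < 2:  # the guard checks count...
            return float("nan")
        arr = self.samples()
        mean = sum(arr) / len(arr)
        bigsum = sum((x - mean) ** 2 for x in arr)
        return math.sqrt(bigsum / (len(arr) - 1))  # ...but the divisor uses stored samples

    def stddev_guarded(self):
        """Possible guard: check the samples actually used in the division."""
        arr = self.samples()
        if len(arr) < 2:
            return float("nan")
        mean = sum(arr) / len(arr)
        bigsum = sum((x - mean) ** 2 for x in arr)
        return math.sqrt(bigsum / (len(arr) - 1))


# Force every weight to 0 so all priorities collide on the key 0.0.
r = ToyReservoir(size=100, weight=lambda elapsed: 0.0)
r.update(1.0, timestamp=0.0)
r.update(2.0, timestamp=0.0)
print(len(r), len(r.samples()))  # 2 1
print(r.stddev_guarded())        # nan
```

Calling r.stddev() on the same object raises ZeroDivisionError, because the count-based guard passes while only one sample is stored.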

golfdish commented 10 years ago

On closer inspection, this appears to be a consequence of startTime being larger than timestamp, which happens when startTime is set before time.time() is patched out.
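Why a start time later than the patched clock zeroes the priority can be sketched under one assumption that the thread does not state: that __weight is the forward-decay weight exp(alpha * elapsed) used by Metrics-style exponentially decaying reservoirs. The alpha value and both timestamps below are hypothetical. With zero elapsed time the assumed weight is 1.0, so priorities (1.0 / random.random()) stay distinct; with a large negative elapsed time, math.exp underflows to 0.0 and every priority becomes 0.

```python
import math

ALPHA = 0.015  # hypothetical decay factor; assumed, not taken from scales


def weight(elapsed):
    # Assumed forward-decay weight: exp(alpha * elapsed).
    return math.exp(ALPHA * elapsed)


start_time = 1_400_000_000.0  # hypothetical wall-clock time captured before patching
patched_now = 0.0             # hypothetical value returned by the patched time.time()

print(weight(0.0))                       # 1.0
print(weight(patched_now - start_time))  # 0.0
```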

PeterScott commented 9 years ago

I've pushed a less-than-ideal fix for this in e1f9118b85f88a9bc5eec8f7de5fb6cf676041ef. Thanks!