brightway-lca / stats_arrays

BSD 3-Clause "New" or "Revised" License
1 stars 3 forks source link

Triangular distribution should not fail if min or max == mode #6

Open aleksandra-kim opened 6 years ago

aleksandra-kim commented 6 years ago

Original report by Pascal Lesage (Bitbucket: MPa, ).


The validation of TriangularUncertainty will raise an ImproperBoundsError if the minimum or the maximum are equal to the mode.

However, this should not be. In scipy (the underlying package for the TriangularUncertainty  class), it is ok to pass a shape parameter c = 0:

#!python
from scipy import stats
mytriang = stats.triang(c=0) # triangular distribution with mode=min=0 and max=1
mytriang.rvs(3)

returns array([0.58796552, 0.37544352, 0.13834934])

or a shape parameter c = 1:

#!python
mytriang = stats.triang(c=1) # triangular distribution with mode=max=1 and min=0
mytriang.rvs(3)

returns array([0.90142956, 0.30985289, 0.82816732])

numpy is also able to have a left or a right parameter equal to the mode parameter (not shown here, trust me).

The solution may just be to remove the equality sign in the assertion: if ((params['loc'] > params['maximum']).sum() or (params['loc'] < params['minimum']).sum()): rather than if ((params['loc'] >= params['maximum']).sum() or (params['loc'] <= params['minimum']).sum()): but the .sum() is throwing me off...

aleksandra-kim commented 6 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


If you actually use the scipy distributions, you do get an warning (though it seems like the result may still be correct):

#!python

In [2]: stats.triang(c=0)
Out[2]: <scipy.stats._distn_infrastructure.rv_frozen at 0x10f06d320>

In [3]: stats.triang(c=0).pdf(1)
/Users/cmutel/miniconda3/lib/python3.6/site-packages/scipy/stats/_continuous_distns.py:5243: RuntimeWarning: divide by zero encountered in true_divide
  return np.where(x < c, 2*x/c, 2*(1-x)/(1-c))
Out[3]: 0.0

Coudl you please check and make sure that the returned values for pdf, ppf, and cdf are still correct?

In the line ((params['loc'] >= params['maximum']).sum(), ((params['loc'] >= params['maximum']) will be an array of booleans. When we sum this array, we get the number of times this condition was true. We then test the "truthiness" of this sum (0 is false, everything else is true) to find out if there were any cases where a value was outside the defined bounds.

aleksandra-kim commented 6 years ago

Original comment by Pascal Lesage (Bitbucket: MPa, ).


I have checked both situations with mode==min (c=0) and mode==max (c=1) for extreme values (0, 1) and an intermediate value (0.5). Scipy complains often but nevertheless returns returns the correct value except in one case: the pdf(1) of the case where mode==max gives nan , should give 2.

#!python

In [1]: from scipy.stats import triang

In [2]: test_values = [0, 0.5, 1]

In [3]: triang_c0 = triang(c=0)

In [4]: for v in test_values:
   ...:     print("pdf with {}: {}".format(v, triang_c0.pdf(v)))
   ...:
C:\mypy\anaconda\lib\site-packages\scipy\stats\_continuous_distns.py:4742: RuntimeWarning: invalid value encountered in true_divide
  return np.where(x < c, 2*x/c, 2*(1-x)/(1-c))
pdf with 0: 2.0
C:\mypy\anaconda\lib\site-packages\scipy\stats\_continuous_distns.py:4742: RuntimeWarning: divide by zero encountered in true_divide
  return np.where(x < c, 2*x/c, 2*(1-x)/(1-c))
pdf with 0.5: 1.0
pdf with 1: 0.0

In [5]: for v in test_values:
   ...:     print("ppf with {}: {}".format(v, triang_c0.ppf(v)))
   ...:
ppf with 0: 0.0
ppf with 0.5: 0.2928932188134524
ppf with 1: 1.0

In [6]: for v in test_values:
   ...:     print("cdf with {}: {}".format(v, triang_c0.cdf(v)))
   ...:
cdf with 0: 0.0
C:\mypy\anaconda\lib\site-packages\scipy\stats\_continuous_distns.py:4745: RuntimeWarning: divide by zero encountered in true_divide
  return np.where(x < c, x*x/c, (x*x-2*x+c)/(c-1))
cdf with 0.5: 0.75
cdf with 1: 1.0

In [7]: triang_c1 = triang(c=1)

In [8]: for v in test_values:
   ...:     print("pdf with {}: {}".format(v, triang_c1.pdf(v)))
   ...:
C:\mypy\anaconda\lib\site-packages\scipy\stats\_continuous_distns.py:4742: RuntimeWarning: divide by zero encountered in true_divide
  return np.where(x < c, 2*x/c, 2*(1-x)/(1-c))
pdf with 0: 0.0
pdf with 0.5: 1.0
C:\mypy\anaconda\lib\site-packages\scipy\stats\_continuous_distns.py:4742: RuntimeWarning: invalid value encountered in true_divide
  return np.where(x < c, 2*x/c, 2*(1-x)/(1-c))
pdf with 1: nan

In [9]: for v in test_values:
   ...:     print("pdf with {}: {}".format(v, triang_c1.ppf(v)))
   ...:
pdf with 0: 0.0
pdf with 0.5: 0.7071067811865476
pdf with 1: 1.0

In [10]: for v in test_values:
    ...:     print("pdf with {}: {}".format(v, triang_c1.cdf(v)))
    ...:
pdf with 0: 0.0
C:\mypy\anaconda\lib\site-packages\scipy\stats\_continuous_distns.py:4745: RuntimeWarning: divide by zero encountered in true_divide
  return np.where(x < c, x*x/c, (x*x-2*x+c)/(c-1))
pdf with 0.5: 0.25
pdf with 1: 1.0
aleksandra-kim commented 6 years ago

Original comment by Pascal Lesage (Bitbucket: MPa, ).


This is actually all fixed now with the new update to SciPy (weird how we find this now!)

The code in the error reported above: return np.where(x < c, 2*x/c, 2*(1-x)/(1-c)) is no longer in SciPy and has been replaced with return np.where(q < c, np.sqrt(c * q), 1-np.sqrt((1-c) * (1-q)))

I don't know if this was there before, but they also now explicitly address the two border conditions here

So the solution would be: 1) Have the later version of update in SciPy (1.1.0) in the requirements for stats_arrays 2) Change <= and >= simple inequalities in the stats_arrays code mentioned in the original issue report.

I'll do this now.

aleksandra-kim commented 6 years ago

Original comment by Pascal Lesage (Bitbucket: MPa, ).


This fixes it: https://bitbucket.org/cmutel/stats_arrays/pull-requests/2/allow-mode-min-or-max-bounds-for/diff