IBM / differential-privacy-library

Diffprivlib: The IBM Differential Privacy Library
https://diffprivlib.readthedocs.io
MIT License

nanmean is broken #87

Closed: marty90 closed this issue 1 year ago

marty90 commented 1 year ago

Describe the bug

The nan* functions do not work because of a bug in the bounds calculation. At this line, the bounds are erroneously set to nan. As a result, at this line, the condition evaluates to False, because np.allclose returns False for nan inputs by default (equal_nan defaults to False). This causes this exception to be raised erroneously.

To fix: replace np.min and np.max with np.nanmin and np.nanmax.
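
The NumPy behaviour behind this can be checked in isolation. The following is only a minimal sketch, not the library's actual code: `looks_scalar` is a hypothetical helper that imitates the kind of `np.allclose` check described above, and the variable names are illustrative.

```python
import numpy as np

data = np.array([0, 1, 2, 3, 4, np.nan])

# np.min / np.max propagate nan, so bounds inferred this way are both nan.
naive_bounds = (np.min(data), np.max(data))

# np.nanmin / np.nanmax ignore nan entries and return finite bounds.
nan_safe_bounds = (np.nanmin(data), np.nanmax(data))


def looks_scalar(bound):
    """Hypothetical check: True if every entry of `bound` is close to its minimum."""
    bound = np.asarray(bound, dtype=float)
    # np.allclose uses equal_nan=False by default, so nan never compares close to nan.
    return bool(np.allclose(bound, np.min(bound)))


print(naive_bounds)
print(nan_safe_bounds)
print(looks_scalar(naive_bounds[0]), looks_scalar(nan_safe_bounds[0]))
```

With nan bounds the check fails even though the bound is a single value, which is consistent with the "non-scalar bounds" error in the traceback below. With the nan-aware functions the same check passes.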

To Reproduce

Just run the code:

diffprivlib.tools.nanmean(np.array([0,1,2,3,4, np.nan]))

It raises:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/diffprivlib/tools/utils.py", line 267, in nanmean
    return _mean(array, epsilon=epsilon, bounds=bounds, axis=axis, dtype=dtype, keepdims=keepdims,
  File "/usr/local/lib/python3.8/dist-packages/diffprivlib/tools/utils.py", line 290, in _mean
    array = clip_to_bounds(np.ravel(array), bounds)
  File "/usr/local/lib/python3.8/dist-packages/diffprivlib/validation.py", line 195, in clip_to_bounds
    raise ValueError(f"For non-scalar bounds, input array must be 2-dimensional. Got {array.ndim} dimensions.")
ValueError: For non-scalar bounds, input array must be 2-dimensional. Got 1 dimensions.

naoise-h commented 1 year ago

Thank you for the bug report, Martino. This is indeed a bug. We will push a fix for this in due course. Many thanks for finding and reporting it.