IBM / differential-privacy-library

Diffprivlib: The IBM Differential Privacy Library
https://diffprivlib.readthedocs.io
MIT License

Meaning of bounds. #58

BismeetSingh closed 2 years ago

BismeetSingh commented 3 years ago

I understand what epsilon is but how do bounds affect the model?

naoise-h commented 3 years ago

In differential privacy, we add noise proportional to the spread (range) of the data to privatise it. The bounds give the model the information it needs to calculate that spread and calibrate the noise accordingly: the wider the bounds, the more noise is added.
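To make the link between bounds and noise concrete, here is a minimal sketch of a textbook Laplace mechanism for a differentially private mean. It is an illustration only and not diffprivlib's internal code; the function names `laplace_scale` and `dp_mean` are made up for this example, and it assumes the number of records is public.

```python
import numpy as np


def laplace_scale(epsilon, lower, upper, n):
    """Noise scale for the mean of n values known to lie in [lower, upper].

    Changing one record moves the mean by at most (upper - lower) / n,
    so that is the sensitivity; the Laplace scale is sensitivity / epsilon.
    """
    sensitivity = (upper - lower) / n
    return sensitivity / epsilon


def dp_mean(x, epsilon, lower, upper, rng):
    """Hypothetical helper: clip to the bounds, then add Laplace noise."""
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    scale = laplace_scale(epsilon, lower, upper, len(x))
    return x.mean() + rng.laplace(loc=0.0, scale=scale)


rng = np.random.default_rng(0)
ages = rng.integers(18, 90, size=1000)  # synthetic toy data

# Same data and epsilon, but different bounds give different noise levels.
tight = laplace_scale(epsilon=1.0, lower=0, upper=100, n=len(ages))
loose = laplace_scale(epsilon=1.0, lower=0, upper=10_000, n=len(ages))
noisy_mean = dp_mean(ages, epsilon=1.0, lower=0, upper=100, rng=rng)
```

With identical data and epsilon, the loose bounds produce a noise scale 100 times larger than the tight ones, which is why overly generous bounds hurt accuracy.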

BismeetSingh commented 3 years ago

How do we calculate these bounds?

naoise-h commented 3 years ago

To preserve differential privacy, the bounds need to be chosen using knowledge of the domain of the data (i.e., without looking at the data itself). If this is not possible, then the bounds can be calculated from the data itself, even though this violates differential privacy, because the minimum and maximum are taken from the data without any noise. To calculate the bounds on a dataset X, you can use bounds=(np.min(X, axis=0), np.max(X, axis=0)).
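The two options can be compared side by side in a short sketch. The dataset, the column meanings (age in years, hours worked per week) and the domain limits are all assumptions made up for illustration; only the `(np.min(X, axis=0), np.max(X, axis=0))` expression comes from the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data: column 0 = age in years, column 1 = hours worked per week.
X = np.column_stack([
    rng.integers(18, 90, size=200),
    rng.uniform(0, 40, size=200),
]).astype(float)

# Preferred: bounds from domain knowledge, chosen without looking at X.
# (Assumed limits: nobody is older than 120, a week has 168 hours.)
domain_bounds = (np.array([0.0, 0.0]), np.array([120.0, 168.0]))

# Fallback: bounds computed from the data itself. This leaks the exact
# minimum and maximum of X, so it is not differentially private.
data_bounds = (np.min(X, axis=0), np.max(X, axis=0))


def clip_to_bounds(X, bounds):
    """Force every feature into [lower, upper], per column."""
    lower, upper = bounds
    return np.clip(X, lower, upper)


X_clipped = clip_to_bounds(X, domain_bounds)
```

Either tuple has the `(lower, upper)` shape used for `bounds=` in the answer above, with one entry per feature. Domain bounds are usually wider than the data-derived ones, so they cost some accuracy in exchange for keeping the privacy guarantee intact.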