IBM / differential-privacy-library

Diffprivlib: The IBM Differential Privacy Library
https://diffprivlib.readthedocs.io
MIT License
820 stars 196 forks source link

Computing multiple quantiles always uses a new random state #79

Closed daniel-vos closed 1 year ago

daniel-vos commented 1 year ago

When giving a list / array for quant in the diffprivlib.tools.quantile function, random_state that is given is not passed along to the calls that the function makes under the hood which means results vary between calls with the same random seed.

Reproduce

from diffprivlib.tools import quantile

import numpy as np

quantiles_1 = quantile([0, 1, 2, 3, 4], quant=[0.33, 0.66], random_state=0)
quantiles_2 = quantile([0, 1, 2, 3, 4], quant=[0.33, 0.66], random_state=0)
assert np.all(quantiles_1 == quantiles_2)

Currently this assertion will fail but of course should be the same between calls. I'm on version diffprivlib==0.6.2.

Fix Although it's easy to fix this (pass random_state=random_state along with the call at line 117 of quantile.py) the current method uses more epsilon budget than is required when computing multiple quantiles. E.g. the algorithm in 'Differentially Private Quantiles' by Jennifer Gillenwater, Matthew Joseph, Alex Kulesza appears to score significantly better than independent exponential mechanism calls. Maybe this is something to consider?

naoise-h commented 1 year ago

Hi Daniel, thank you for reporting this bug, we will work to fix this for the next release.

As regards performance, you're right, this is a naive implementation. We are currently finalising a paper that looks at an alternative method and will update you here when we can share it.