IBM / differential-privacy-library

Diffprivlib: The IBM Differential Privacy Library
https://diffprivlib.readthedocs.io
MIT License

[ENH] Add random state support to diffprivlib #72

Closed naoise-h closed 1 year ago

naoise-h commented 2 years ago

This PR adds the random_state parameter to all diffprivlib methods (mechanisms, models and tools) to allow the random number generator (RNG) to be seeded. This fixes #31.

Description

Random State

All functions in diffprivlib that implement differential privacy (i.e., mechanisms, models and tools) now have a random_state parameter, which accepts either an RNG seed or a RandomState instance. Diffprivlib largely follows the same standard as scikit-learn, in that deterministic behaviour of a single function call requires an integer seed (e.g., random_state=42). For deterministic behaviour across a script, a RandomState instance should be seeded (e.g., rng = np.random.RandomState(42) or rng = dp.utils.check_random_state(42)) and passed to all relevant functions.

>>> import diffprivlib as dp
>>> dp.mechanisms.Laplace(epsilon=1, sensitivity=1, random_state=42).randomise(0)
-0.8652695764638703
>>> dp.mechanisms.Laplace(epsilon=1, sensitivity=1, random_state=42).randomise(0)
-0.8652695764638703

>>> rng = dp.utils.check_random_state(42)
>>> dp.mechanisms.Laplace(epsilon=1, sensitivity=1, random_state=rng).randomise(0)
-0.8652695764638703
>>> dp.mechanisms.Laplace(epsilon=1, sensitivity=1, random_state=rng).randomise(0)
0.09503204532300952
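
The difference between the two sessions above (an integer seed reproduces the same draw on every call, while a shared instance advances between calls) can be illustrated without diffprivlib. The following is only a minimal sketch using the standard library: `as_rng` and `laplace_noise` are hypothetical helpers written for this illustration, and `random.Random` stands in for NumPy's `RandomState`. It is not the library's implementation, and its outputs will not match the values shown above.

```python
import math
import random


def as_rng(random_state):
    """Hypothetical helper: pass a generator through, or build one from a seed."""
    if isinstance(random_state, random.Random):
        return random_state
    return random.Random(random_state)


def laplace_noise(scale, random_state=None):
    """Draw one Laplace(0, scale) sample as a randomly signed exponential."""
    rng = as_rng(random_state)
    # 1.0 - rng.random() lies in (0, 1], so the logarithm is always defined.
    magnitude = -scale * math.log(1.0 - rng.random())
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * magnitude


# An integer seed builds a fresh generator each call: identical outputs.
same_a = laplace_noise(1.0, random_state=42)
same_b = laplace_noise(1.0, random_state=42)

# A shared generator keeps its state between calls: the sequence is
# reproducible across runs of the script, but successive draws differ.
rng = as_rng(42)
first = laplace_noise(1.0, random_state=rng)
second = laplace_noise(1.0, random_state=rng)
```

Here `same_a` equals `same_b`, and `first` equals both of them because the shared generator starts from the same seed, while `second` is a new draw.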

If no random state is passed (i.e., random_state=None), diffprivlib tries to use a cryptographically secure pseudorandom number generator (CSPRNG) in its mechanisms for generating noise to satisfy differential privacy, using the secrets library. This applies even when random_state=None is passed to a function in the models and tools modules. At the time of this pull request, a CSPRNG is used in all mechanisms except for the Bingham and Staircase mechanisms.

>>> dp.mechanisms.Laplace(epsilon=1, sensitivity=1, random_state=0)._rng
RandomState(MT19937) at 0x7F010E606A40
>>> dp.mechanisms.Laplace(epsilon=1, sensitivity=1, random_state=None)._rng
<random.SystemRandom object at 0x562be655d190>
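
The selection logic described above might look roughly like the sketch below. This is an assumption-laden illustration rather than diffprivlib's code: `select_rng` is a hypothetical name, and `random.Random` again stands in for NumPy's `RandomState`. The only library facts relied on are that `secrets.SystemRandom` draws from the operating system's entropy source and cannot be seeded.

```python
import random
import secrets


def select_rng(random_state=None):
    """Hypothetical helper: pick a CSPRNG unless the caller asks for determinism."""
    if random_state is None:
        # No seed requested: use the OS-backed generator, which ignores seeding.
        return secrets.SystemRandom()
    if isinstance(random_state, random.Random):
        # An existing generator is passed through so its state is shared.
        return random_state
    # Anything else is treated as a seed for a reproducible generator.
    return random.Random(random_state)


secure_rng = select_rng()          # non-reproducible, suitable for real noise
seeded_rng = select_rng(0)         # reproducible, suitable for tests
```

A seeded generator makes the noise predictable to anyone who knows the seed, which is presumably why the secure generator is the default and seeding is opt-in.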

Logistic Regression

Separately, this PR also correctly implements the scikit-learn loss module for the logistic regression classifier. The loss module first appeared in scikit-learn 1.1.0, and diffprivlib retains backwards compatibility with older versions of scikit-learn that are still supported.

codecov[bot] commented 2 years ago

Codecov Report

Base: 99.49% // Head: 99.38% // Decreases project coverage by -0.11% :warning:

Coverage data is based on head (ffd1b92) compared to base (b3d6a72). Patch coverage: 98.21% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #72      +/-   ##
==========================================
- Coverage   99.49%   99.38%   -0.12%
==========================================
  Files          34       34
  Lines        2594     2617      +23
==========================================
+ Hits         2581     2601      +20
- Misses         13       16       +3
```

| [Impacted Files](https://codecov.io/gh/IBM/differential-privacy-library/pull/72) | Coverage Δ | |
|---|---|---|
| diffprivlib/models/logistic_regression.py | `95.76% <76.47%> (-2.87%)` | :arrow_down: |
| diffprivlib/\_\_init\_\_.py | `100.00% <100.00%> (ø)` | |
| diffprivlib/mechanisms/base.py | `100.00% <100.00%> (ø)` | |
| diffprivlib/mechanisms/binary.py | `100.00% <100.00%> (ø)` | |
| diffprivlib/mechanisms/bingham.py | `100.00% <100.00%> (ø)` | |
| diffprivlib/mechanisms/exponential.py | `98.16% <100.00%> (ø)` | |
| diffprivlib/mechanisms/gaussian.py | `98.91% <100.00%> (+0.01%)` | :arrow_up: |
| diffprivlib/mechanisms/geometric.py | `100.00% <100.00%> (ø)` | |
| diffprivlib/mechanisms/laplace.py | `100.00% <100.00%> (ø)` | |
| diffprivlib/mechanisms/snapping.py | `100.00% <100.00%> (ø)` | |
| ... and 15 more | | |


lgtm-com[bot] commented 2 years ago

This pull request introduces 1 alert when merging a0ddbc2c9a6609726bfbe779e412d617f593d2cf into b3d6a722e1eae9f9576690e42743afb2bc247db9 - view on LGTM.com

new alerts: