alan-turing-institute / QUIPP-collab

Collaboration on the QUIPP project

Build a privacy metric #60

Closed: crangelsmith closed this issue 3 years ago

gmingas commented 4 years ago

I have been reading a bit about differential privacy (DP) as a possible privacy metric and here are my thoughts:
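(For context, a minimal hypothetical sketch of how epsilon acts as the privacy knob in DP, via the Laplace mechanism; this is illustrative only, not QUIPP code.)

```python
# Minimal, hypothetical sketch (not QUIPP code) of epsilon as a privacy
# knob: the Laplace mechanism satisfies epsilon-differential privacy.
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a noisy query answer satisfying epsilon-DP.

    sensitivity: the max change in the query output if one record is
    added or removed. Smaller epsilon -> larger noise -> more privacy.
    """
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(scale=sensitivity / epsilon)

# e.g. a counting query (sensitivity 1) released at epsilon = 0.5
print(laplace_mechanism(true_value=1000, sensitivity=1, epsilon=0.5))
```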

gmingas commented 4 years ago

Also, I found this project, which was unknown to me. Has anybody heard of it? https://www.turing.ac.uk/research/research-projects/evaluating-privacy-preserving-generative-models-wild

martintoreilly commented 4 years ago

> Also, I found this project, which was unknown to me. Has anybody heard of it? https://www.turing.ac.uk/research/research-projects/evaluating-privacy-preserving-generative-models-wild

Yes, we mentioned it in our proposal. This is Adria's project. We should catch up with him soon about this (and check if it's the same as the work his postdoc is doing).

gmingas commented 4 years ago

Another possible metric: disclosure risk measures. See Section 3 of this paper for a description and an example: http://www2.stat.duke.edu/~jerry/Papers/PSD08.pdf
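To make the idea concrete, here is a crude hypothetical sketch of a matching-based risk measure of the kind that paper discusses; the function and column names are stand-ins, not from QUIPP or the paper.

```python
# Crude, hypothetical sketch of a matching-based disclosure risk measure,
# loosely in the spirit of the measures in the linked paper.
import pandas as pd

def expected_match_risk(original: pd.DataFrame, released: pd.DataFrame,
                        quasi_identifiers: list) -> float:
    """Average over original records of 1/c, where c is the number of
    released records matching on the quasi-identifiers; fewer matches
    mean an intruder's guess is more likely to be right."""
    total = 0.0
    for _, row in original.iterrows():
        matches = (released[quasi_identifiers]
                   == row[quasi_identifiers]).all(axis=1).sum()
        if matches > 0:
            total += 1.0 / matches
    return total / len(original)
```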

ots22 commented 4 years ago

The two viable options at this point seem to be:

Plausible deniability

http://www.vldb.org/pvldb/vol10/p481-bindschaedler.pdf
https://vbinds.ch/node/69

(A sketch of the deniability test follows this comparison.)

Advantages:

Disadvantages:

Minimax

https://papers.nips.cc/paper/8512-minimax-optimal-estimation-of-approximate-differential-privacy-on-neighboring-databases.pdf
https://github.com/xiyangl3/adp-estimator

Advantages:

Disadvantages:
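For reference, a hypothetical sketch of the (k, gamma) test at the heart of plausible deniability, assuming the generative model exposes its seed-conditional probabilities (`prob` is a stand-in name, not an API from the paper's code):

```python
# Hypothetical sketch of the (k, gamma)-plausible deniability test from the
# Bindschaedler et al. paper. prob(y, d) is an assumed interface giving the
# generative model's probability of producing synthetic record y from seed d.
def is_plausibly_deniable(y, seed, dataset, prob, k=10, gamma=2.0):
    """y (generated from `seed`) passes if at least k - 1 other records
    could have generated it with probability within a factor gamma of
    the seed's own generative probability."""
    p_seed = prob(y, seed)
    alternatives = sum(
        1 for d in dataset
        if d is not seed and p_seed / gamma <= prob(y, d) <= p_seed * gamma
    )
    return alternatives >= k - 1
```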

gmingas commented 4 years ago

Photo from yesterday's discussion with Kasra and Oliver, summarising and grouping the possible privacy metrics we are considering at the moment.

[Image attachment "Image from iOS": quick descriptions of the candidate metrics]

ots22 commented 4 years ago

The data-driven measures are probably more exciting, but riskier. The method-specific ones are less risky but a bit more constrained. All would be useful/interesting in their own way.

"A" and "B1"/"B2" are what we propose to start with (working in pairs: one pair takes A and the other the Bs).

gmingas commented 4 years ago

Looking again at this discussion while writing the report. I was thinking that a possible third approach (in addition to the data-driven and method-specific ones) would be to exclusively use inherently differentially private synthesis methods: GANs, VAEs, and multiple imputation with embedded DP, assuming we can find those, and possibly the version of plausible deniability that is equivalent to DP.

If we do this, we would not need a data-driven method; all of the methods would have the same privacy metric (epsilon), which would allow us to compare them directly. Unless I am missing something and epsilon is not comparable across methods.

ots22 commented 3 years ago

See #123 (Privacy vs utility for differentially private methods).

In the pipeline, a privacy metric applied to these methods could just report the computed or provided privacy parameters where applicable.
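For example, a pass-through metric could look something like this sketch (the attribute names are placeholders, not the actual QUIPP pipeline API):

```python
# Hypothetical pass-through privacy metric; attribute names are placeholders.
def privacy_metric(method):
    """For inherently DP synthesis methods, just report the privacy
    parameters that were computed or supplied, instead of estimating
    them empirically from the data."""
    return {
        "epsilon": getattr(method, "epsilon", None),
        "delta": getattr(method, "delta", None),  # None for pure eps-DP
    }
```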

ots22 commented 3 years ago

Closing this - some of the discussion above to be captured in the report (#112).