alan-turing-institute / QUIPP-collab

Collaboration on the QUIPP project

EPFL/UCL paper #130

Status: open. Opened by gmingas 3 years ago.

gmingas commented 3 years ago

An interesting paper from EPFL/UCL has just been published. It describes a privacy metric applicable to dataset synthesis and compares several synthesis methods (including CTGAN) across various datasets. It can be found here.

Key takeaway (quoted from the paper): "Our evaluation framework enabled us to study the privacy gain provided by a wide variety of generative models for different datasets and adversarial settings. Our results challenge the claim that synthetic data provides a silver-bullet solution to the privacy problem of microdata publishing. Our experiments surface two fundamental reasons why generative models are unsuitable privacy mechanisms. First, it is not possible to predict what data characteristics will be preserved in a model’s stochastic output. Thus, the more complex the model, the harder it is to know in advance, or even bound, the level of protection it will provide for a given target record. Furthermore, as the model selectively amplifies some signals, synthetic data provides differential protection for target records. Second, the utility of generative models comes from their ability to extract patterns and replicate these in synthetic datasets. As a result, synthetic data that is useful for analysis, by definition, also contains enough information to mount inference attacks. Likely for the same reasons, differential privacy-based defenses fail to increase privacy gain. The perturbations required to achieve differential privacy make it even harder to predict which records will remain vulnerable, and might even increase the exposure of some data records. Besides, existing techniques provide protection for a select set of data features only, leaving the synthetic data open to inference attacks that leverage other preserved characteristics."

gmingas commented 3 years ago

Added to Zotero

gmingas commented 3 years ago

GitHub repo: https://github.com/spring-epfl/synthetic_data_release