alan-turing-institute / tapas

MIT License
31 stars 14 forks

Code example for privacy attacks with completely unknown generator #127

Open tiadams opened 1 year ago

tiadams commented 1 year ago

Is there any way to evaluate the privacy of synthetic data using your toolbox without any knowledge of the underlying generator?

My use case involves evaluating privacy risk given only the real and synthetic datasets, without any information about the underlying generator. I have checked the uncertain box model code example, but even there some sort of generator is required.

It might be useful to extend the functionality to allow creating a threat model that takes only the real and synthetic data as input, if this is possible at all within your framework.

fhoussiau commented 1 year ago

Unfortunately, this is not something that we've implemented at the moment. The main issue is that although the attacker doesn't know the generator, the auditor still needs access to it in order to generate a large number of samples to estimate the attack success rate. For targeted attacks (with one or a small number of target records), this is more or less unavoidable.
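
To make this concrete, the estimation step looks roughly like this (a minimal sketch, not the tapas API; `estimate_attack_success`, `generator` and `attack` are hypothetical stand-ins for whatever you use):

```python
import numpy as np

def estimate_attack_success(worlds, labels, generator, attack, n_synth=500):
    """Monte-Carlo estimate of a targeted attack's success rate.

    worlds    : list of training datasets (half contain the target)
    labels    : boolean array, True where the target was included
    generator : callable mapping a training dataset to a synthetic one
    attack    : callable mapping a synthetic dataset to a membership guess

    Every world has to be pushed through the generator, which is why
    some form of access to it is unavoidable for a targeted audit.
    """
    guesses = [attack(generator(w, n_synth)) for w in worlds]
    return np.mean(np.array(guesses) == np.asarray(labels))
```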

If the issue is with connecting the generator to tapas (i.e. the CLI doesn't work, or the generator is on an access-controlled device), you might still be able to reuse parts of the code to generate testing datasets (sampling from an auxiliary dataset and randomly adding a target), as sketched below.
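
For reference, that dataset-generation step looks roughly like this in plain pandas/numpy (a sketch under assumptions: the function name, the 50% inclusion probability, and replacing a row to keep the dataset size fixed are illustrative choices, not tapas code). You would then feed each resulting dataset to your generator by hand:

```python
import numpy as np
import pandas as pd

def make_test_worlds(aux_df: pd.DataFrame, target: pd.Series,
                     n_worlds: int = 1000, n_rows: int = 500,
                     seed: int = 0):
    """Build training datasets for auditing a targeted MIA.

    Half the datasets contain the target record ("in" worlds), half
    do not ("out" worlds). Each dataset is then passed through the
    (possibly access-controlled) generator outside of tapas.
    """
    rng = np.random.default_rng(seed)
    worlds, labels = [], []
    for _ in range(n_worlds):
        # Sample a training dataset from the auxiliary data.
        sample = aux_df.sample(n=n_rows, random_state=rng)
        member = rng.random() < 0.5
        if member:
            # Replace one row with the target so the size stays fixed.
            sample = pd.concat([sample.iloc[:-1], target.to_frame().T],
                               ignore_index=True)
        worlds.append(sample)
        labels.append(member)
    return worlds, np.array(labels)
```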

An alternative is to develop untargeted attacks: evaluate the success rate (e.g. accuracy) of an MIA by performing the attack against a large number of different users for a single synthetic dataset, and aggregating the results. Note that the interpretation will be quite different. However, I am not aware of academic work on the topic.
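
A very rough sketch of what that could look like (again not part of tapas; the nearest-neighbour distance is a naive placeholder for whatever membership score your attack produces, and the threshold is an assumption):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def untargeted_mia_accuracy(candidates, membership, synth, threshold):
    """Attack many candidate records against ONE synthetic dataset.

    candidates : (n, d) array of records whose true membership is known
    membership : (n,) boolean array (True = was in the training data)
    synth      : (m, d) array, the released synthetic dataset
    threshold  : distance below which we guess "member"
    """
    # Score each candidate by its distance to the closest synthetic
    # record; proximity is (naively) taken as evidence of membership.
    nn = NearestNeighbors(n_neighbors=1).fit(synth)
    dist, _ = nn.kneighbors(candidates)
    guesses = dist[:, 0] < threshold
    # Aggregate over users: one accuracy number for this synthetic dataset.
    return np.mean(guesses == membership)
```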

(As an aside, there is an issue from last year with a similar idea: https://github.com/alan-turing-institute/privacy-sdg-toolbox/issues/113)