TeDiou opened this issue 10 months ago
Hi @TeDiou
If you set private = True, training runs with differential privacy (DP). The code that calculates the privacy budget starts here: https://github.com/Team-TUD/CTAB-GAN-Plus-DP/blob/6507b8a1638702ecda24e1a4dd8fddd1c40e8125/model/synthesizer/ctabgan_synthesizer.py#L581
And from this line of code:
rdp = compute_rdp(self.micro_batch_size / train_data.shape[0], self.sigma, steps, lmbds)
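You can see that four quantities determine the RDP, and therefore the privacy budget: the micro-batch size and the dataset size (their ratio is the sampling rate), the noise multiplier sigma, and the number of training steps. lmbds is the list of RDP orders at which the RDP is evaluated.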
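To make that dependence concrete, here is a minimal sketch of the RDP computation for the sampled Gaussian mechanism at integer orders, assuming the repository's compute_rdp follows the TensorFlow Privacy RDP accountant. It handles only integer orders of at least 2 (the real accountant also supports fractional orders), and rdp_one_step / rdp_after_training are hypothetical names chosen for illustration. The sampling rate q plays the role of self.micro_batch_size / train_data.shape[0].

```python
import math
from typing import List, Sequence


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def rdp_one_step(q: float, sigma: float, order: int) -> float:
    """RDP at integer `order` >= 2 of one step of the sampled Gaussian
    mechanism with sampling rate q and noise multiplier sigma."""
    if q == 0:
        return 0.0
    if q == 1:
        # No subsampling: plain Gaussian mechanism.
        return order / (2 * sigma ** 2)
    # log A_order = log sum_k C(order, k) (1-q)^(order-k) q^k exp((k^2 - k) / (2 sigma^2)),
    # evaluated in log space to avoid overflow for large orders.
    log_terms = [
        math.log(math.comb(order, k))
        + k * math.log(q)
        + (order - k) * math.log(1 - q)
        + (k * k - k) / (2 * sigma ** 2)
        for k in range(order + 1)
    ]
    return _logsumexp(log_terms) / (order - 1)


def rdp_after_training(q: float, sigma: float, steps: int,
                       orders: Sequence[int]) -> List[float]:
    """RDP composes additively, so the total is steps * per-step RDP for each order."""
    return [steps * rdp_one_step(q, sigma, order) for order in orders]
```

For example, with a hypothetical micro-batch size of 64 and 10,000 training rows, q = 64 / 10_000; a larger sigma or a smaller q lowers the RDP, while more steps raise it linearly.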
Then, in the following line:
epsilon, _, _ = get_privacy_spent(lmbds, rdp, target_delta=1e-5)
Here epsilon is the privacy budget spent after the given number of training steps. To control it, you can add an if statement at the beginning of the training loop so that training continues only while epsilon is below a chosen threshold.
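A minimal sketch of that if check, assuming the per-step RDP values for each order have already been computed (e.g. with compute_rdp(q, sigma, 1, lmbds), since RDP adds up over steps) and using the classic RDP-to-(epsilon, delta) conversion epsilon = min over orders of RDP(order) + log(1/delta) / (order - 1). The repository's get_privacy_spent may use this formula or a tighter variant, so the numbers can differ slightly. train_within_budget, train_step and target_epsilon are hypothetical names, not part of the repository.

```python
import math
from typing import Callable, Sequence


def epsilon_from_rdp(orders: Sequence[int], rdp: Sequence[float], delta: float) -> float:
    """Classic conversion from RDP to (epsilon, delta)-DP: take the best order."""
    return min(r - math.log(delta) / (order - 1) for order, r in zip(orders, rdp))


def train_within_budget(train_step: Callable[[], None],
                        orders: Sequence[int],
                        rdp_per_step: Sequence[float],
                        target_epsilon: float,
                        delta: float = 1e-5,
                        max_steps: int = 10_000) -> int:
    """Run train_step until one more step would exceed target_epsilon.
    Returns the number of steps actually taken."""
    steps = 0
    while steps < max_steps:
        # Total RDP after one more step (RDP composes additively over steps).
        next_rdp = [(steps + 1) * r for r in rdp_per_step]
        if epsilon_from_rdp(orders, next_rdp, delta) > target_epsilon:
            break  # the next step would overspend the privacy budget
        train_step()
        steps += 1
    return steps
```

Checking the epsilon that the next step would produce, rather than the epsilon already spent, keeps the final epsilon at or below the target instead of overshooting by one step.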
Hope that solves your question.
Thanks for your answer!
Sorry to bother you again. Why is the dp-synthesizer.sample method different from ctabganplus.sample? The two models differ only in the privacy module, yet in ctabganplusdp the generation step requires multiple loops.
Hi @TeDiou, yes, we need a loop to generate enough synthetic data. We implemented a filter that removes invalid generated rows, so more samples must be drawn than the number requested. See this issue comment: https://github.com/Team-TUD/CTAB-GAN-Plus/issues/7#issuecomment-1576690333
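The sampling loop described above can be sketched as follows: draw batches from the generator, keep only rows that pass a validity filter, and stop once enough rows have been collected. generate_batch, is_valid and sample_with_filter are hypothetical placeholders for the repository's generator call and its filter; the actual sample method may differ in details such as batch size or how validity is defined.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sample_with_filter(generate_batch: Callable[[], Sequence[T]],
                       is_valid: Callable[[T], bool],
                       n: int,
                       max_rounds: int = 1000) -> List[T]:
    """Generate batches until n valid rows are collected, then return exactly n."""
    kept: List[T] = []
    for _ in range(max_rounds):
        # Discard invalid generations; this is why more rows than n are drawn.
        kept.extend(row for row in generate_batch() if is_valid(row))
        if len(kept) >= n:
            return kept[:n]
    raise RuntimeError(
        f"only {len(kept)} valid rows after {max_rounds} rounds; requested {n}")
```

The max_rounds cap is a design choice that turns a filter rejecting almost everything into an explicit error instead of an endless loop.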
I got that. Thanks a lot!
When we set private = True, your source code only calculates the privacy budget. How can we control the privacy budget? By adding an if statement?