fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

Packing & Differential privacy #15

Closed: myrthewouters closed this issue 3 years ago

myrthewouters commented 3 years ago

Thanks a lot for your work!

I'm running DoppelGANger on my own use case, in which packing improves the results; I use num_packing=10. I plan to run some experiments with differential privacy in the future, and I was wondering: in theory, does packing affect the privacy guarantee?

To explain what I mean: I suppose that with num_packing=10, each sample is "seen" 10 times per epoch. Does this mean we have to use a larger noise multiplier to reach the same level of differential privacy? I do not see this reflected in your code right now, but I may be misunderstanding how packing affects the differential privacy level.

fjxmlzn commented 3 years ago

Thank you for your interest in DoppelGANger. It is great to hear it worked!

You are right that packing influences the DP parameters. When packing is used, the batch size is effectively increased, and each sample appears more times on average in one epoch. compute_dp_sgd_privacy in tensorflow_privacy does not natively account for this, but we can adjust the arguments we pass to compute_dp_sgd_privacy so that it computes the correct DP parameters for us.
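To make the "more appearances per epoch" point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that one epoch is n // batch_size discriminator steps (the same as without packing) and that each step draws batch_size * num_packing record indices uniformly at random with replacement. The function name appearances_per_epoch is hypothetical, and this is not DoppelGANger's actual sampler.

```python
import random
from collections import Counter


def appearances_per_epoch(n, batch_size, num_packing, seed=0):
    """Count how often each record index is drawn during one hypothetical epoch.

    Illustrative assumptions (not DoppelGANger's exact sampling code):
      * one epoch = n // batch_size discriminator steps, as without packing;
      * each step draws batch_size * num_packing indices uniformly at random
        with replacement, and num_packing of them form one packed input.
    """
    rng = random.Random(seed)
    steps_per_epoch = n // batch_size
    counts = Counter()
    for _ in range(steps_per_epoch):
        for _ in range(batch_size * num_packing):
            counts[rng.randrange(n)] += 1
    # Include never-drawn records (count 0) so the mean is over all n records.
    return [counts[i] for i in range(n)]


if __name__ == "__main__":
    for packing in (1, 10):
        c = appearances_per_epoch(n=1000, batch_size=50, num_packing=packing)
        print(f"num_packing={packing}: mean appearances = {sum(c) / len(c)}")
    # num_packing=1: mean appearances = 1.0
    # num_packing=10: mean appearances = 10.0
```

Under these assumptions the mean is exactly num_packing, since the total number of draws per epoch is (n // batch_size) * batch_size * num_packing.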

We just need to change https://github.com/fjxmlzn/DoppelGANger/blob/ab2d20e48a30bcf44aab98e9ec419cd68f1195e6/gan/doppelganger.py#L927-L932 to

 compute_dp_sgd_privacy(
     self.data_feature.shape[0],            # n: number of training samples
     self.batch_size * self.num_packing,    # effective batch size with packing
     noise_multiplier,
     self.epoch * self.num_packing,         # effective number of epochs
     self.dp_delta)
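As a rough check of what the patched call changes, the sketch below assumes that compute_dp_sgd_privacy (in tensorflow_privacy versions from around this time) turns its arguments into a sampling ratio q = batch_size / n and a step count ceil(epochs * n / batch_size) before running the RDP accountant. The helper accountant_inputs is hypothetical and only reproduces that arithmetic; it does not call tensorflow_privacy.

```python
import math


def accountant_inputs(n, batch_size, epochs, num_packing=1):
    """Return (q, steps) implied by the patched compute_dp_sgd_privacy arguments.

    Assumes the accountant uses q = batch_size / n and
    steps = ceil(epochs * n / batch_size); the patch passes
    batch_size * num_packing and epochs * num_packing.
    """
    eff_batch = batch_size * num_packing
    eff_epochs = epochs * num_packing
    q = eff_batch / n                                      # per-step sampling ratio
    steps = int(math.ceil(eff_epochs * n / eff_batch))     # number of noisy updates
    return q, steps


if __name__ == "__main__":
    print(accountant_inputs(n=10000, batch_size=100, epochs=400))                  # (0.01, 40000)
    print(accountant_inputs(n=10000, batch_size=100, epochs=400, num_packing=10))  # (0.1, 40000)
```

Under this assumption the number of noisy updates stays the same, but each update samples ten times as large a fraction of the data, so for a fixed noise multiplier the accountant reports a larger epsilon.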

I will commit this change to the codebase later.

Thank you for pointing it out!

myrthewouters commented 3 years ago

That makes sense. Thank you for the quick response and clarification!

myrthewouters commented 3 years ago

One minor additional question: which version of tensorflow-privacy did you use? Their GitHub page says the current version requires TensorFlow >= 1.14, whereas DoppelGANger uses TensorFlow 1.4.0.

Thanks again!

fjxmlzn commented 3 years ago

The DoppelGANger code should work with TensorFlow 1.14 or 1.15. I just updated the README accordingly. Let me know if you run into any problems.