ParkLabML / DP-MERF

MIT License

How to run dp-gan and dp-cgan on tabular data #2

Open Hiramdu opened 2 years ago

Hiramdu commented 2 years ago

Hi, I see you have a detailed tutorial on running your method on all the tabular datasets. Could you tell me how to run dp-gan and dp-cgan on the tabular data mentioned in your paper? Thank you!

MijungTheGatsbyPostdoc commented 2 years ago

These are scripts we used for CGAN and GAN for tabular data.

https://github.com/ParkLabML/DP-MERF/blob/master/dpcgan/dp_cgan_reference_tab.py
https://github.com/ParkLabML/DP-MERF/blob/master/dpgan/class_wise_wgan_tab.py

Mijung Park


Hiramdu commented 2 years ago

Hi Mijung, thanks for the quick reply. May I ask a follow-up question? Below is the final screenshot from my DP-MERF run. We only need to pay attention to max ROC and max PRC, right? Also, it does not perform well on my own data: the max ROC ends up near 0.5 in the private setting. Do you have any advice on parameter tuning?

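[image: Screen Shot 2021-11-29 at 6 46 22 PM] https://user-images.githubusercontent.com/19141771/144121858-c6716ccc-8e8f-43b0-aa2b-81ca3035b882.png

MijungTheGatsbyPostdoc commented 2 years ago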

I think you should definitely choose the length scale for the Gaussian kernel carefully. If you have already tried our median heuristic function and got 0.5 for binary classification, try treating the length scale as a hyperparameter to search over: set sigma2 to some value, generate the data, check the performance of the downstream classifier, and repeat these steps with different values. For the tabular datasets we used, the median heuristic worked well, while for the image data in our paper we searched for the "optimal" length scale in the way I just described.

Yes, final ROC and PRC are what you want to use.
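
For readers who want to try the length-scale search described above, here is a minimal sketch. It is not the repository's code: `median_heuristic_sigma2`, `search_sigma2`, and the `generate_and_score` callback are hypothetical names introduced for illustration. The median heuristic shown is one common variant (median of pairwise squared distances), and the repository's own median heuristic function may be defined differently. The callback is assumed to generate synthetic data with the given sigma2, train the downstream classifier, and return a score such as ROC where higher is better.

```python
import numpy as np


def median_heuristic_sigma2(X, max_points=500, seed=0):
    """Median of pairwise squared Euclidean distances between rows of X.

    Assumption: this is one common form of the median heuristic; it is
    not necessarily identical to the function shipped in the repository.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    if len(X) > max_points:
        # Subsample so the pairwise distance matrix stays small.
        X = X[rng.choice(len(X), size=max_points, replace=False)]
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Keep each unordered pair once and clip tiny negative rounding errors.
    upper = sq_dists[np.triu_indices(len(X), k=1)]
    return float(np.median(np.maximum(upper, 0.0)))


def search_sigma2(candidates, generate_and_score):
    """Evaluate every candidate sigma2 and return the best one.

    generate_and_score is a hypothetical user-supplied callable:
    sigma2 -> downstream score (higher is better).
    """
    scores = {float(s): float(generate_and_score(s)) for s in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best], scores


# Toy demonstration on random data with a stand-in scoring function.
X_demo = np.random.default_rng(1).normal(size=(200, 5))
base = median_heuristic_sigma2(X_demo)
# Search a log-spaced grid centred on the median-heuristic value.
grid = base * np.logspace(-2, 2, 9)
toy_score = lambda s: -abs(np.log10(s / base))  # placeholder, not a real ROC
best_sigma2, best_score, all_scores = search_sigma2(grid, toy_score)
```

In practice `toy_score` would be replaced by a function that runs the data generation with that sigma2 and reports the final ROC or PRC of the downstream classifier.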
