Open Hiramdu opened 3 years ago
These are the scripts we used for DP-CGAN and DP-GAN on tabular data:
https://github.com/ParkLabML/DP-MERF/blob/master/dpcgan/dp_cgan_reference_tab.py
https://github.com/ParkLabML/DP-MERF/blob/master/dpgan/class_wise_wgan_tab.py
Mijung Park
https://privacy-preserving-machine-learning.github.io/people.html
On Tue, Nov 30, 2021 at 6:28 AM Hiramdu @.***> wrote:
Hi, I see you have a detailed tutorial on running your method on all tabular data. But may I know how to run dp-gan and dp-cgan on tabular data mentioned in your paper? Thank you!
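Hi Mijung, thanks for the quick reply. May I ask a follow-up question? Below is the final screenshot from my DP-MERF run. We only need to pay attention to max ROC and max PRC, right? Also, it does not perform well on my own data: in the private setting the max ROC ends up near 0.5. Do you have any advice on parameter tuning?
[image: Screen Shot 2021-11-29 at 6 46 22 PM] https://user-images.githubusercontent.com/19141771/144121858-c6716ccc-8e8f-43b0-aa2b-81ca3035b882.png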
I think you should definitely choose the length scale of the Gaussian kernel carefully. If you have already tried our median heuristic function and still get 0.5 for binary classification, treat the length scale as a hyperparameter: set sigma2 to some value, generate the data, check the performance of the downstream classifier, and repeat these steps with different values. On the tabular datasets we used, the median heuristic worked well, while for the image data in our paper we searched for the "optimal" length scale in the way I just described.
Yes, final ROC and PRC are what you want to use.
Mijung Park
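For readers following the advice above, here is a minimal sketch of the length-scale search. It is not the DP-MERF code: `median_heuristic_sigma2`, `search_length_scale`, `generate_fn`, and `score_fn` are hypothetical names, and the median heuristic shown (median of pairwise squared distances) is one common variant that may differ from the function in the repository. `generate_fn(sigma2)` stands in for "generate synthetic data with this sigma2" and `score_fn(data)` for "train the downstream classifier and return its ROC or PRC".

```python
import numpy as np


def median_heuristic_sigma2(x, max_points=500, seed=0):
    """One common median heuristic: median of pairwise squared distances.

    Assumption: the repository's own heuristic may be defined differently.
    """
    rng = np.random.default_rng(seed)
    if len(x) > max_points:
        # Subsample rows so the pairwise distance matrix stays small.
        x = x[rng.choice(len(x), size=max_points, replace=False)]
    sq_norms = (x ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    # Keep each unordered pair once and clip tiny negative rounding errors.
    upper = sq_dists[np.triu_indices(len(x), k=1)]
    return float(np.median(np.clip(upper, 0.0, None)))


def search_length_scale(candidates, generate_fn, score_fn):
    """Try each sigma2, score the generated data, and return the best one.

    generate_fn and score_fn are hypothetical callables supplied by the user.
    """
    results = {}
    for sigma2 in candidates:
        synthetic = generate_fn(sigma2)        # data generation step
        results[sigma2] = score_fn(synthetic)  # downstream classifier score
    best = max(results, key=results.get)
    return best, results
```

A reasonable starting grid is a few multiples of the heuristic value, for example `[0.01 * s, 0.1 * s, s, 10 * s, 100 * s]` with `s = median_heuristic_sigma2(x_train)`, so the search brackets the heuristic on a log scale.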