Koukyosyumei / AIJack

Security and Privacy Risk Simulator for Machine Learning (arXiv:2312.17667)
Apache License 2.0

Why does dpsgd in aijack run so fast #135

Closed (weiiewwei closed this issue 1 year ago)

weiiewwei commented 1 year ago

I tried using the DPSGD implementation in AIJack to train models on different datasets and found that training takes less time than it does without differential privacy. Does AIJack use some tool to accelerate model training, or is there a problem with my code setup? I would appreciate it very much if you could answer my question.

Koukyosyumei commented 1 year ago

@weiiewwei

Thank you for your interest! How many trials did you run? Adding noise for differential privacy does not take much time, so you should compare the averages over multiple runs. It would be very helpful if you could share your code or hyperparameters.
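A minimal sketch of such a comparison, using only the Python standard library. The helper `mean_runtime` and the two stand-in functions `train_sgd` and `train_dpsgd` are hypothetical names for illustration and are not part of AIJack; replace the stand-ins with the real training loops.

```python
import statistics
import time


def mean_runtime(train_fn, n_trials=5):
    """Run train_fn n_trials times; return (mean, stdev) of wall-clock seconds."""
    durations = []
    for _ in range(n_trials):
        start = time.perf_counter()
        train_fn()
        durations.append(time.perf_counter() - start)
    # stdev needs at least two samples
    spread = statistics.stdev(durations) if n_trials > 1 else 0.0
    return statistics.mean(durations), spread


# Hypothetical stand-ins (assumption): swap in the actual SGD and DPSGD training.
def train_sgd():
    sum(i * i for i in range(10_000))


def train_dpsgd():
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    for name, fn in [("SGD", train_sgd), ("DPSGD", train_dpsgd)]:
        mean, spread = mean_runtime(fn)
        print(f"{name}: {mean:.4f}s +/- {spread:.4f}s")
```

Reporting the spread next to the mean makes it easier to tell whether a gap between the two runs is larger than the run-to-run variation.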

weiiewwei commented 1 year ago

Thank you very much for your reply. I found that the training was so fast because I had set the parameters (lot size and iterations) incorrectly. I am now looking for more information on how to set these parameters.

Koukyosyumei commented 1 year ago

As in the original paper, DPSGD updates the neural network's parameters once per lot, not once per batch. Thus, if you want to compare the speed of normal SGD and DPSGD, it is better to make the batch size for SGD equal to the lot size of DPSGD. The authors of DPSGD also suggest using batches smaller than the lot to reduce memory consumption.
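The lot/batch distinction can be sketched in plain Python as below. This is an illustrative toy, not AIJack's implementation: it assumes a linear model with squared loss, a fixed-size lot (the paper samples each example independently), and hypothetical names `clip` and `dpsgd_lot_step`. Per-example gradients are clipped and accumulated batch by batch, but noise is added and the parameters are updated only once per lot.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dpsgd_lot_step(w, lot, batch_size, lr, max_norm, sigma, rng):
    """One parameter update from a whole lot, processed batch by batch."""
    grad_sum = [0.0] * len(w)
    for start in range(0, len(lot), batch_size):
        # A batch is only a computational chunk of the lot (limits memory use).
        batch = lot[start:start + batch_size]
        for x, y in batch:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            g = clip([2.0 * err * xi for xi in x], max_norm)
            grad_sum = [a + b for a, b in zip(grad_sum, g)]
    # Gaussian noise is added once per lot, then the sum is averaged over the lot.
    noisy = [(s + rng.gauss(0.0, sigma * max_norm)) / len(lot) for s in grad_sum]
    return [wi - lr * gi for wi, gi in zip(w, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [([1.0, float(i)], 3.0 * i + 1.0) for i in range(8)]
    lot = rng.sample(data, 4)  # lot size 4, split into batches of 2
    w = dpsgd_lot_step([0.0, 0.0], lot, batch_size=2, lr=0.01,
                       max_norm=1.0, sigma=1.0, rng=rng)
    print(w)
```

In this sketch the number of parameter updates equals the number of lots processed, regardless of `batch_size`, so a fair timing comparison gives plain SGD the same number of examples per update.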

I was also a little confused by the terms lot size and batch size when I first read that paper ...