Princeton-SysML / FILM

Official repo for the paper: Recovering Private Text in Federated Learning of Language Models (in NeurIPS 2022)
https://arxiv.org/abs/2205.08514
Creative Commons Zero v1.0 Universal

How to run the attack? #5

Open shanefeng123 opened 4 months ago

shanefeng123 commented 4 months ago

Hi,

I followed the instructions to run the attack on your provided example_data.json file. I have a few questions about how to run the attack properly and would appreciate your response.

  1. The example_data.json file contains a single batch with batch size 16. When I run the following command in the beam search folder:

cargo run --release -- --datapath ../example_data.json --outputpath ../results.json

The attack runs, but the results contain only 8 sentences. Am I supposed to change some of the parameters?

  2. With the above command, your code seems to recover the data from the first round of training. In your paper, you mention that the attack becomes stronger with more training rounds. Is there a parameter that controls the training round at which the attack is launched?

  3. Is it possible to use your code with a model other than base GPT-2, for example GPT-2 Large or Llama 2?

  4. How can I enable GPU acceleration for the attack? I followed the instructions to set the environment variable to the CUDA version and passed "--cuda-device 0" in the command, but I get an error.

  5. How should I format the data JSON file if I want to pass in two different batches?

  6. Also, how do I apply the gradient pruning and DP-SGD defenses mentioned in the paper using your code?
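For context on the defenses asked about in the last question, here is a minimal sketch of what each one does to a client update, written in plain Python on flat lists of floats. This is a generic textbook version, not the FILM repository's implementation: the function names and the parameters `prune_ratio`, `clip_norm`, and `noise_multiplier` are hypothetical illustrations, not flags of the repo, and the DP-SGD part omits privacy accounting entirely.

```python
import math
import random
from typing import List


def prune_gradient(grad: List[float], prune_ratio: float) -> List[float]:
    """Gradient pruning (generic sketch): zero out the prune_ratio fraction
    of gradient entries with the smallest absolute value before sharing."""
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be in [0, 1]")
    k = int(len(grad) * prune_ratio)  # number of entries to zero out
    pruned = list(grad)
    # Indices ordered by ascending magnitude; the first k are dropped.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]))
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def dp_sgd_aggregate(
    per_example_grads: List[List[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """DP-SGD style aggregation (generic sketch, no privacy accounting):
    clip each per-example gradient to L2 norm <= clip_norm, sum them,
    add Gaussian noise with std noise_multiplier * clip_norm, then average."""
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down only if the gradient is longer than clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j, x in enumerate(g):
            total[j] += x * scale
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


if __name__ == "__main__":
    print(prune_gradient([0.5, -0.01, 0.2, 0.03], 0.5))  # [0.5, 0.0, 0.2, 0.0]
    rng = random.Random(0)
    print(dp_sgd_aggregate([[3.0, 4.0], [0.3, 0.4]], 1.0, 1.0, rng))
```

With `noise_multiplier=0.0` and `clip_norm=1.0`, the two gradients `[3.0, 4.0]` and `[0.3, 0.4]` are clipped to `[0.6, 0.8]` and `[0.3, 0.4]` and average to approximately `[0.45, 0.6]`. Real implementations operate on per-parameter tensors; how FILM wires either defense into its pipeline would need to come from the authors.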

Thank you in advance, Shane

elazzouzi1080 commented 1 month ago

Hi,

I saw that you asked some detailed questions about running the attack on the provided example_data.json file, including parameters for training rounds, using different models, and enabling GPU acceleration.

I was wondering whether you managed to get any results or resolve those issues. I'd be really interested in learning from your experience.

Thanks in advance!