Princeton-SysML / FILM

Official repo for the paper: Recovering Private Text in Federated Learning of Language Models (NeurIPS 2022)
https://arxiv.org/abs/2205.08514

Fine tune language models? #3

Closed shanefeng123 closed 1 year ago

shanefeng123 commented 1 year ago

Hi,

Thanks for providing the implementation of your excellent paper.

In your repo description, you say:

"We currently do not provide any models fine-tuned on the datasets (will be added to the repository at a later date). For now, you may use pre-trained models from HuggingFace and fine-tune on the provided datasets."

Do we need a language model fine-tuned on the datasets to be able to perform the attack? Which part uses it? I thought you only need the language model that was trained on the batch to perform the beam search?

Thanks, Shane

Hazelsuko07 commented 1 year ago

Hi Shane,

Thanks for raising this issue.

Our attack only operates on the gradients and the model weights, and is agnostic to the fine-tuning stage of the model. This means that you can even employ our attack on a pre-trained model directly, executing it during its first fine-tuning step. (i.e., it is not a requirement to have a language model fine-tuned on the datasets in order to carry out the attack.)
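(For concreteness, here is a minimal sketch of that setting, not taken from the FILM codebase: a pre-trained GPT-2 with no fine-tuning at all, and the gradient a client would share on its very first local step. The model size and batch text below are placeholders.)

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# A pre-trained model; no fine-tuning has happened yet.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The client's private batch (placeholder text).
batch = tokenizer("the patient was prescribed 20 mg of the drug daily", return_tensors="pt")

# One local step: compute the gradient the client would send to the server.
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()

# The attack's only inputs: the current model weights and the shared gradient.
weights = {name: p.detach().clone() for name, p in model.named_parameters()}
grads = {name: p.grad.clone() for name, p in model.named_parameters()}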

However, as we noted in our paper (Figure 4 on page 8), our attack becomes stronger towards the later stage of fine-tuning.

Let me know if you have further questions! Otherwise, please feel free to close the issue if you consider it resolved.

Best, Yangsibo

shanefeng123 commented 1 year ago

Hi Yangsibo,

Thanks for your reply. I understand the attack works better when the model is more fine-tuned, since its probability distribution is more accurate.

Can I also ask about your approach to recovering the bag of words as the first step of the attack? In your paper, you mention that you follow the method of Melis et al., which extracts the tokens with non-zero gradients in the token embedding layer. However, GPT-2 ties the token embedding layer to the last linear layer, so all tokens end up with some gradient. How do you extract the bag of words here? Do you use a cutoff value for the gradient norm?

Best, Shane

listentomi commented 1 year ago

Did you solve this problem? I am also confused about how to recover the bag of words for the GPT-2 model.

SamKG commented 1 year ago

Hello,

The gradient norms of the word-embedding rows for tokens in the BoW are much larger than those of the other rows.

See the short demo below:

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
model = GPT2LMHeadModel.from_pretrained('gpt2-medium')

# Run a forward pass with labels so a language-modeling loss is computed,
# then backpropagate to populate the gradients.
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input, labels=encoded_input["input_ids"])
output.loss.backward()

# Print the gradient norms of the embedding rows for the tokens that appear
# in the input; they are much larger than those of unused tokens.
torch.set_printoptions(profile="full")
print(model.get_input_embeddings().weight.grad.norm(dim=1)[encoded_input["input_ids"]])
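A possible follow-up, to go from those norms to an actual bag of words: since weight tying gives every vocabulary row some gradient, one still needs a selection rule. The cutoff below (norms far above the median) is only an illustrative choice, not a threshold from the paper; a top-k rule would work similarly.

# Continuing from the snippet above: score every vocabulary token by the
# gradient norm of its embedding row, then keep the clear outliers.
grad_norms = model.get_input_embeddings().weight.grad.norm(dim=1)  # shape: (vocab_size,)

# Illustrative cutoff (not from the paper): well above the background level.
threshold = 10 * grad_norms.median()
candidate_ids = (grad_norms > threshold).nonzero(as_tuple=True)[0]

# Map the selected ids back to (sub)word tokens.
print(tokenizer.convert_ids_to_tokens(candidate_ids.tolist()))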