Code for Findings-ACL 2023 paper: Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence
We provide the PC data under the data/ folder.
The ABCD dataset used in our experiments can be found at https://drive.google.com/file/d/1oIo8P0Y8X9DTeEfOA1WUKq8Uix9a_Pte/view?usp=sharing.
For other datasets, we use the datasets package to download and store them, so you can run our code directly.
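For readers unfamiliar with this kind of automatic download, here is a minimal sketch, assuming the package in question is the Hugging Face datasets library. The helper name load_split and its parameters are hypothetical and are not taken from this repository. The dataset identifiers are left for the caller to supply.

```python
from typing import Optional


def load_split(path: str,
               name: Optional[str] = None,
               split: str = "train",
               cache_dir: Optional[str] = None):
    """Hypothetical helper: download (or reuse from cache) one dataset split.

    `path` and `name` are whatever identifiers the dataset uses on the hub;
    they are placeholders here, not the datasets used in the paper.
    """
    # Imported lazily so this module can be imported without the package.
    from datasets import load_dataset

    # load_dataset downloads on first use and reads from the cache afterwards.
    return load_dataset(path, name, split=split, cache_dir=cache_dir)
```

On the first call the data is fetched and stored in the cache directory. Later calls with the same arguments reuse the stored copy, which is why no manual download step is needed.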
You need to set the arguments properly before running the code:
python projection.py
See projection.py for more information. By running:
python projection.py
you will train your own baseline model and evaluate it. If you only want to train or evaluate a certain model, check the last four lines of projection.py and disable the corresponding code.
You need to set the arguments properly before running the code:
python attacker.py
See attacker.py for more information. You should first train the attacker on the training data, then test it on the test data to obtain test logs. You can then evaluate attack performance on the test logs by changing model_dir to your trained attacker and data_type to test.
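The train-then-test order described above can be sketched as two argument sets. This is only an illustration, assuming model_dir and data_type are exposed as command-line flags through argparse. The defaults, the choices list, and the helper names build_parser and plan_runs are hypothetical; check attacker.py for how the arguments are actually defined.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the two arguments named in the text."""
    parser = argparse.ArgumentParser(description="Attacker arguments (sketch)")
    # Assumption: model_dir is either a base model name or a local checkpoint path.
    parser.add_argument("--model_dir", type=str, required=True)
    # Assumption: data_type selects which split the script runs on.
    parser.add_argument("--data_type", type=str, default="train",
                        choices=["train", "test"])
    return parser


def plan_runs(base_model: str, trained_attacker_dir: str):
    """Return the argument sets for the two runs: train first, then test."""
    parser = build_parser()
    train_args = parser.parse_args(
        ["--model_dir", base_model, "--data_type", "train"])
    # The second run points model_dir at the attacker saved by the first run.
    test_args = parser.parse_args(
        ["--model_dir", trained_attacker_dir, "--data_type", "test"])
    return train_args, test_args
```

The second run differs from the first in only those two values, which matches the instruction to change model_dir and data_type before producing the test logs.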
If you want to train a randomly initialized GPT-2 attacker, after setting the arguments, run:
python attacker_random_gpt2.py
Because different decoders have different implementations (and their decoding procedures also differ), we use a separate .py file for each attacker model.
If you want to try out OPT as the attacker model, run:
python attacker_opt.py
If you want to try out T5 as the attacker model, run:
python attacker_t5.py
Make sure the test result paths are set inside the 'eval_xxx.py' files.
To obtain classification performance, run:
python eval_classification.py
To obtain generation performance, run:
python eval_generation.py
To calculate perplexity (PPL), first set the language model (LM) used to compute it, then run:
python eval_ppl.py
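As a reminder of what PPL measures, the sketch below computes perplexity from per-token natural-log probabilities, as the exponential of the negative mean log-probability. The function name perplexity and its input format are assumptions made for illustration. It is not necessarily how eval_ppl.py computes the metric with the chosen LM.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity of one sequence given natural-log probabilities per token.

    PPL = exp(-(1/N) * sum_i log p(token_i | previous tokens))
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

A lower value means the LM finds the recovered sentence more fluent. A model that assigns every token probability 1/k has perplexity k.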
Please cite the following paper if you find our method and resources helpful!
@inproceedings{li-etal-2023-sentence,
title = "Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence",
author = "Li, Haoran and
Xu, Mingshi and
Song, Yangqiu",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-acl.881",
doi = "10.18653/v1/2023.findings-acl.881",
pages = "14022--14040",
}
Please send any questions about the code and/or the algorithm to hlibt@connect.ust.hk