
WARDEN

Code of our paper - "WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection" (In Proceedings of ACL 2024).

Paper: https://aclanthology.org/2024.acl-long.725


Abstract

Embedding as a Service (EaaS) has become a widely adopted solution, which offers feature extraction capabilities for addressing various downstream tasks in Natural Language Processing (NLP). Prior studies have shown that EaaS can be prone to model extraction attacks; nevertheless, this concern could be mitigated by adding backdoor watermarks to the text embeddings and subsequently verifying the attack models post-publication. Through the analysis of the recent watermarking strategy for EaaS, EmbMarker, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while maintaining the high utility of embeddings, indicating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol to make the removal of watermarks more challenging by incorporating multiple possible watermark directions. Our defense approach, WARDEN, notably increases the stealthiness of watermarks and empirically has been shown effective against CSE attack.
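To give a feel for the idea, here is a toy sketch of multi-directional backdoor watermarking. It is not the paper's implementation: the embedding dimension, the number of watermarks, the trigger words, and the mixing weight are all placeholder assumptions, and the real protocol selects triggers and directions far more carefully.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 768                      # assumed embedding dimensionality
NUM_WATERMARKS = 4                 # WARDEN relies on multiple secret watermark directions
# Hypothetical trigger sets; the actual protocol picks moderate-frequency tokens.
TRIGGER_SETS = [{"cloud"}, {"river"}, {"stone"}, {"amber"}]

# Secret, unit-norm watermark directions held by the EaaS provider.
watermarks = rng.normal(size=(NUM_WATERMARKS, EMB_DIM))
watermarks /= np.linalg.norm(watermarks, axis=1, keepdims=True)

def watermark_embedding(text: str, clean_emb: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Mix in every watermark direction whose trigger set matches the query text."""
    emb = clean_emb.copy()
    tokens = set(text.lower().split())
    for triggers, direction in zip(TRIGGER_SETS, watermarks):
        if tokens & triggers:                       # trigger word present -> inject this direction
            emb = (1 - alpha) * emb + alpha * direction
    return emb / np.linalg.norm(emb)                # keep the returned embedding unit-norm

# Example: a query containing two trigger words picks up two watermark directions.
clean = rng.normal(size=EMB_DIM)
clean /= np.linalg.norm(clean)
marked = watermark_embedding("a cloud drifts over the river", clean)
```

Because several directions can be mixed into any given embedding, an attacker who isolates and removes one direction (as the CSE attack does against a single-watermark scheme) still leaves the others intact.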

Getting Started

We re-use the released datasets, queried GPT embeddings, and word-count files from EmbMarker. You can download the embeddings and MIND news files via our script, which is based on gdown.

pip install gdown
bash preparation/download.sh

Alternatively, download the files manually following the guidelines below.

Preparing dataset

We directly use the SST2, Enron Spam, and AG News datasets published on Hugging Face Datasets.
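If you want to fetch the raw text yourself, the datasets library can load them directly. The identifiers below (glue/sst2, SetFit/enron_spam, ag_news) are the common Hugging Face names and are given only as a sketch; the repo's scripts may expect different copies.

```python
from datasets import load_dataset

# Common Hugging Face identifiers; the repo scripts may reference other copies.
sst2 = load_dataset("glue", "sst2")            # train / validation / test splits
enron = load_dataset("SetFit/enron_spam")      # train / test splits
ag_news = load_dataset("ag_news")              # train / test splits

print(sst2["train"][0])                        # quick sanity check of one record
```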

Requesting GPT3 Embeddings

We release the pre-requested embeddings. You can click each link to download the files one by one into the data directory.

| Dataset | Split | Download link |
|---------|-------|---------------|
| SST2 | train | link |
| SST2 | validation | link |
| SST2 | test | link |
| Enron Spam | train | link |
| Enron Spam | test | link |
| AG News | train | link |
| AG News | test | link |
| MIND | all | link |
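If you script the manual route instead of running download.sh, a gdown call of the following shape works. The file ID and output filename here are placeholders: copy the ID from the Google Drive link of the split you need, and keep whatever naming the training scripts expect.

```python
import os
import gdown

os.makedirs("data", exist_ok=True)

# Placeholder: take the file ID from the Google Drive link of the desired split.
file_id = "<DRIVE_FILE_ID>"
gdown.download(
    f"https://drive.google.com/uc?id={file_id}",
    output="data/sst2_train_embeddings",   # illustrative output name
    quiet=False,
)
```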

Counting word frequency

The pre-computed word-count file is available here. You can also preprocess the WikiText dataset to reproduce the same file.

cd preparation
python word_count.py
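word_count.py contains the actual preprocessing. Conceptually, the task is a corpus-level word-frequency count over WikiText, roughly along the lines of the sketch below; the dataset variant, tokenisation, and output path are illustrative assumptions, not the script's exact behaviour.

```python
import json
from collections import Counter

from datasets import load_dataset

# WikiText-103 raw text from Hugging Face; the script may use a different variant.
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

counter = Counter()
for record in wikitext:
    counter.update(record["text"].lower().split())   # naive whitespace tokenisation

with open("word_count.json", "w") as f:               # illustrative output path
    json.dump(dict(counter), f)
```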

Our code is based on the work of EmbMarker.

Citing

@inproceedings{shetty-etal-2024-warden,
    title = "{WARDEN}: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection",
    author = "Shetty, Anudeex  and
      Teng, Yue  and
      He, Ke  and
      Xu, Qiongkai",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.725",
    pages = "13430--13444",
}