THU-BPM / Robust_Watermark

Code and data for the paper "A Semantic Invariant Robust Watermark for Large Language Models", accepted at ICLR 2024.
https://arxiv.org/abs/2310.06356

Trained the Watermark Model #8

kirudang opened this issue 3 months ago

kirudang commented 3 months ago

Hello,

I am trying to generate watermarked texts with your method for my experiments: use a prompt of the first 200 tokens and let Llama 2 with your SIR watermark generate up to 200 new tokens. I have a few questions regarding this:

  1. Can I use the trained watermark model you uploaded (transform_model_cbert.pth), since I will also be using the C4 dataset?
  2. If I must train the watermark model myself, is input_dim in train_watermark_model.py a parameter of the transform model? `python train_watermark_model.py --input_path data/embeddings/train_embeddings.txt --output_model model/transform_model_cbert.pth --input_dim 1024`
  3. I am generating watermarked text with watermark_and_detect.py. What are chunk_size and delta here, and how should I choose them for my use case? The relevant arguments are:

```python
parser.add_argument('--generate_number', type=int, default=2)
parser.add_argument('--delta', type=float, default=1)
parser.add_argument('--chunk_size', type=int, default=10)
parser.add_argument('--max_new_tokens', type=int, default=200)
parser.add_argument('--data_path', type=str, default='AI_Lab/c4_prompt_test.pt')
parser.add_argument('--output_path', type=str, default="SIR_llama2_beam_5.json")
parser.add_argument('--transform_model', type=str, default="./trained_model/transform_model_cbert.pth")
parser.add_argument('--embedding_model', type=str, default="perceptiveshawty/compositional-bert-large-uncased")
parser.add_argument('--decode_method', type=str, default="beam")
parser.add_argument('--prompt_size', type=int, default=200)  # estimated number of tokens in the prompt
parser.add_argument('--beam_size', type=int, default=5)
```
  4. Detection threshold. In the KGW watermark, I believe they choose z = 4 and run the detection function over the text to check whether it is watermarked. This setting also helps when evaluating attacks: after a paraphrasing attack, we can compute the z-score of the paraphrased text and compare it against z = 4 to see whether the modified text is still recognized as watermarked.

In your SIR paper, it is mentioned that "Without a watermark, the expected score is 0 since the watermark logit mean is 0. When a watermark is present, the score substantially exceeds 0". So should I use 0 as the detection threshold, or do you have another suggestion? Some of my outputs' scores: "z_score_generated": 0.48147795266575283, "z_score_generated": 0.385262405798759

exlaw commented 3 months ago

Thank you for your interest in our work.

  1. Yes, you can use the watermark model I uploaded, as long as you use cbert to generate the embeddings.
  2. input_dim refers to the output dimension of cbert. For the cbert-large model, it is 1024, and for the base model, it is 768.
  3. chunk_size refers to how many previous tokens are used to produce the embedding, typically set to 7 or 10.
  4. delta is similar to delta in KGW, and we typically set it to half of the corresponding value in KGW, so in our experiments, it is set to 1.
  5. Regarding the detection threshold, we did not use a fixed threshold in our paper. Instead, we pool the detection scores of the LLM-generated text and the human text and dynamically find an optimal threshold for classification. This value is usually greater than 0, typically ranging from 0.2 to 0.4 (see the sketch after this list).
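
As an illustration, here is a minimal sketch of that threshold search in plain Python. It assumes you have already collected SIR detection scores (e.g. the "z_score_generated" values) for watermarked and human-written texts; the function name, variable names, and the accuracy criterion are only for this example and are not part of the repository code.

```python
# Minimal sketch: pick the threshold that best separates watermarked from human scores.
# `watermarked_scores` and `human_scores` are lists of SIR detection scores collected beforehand.

def find_best_threshold(watermarked_scores, human_scores):
    labels = [1] * len(watermarked_scores) + [0] * len(human_scores)
    scores = list(watermarked_scores) + list(human_scores)

    best_threshold, best_accuracy = 0.0, 0.0
    # Sweep every observed score as a candidate threshold.
    for candidate in sorted(set(scores)):
        predictions = [1 if s >= candidate else 0 for s in scores]
        accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy

threshold, accuracy = find_best_threshold(
    watermarked_scores=[0.48, 0.39, 0.44],   # placeholder values
    human_scores=[0.02, -0.05, 0.11],        # placeholder values
)
print(f"optimal threshold ~= {threshold:.2f}, classification accuracy = {accuracy:.2f}")
```

With enough samples on both sides, the threshold that maximizes accuracy typically lands in the 0.2 to 0.4 range mentioned above.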

We also welcome you to use the relevant code from MarkLLM (https://github.com/THU-BPM/MarkLLM) for testing and experiments. It integrates the SIR algorithm, which is also our team's work, making it more convenient to compare with other algorithms.
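
For reference, below is a rough sketch of what running SIR through MarkLLM can look like. The import paths, the config/SIR.json location, and the TransformersConfig fields are based on my reading of the MarkLLM README and may differ between versions, so please treat them as assumptions to verify against that repository's documentation.

```python
# Rough sketch, not verified against a specific MarkLLM release: import paths,
# the config path ('config/SIR.json'), and the TransformersConfig fields are
# assumptions taken from the MarkLLM README and may need adjusting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from watermark.auto_watermark import AutoWatermark        # may be markllm.watermark.auto_watermark in the pip package
from utils.transformers_config import TransformersConfig  # may be markllm.utils.transformers_config

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical model choice for this example

transformers_config = TransformersConfig(
    model=AutoModelForCausalLM.from_pretrained(model_name).to(device),
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    vocab_size=32000,
    device=device,
    max_new_tokens=200,
)

# Load the SIR algorithm with the default config shipped in the MarkLLM repo.
sir = AutoWatermark.load("SIR",
                         algorithm_config="config/SIR.json",
                         transformers_config=transformers_config)

prompt = "Write a short paragraph about watermarking."
watermarked_text = sir.generate_watermarked_text(prompt)
print(sir.detect_watermark(watermarked_text))
```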

kirudang commented 3 months ago

Hi there,

Thank you for your prompt response; your answers definitely resolve my concerns. As for the toolkit, I will take a look at it soon.

Many thanks for your help.