THU-BPM / Robust_Watermark

Code and data for the paper "A Semantic Invariant Robust Watermark for Large Language Models", accepted at ICLR 2024.
https://arxiv.org/abs/2310.06356

About the experiments #7

Closed: LeoStrange26 closed this issue 5 months ago

LeoStrange26 commented 6 months ago

The experiments in the paper involve dipper-1 and dipper-2, corresponding to DIPPER-1 (lex=60, order=0) and DIPPER-2 (lex=60, order=20), respectively. DIPPER-2 clearly mounts a stronger attack than DIPPER-1, so it is puzzling that the paper reports even higher metrics when defending against DIPPER-2. Does this indicate some randomness in the experimental data, or is there another explanation? I would appreciate it if the authors could clarify my confusion. Thx!
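For reference, this is how I understand the two attack settings, sketched with a hypothetical `paraphrase(sentence, lex=..., order=...)` helper standing in for the actual DIPPER paraphraser:

```python
import nltk  # assumes the `punkt` tokenizer is downloaded: nltk.download("punkt")

# The two settings from the paper.
DIPPER_SETTINGS = {
    "dipper-1": {"lex": 60, "order": 0},   # lexical diversity only
    "dipper-2": {"lex": 60, "order": 20},  # lexical + order diversity
}

def attack(text, setting, paraphrase):
    """Rewrite each sentence individually, as DIPPER's authors recommend."""
    params = DIPPER_SETTINGS[setting]
    sentences = nltk.sent_tokenize(text)
    rewritten = [paraphrase(s, lex=params["lex"], order=params["order"])
                 for s in sentences]
    return " ".join(rewritten)
```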

exlaw commented 6 months ago

This is a very good question. In our code, we generate an embedding from the preceding text, and the watermark model maps that embedding to watermark logits. The usage recommended by DIPPER rewrites each sentence individually, and at that granularity a bit of order diversity mostly reorders content within a sentence without changing its meaning. Since the semantic embedding of the preceding text is then largely unchanged, the watermark logits are too, so the extra order diversity might not significantly affect our method.
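To make that concrete, here is a minimal sketch of the pipeline, assuming a sentence-embedding model exposing an `encode` method (e.g., sentence-transformers); the small MLP, the names, and the `delta` scaling are illustrative assumptions, not the exact code in this repo:

```python
import torch
import torch.nn as nn

class WatermarkModel(nn.Module):
    """Maps a semantic embedding of the preceding text to per-token watermark logits."""
    def __init__(self, embed_dim, vocab_size):
        super().__init__()
        # Illustrative architecture; the trained watermark network in the repo may differ.
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, vocab_size),
        )

    def forward(self, embedding):
        return self.net(embedding)

def watermarked_logits(lm_logits, preceding_text, embedder, watermark_model, delta=1.0):
    # Embed the text generated so far. A meaning-preserving paraphrase of that
    # text yields a nearby embedding, and therefore nearly the same watermark
    # logits -- which is why within-sentence reordering changes little.
    emb = torch.as_tensor(embedder.encode(preceding_text), dtype=torch.float32)
    return lm_logits + delta * watermark_model(emb)
```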

At the same time, we acknowledge that there may indeed be some randomness in the experimental results.