THU-BPM / MarkLLM

MarkLLM: An Open-Source Toolkit for LLM Watermarking. (EMNLP 2024 Demo)
https://aclanthology.org/2024.emnlp-demo.7/
Apache License 2.0

Bug in its_edit.py #26

Closed (hwq0726 closed this issue 2 weeks ago)

hwq0726 commented 1 month ago

Hi,

I am trying to generate text with the its_edit watermark, but I think there is a bug in its_edit.py at lines 277 and 278:

`_, index = self.utils.phi(encoded_text,self.config.pseudo_length,self.config.sequence_length, generator, self.utils.transform_key_func, self.config.vocab_size, lambda x,y : self.utils.transform_edit_score(x,y,1), null=False, normalize=True)`

`random_numbers = self.utils.xi[(index + np.arange(len(encoded_text))) % len(self.utils.xi)]`

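First, `transform_edit_score` is not defined in `ITSEditUtils` (it is defined in `ITSEdit` instead), so `self.utils.transform_edit_score` cannot be resolved. Second, what is `self.utils.xi`? I could not find where it is set.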

Can you look at it, thanks!
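
For readers following the thread: the indexing expression in the second quoted line reads a window from an array with wrap-around. Below is a minimal sketch of that pattern only, assuming `xi` is a one-dimensional NumPy array. The thread does not confirm what `self.utils.xi` actually holds, so the array and the helper name `cyclic_window` here are hypothetical stand-ins, not part of MarkLLM.

```python
import numpy as np


def cyclic_window(xi, index, length):
    """Return `length` entries of `xi` starting at `index`, wrapping past the end."""
    # Same form as the quoted expression: (index + np.arange(n)) % len(xi)
    positions = (index + np.arange(length)) % len(xi)
    return xi[positions]


# Hypothetical stand-in values; NOT the toolkit's real attribute.
xi = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# Start at position 3 and read 4 values, so the read wraps to the start of xi.
window = cyclic_window(xi, index=3, length=4)
print(window)
```

The modulo keeps every position inside the bounds of `xi`, so the starting offset `index` can be anywhere in the array without the read running past the end.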

panly2003 commented 3 weeks ago

Thank you for reporting this issue. There was a bug in the previous implementation of `get_data_for_visualization` in its_edit.py. We found that token-level correlation values are difficult to calculate for ITSEdit, so we have removed the `get_data_for_visualization` interface. The core functions `generate_watermarked_text` and `detect_watermark` remain fully functional.

Thank you again for carefully pointing out this issue! 😊