π We welcome PRs! If you have implemented a LLM watermarking algorithm or are interested in contributing one, we'd love to include it in MarkLLM. Join our community and help make text watermarking more accessible to everyone!
pip install markllm
. We provide a user example at the end of this file.MarkLLM is an open-source toolkit developed to facilitate the research and application of watermarking technologies within large language models (LLMs). As the use of large language models (LLMs) expands, ensuring the authenticity and origin of machine-generated text becomes critical. MarkLLM simplifies the access, understanding, and assessment of watermarking technologies, making it accessible to both researchers and the broader community.
Implementation Framework: MarkLLM provides a unified and extensible platform for the implementation of various LLM watermarking algorithms. It currently supports nine specific algorithms from two prominent families, facilitating the integration and expansion of watermarking techniques.
Framework Design:
Currently Supported Algorithms:
Visualization Solutions: The toolkit includes custom visualization tools that enable clear and insightful views into how different watermarking algorithms operate under various scenarios. These visualizations help demystify the algorithms' mechanisms, making them more understandable for users.
Evaluation Module: With 12 evaluation tools that cover detectability, robustness, and impact on text quality, MarkLLM stands out in its comprehensive approach to assessing watermarking technologies. It also features customizable automated evaluation pipelines that cater to diverse needs and scenarios, enhancing the toolkit's practical utility.
Tools:
Pipelines:
Below is the directory structure of the MarkLLM project, which encapsulates its three core functionalities within the watermark/
, visualize/
, and evaluation/
directories. To facilitate user understanding and demonstrate the toolkit's ease of use, we provide a variety of test cases. The test code can be found in the test/
directory.
MarkLLM/
βββ config/ # Configuration files for various watermark algorithms
β βββ EWD.json
β βββ EXPEdit.json
β βββ EXP.json
β βββ KGW.json
β βββ ITSEdit.json
β βββ SIR.json
β βββ SWEET.json
β βββ Unigram.json
β βββ UPV.json
β βββ XSIR.json
βββ dataset/ # Datasets used in the project
β βββ c4/
β βββ human_eval/
β βββ wmt16_de_en/
βββ evaluation/ # Evaluation module of MarkLLM, including tools and pipelines
β βββ dataset.py # Script for handling dataset operations within evaluations
β βββ examples/ # Scripts for automated evaluations using pipelines
β β βββ assess_detectability.py
β β βββ assess_quality.py
β β βββ assess_robustness.py
β βββ pipelines/ # Pipelines for structured evaluation processes
β β βββ detection.py
β β βββ quality_analysis.py
β βββ tools/ # Evaluation tools
β βββ oracle.py
β βββ success_rate_calculator.py
βββ text_editor.py
β βββ text_quality_analyzer.py
βββ exceptions/ # Custom exception definitions for error handling
β βββ exceptions.py
βββ font/ # Fonts needed for visualization purposes
βββ MarkLLM_demo.ipynb # Jupyter Notebook
βββ test/ # Test cases and examples for user testing
β βββ test_method.py
β βββ test_pipeline.py
β βββ test_visualize.py
βββ utils/ # Helper classes and functions supporting various operations
β βββ openai_utils.py
β βββ transformers_config.py
β βββ utils.py
βββ visualize/ # Visualization Solutions module of MarkLLM
β βββ color_scheme.py
β βββ data_for_visualization.py
β βββ font_settings.py
β βββ legend_settings.py
β βββ page_layout_settings.py
β βββ visualizer.py
βββ watermark/ # Implementation framework for watermark algorithms
β βββ auto_watermark.py # AutoWatermark class
β βββ base.py # Base classes and functions for watermarking
β βββ ewd/
β βββ exp/
β βββ exp_edit/
β βββ kgw/
β βββ its_edit/
β βββ sir/
β βββ sweet/
β βββ unigram/
β βββ upv/
β βββ xsir/
βββ README.md # Main project documentation
βββ requirements.txt # Dependencies required for the project
Tips: If you wish to utilize the EXPEdit or ITSEdit algorithm, you will need to import for .pyx file, take EXPEdit as an example:
python watermark/exp_edit/cython_files/setup.py build_ext --inplace
.so
file into watermark/exp_edit/cython_files/
import torch
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
# Device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Transformers config
transformers_config = TransformersConfig(model=AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b').to(device),
tokenizer=AutoTokenizer.from_pretrained('facebook/opt-1.3b'),
vocab_size=50272,
device=device,
max_new_tokens=200,
min_length=230,
do_sample=True,
no_repeat_ngram_size=4)
# Load watermark algorithm
myWatermark = AutoWatermark.load('KGW',
algorithm_config='config/KGW.json',
transformers_config=transformers_config)
# Prompt
prompt = 'Good Morning.'
# Generate and detect
watermarked_text = myWatermark.generate_watermarked_text(prompt)
detect_result = myWatermark.detect_watermark(watermarked_text)
unwatermarked_text = myWatermark.generate_unwatermarked_text(prompt)
detect_result = myWatermark.detect_watermark(unwatermarked_text)
Assuming you already have a pair of watermarked_text
and unwatermarked_text
, and you wish to visualize the differences and specifically highlight the watermark within the watermarked text using a watermarking algorithm, you can utilize the visualization tools available in the visualize/
directory.
KGW Family
import torch
from visualize.font_settings import FontSettings
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from visualize.visualizer import DiscreteVisualizer
from visualize.legend_settings import DiscreteLegendSettings
from visualize.page_layout_settings import PageLayoutSettings
from visualize.color_scheme import ColorSchemeForDiscreteVisualization
# Load watermark algorithm
device = "cuda" if torch.cuda.is_available() else "cpu"
transformers_config = TransformersConfig(
model=AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b').to(device),
tokenizer=AutoTokenizer.from_pretrained('facebook/opt-1.3b'),
vocab_size=50272,
device=device,
max_new_tokens=200,
min_length=230,
do_sample=True,
no_repeat_ngram_size=4)
myWatermark = AutoWatermark.load('KGW',
algorithm_config='config/KGW.json',
transformers_config=transformers_config)
# Get data for visualization
watermarked_data = myWatermark.get_data_for_visualization(watermarked_text)
unwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)
# Init visualizer
visualizer = DiscreteVisualizer(color_scheme=ColorSchemeForDiscreteVisualization(),
font_settings=FontSettings(),
page_layout_settings=PageLayoutSettings(),
legend_settings=DiscreteLegendSettings())
# Visualize
watermarked_img = visualizer.visualize(data=watermarked_data,
show_text=True,
visualize_weight=True,
display_legend=True)
unwatermarked_img = visualizer.visualize(data=unwatermarked_data,
show_text=True,
visualize_weight=True,
display_legend=True)
# Save
watermarked_img.save("KGW_watermarked.png")
unwatermarked_img.save("KGW_unwatermarked.png")
Christ Family
import torch
from visualize.font_settings import FontSettings
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from visualize.visualizer import ContinuousVisualizer
from visualize.legend_settings import ContinuousLegendSettings
from visualize.page_layout_settings import PageLayoutSettings
from visualize.color_scheme import ColorSchemeForContinuousVisualization
# Load watermark algorithm
device = "cuda" if torch.cuda.is_available() else "cpu"
transformers_config = TransformersConfig(
model=AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b').to(device),
tokenizer=AutoTokenizer.from_pretrained('facebook/opt-1.3b'),
vocab_size=50272,
device=device,
max_new_tokens=200,
min_length=230,
do_sample=True,
no_repeat_ngram_size=4)
myWatermark = AutoWatermark.load('EXP',
algorithm_config='config/EXP.json',
transformers_config=transformers_config)
# Get data for visualization
watermarked_data = myWatermark.get_data_for_visualization(watermarked_text)
unwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)
# Init visualizer
visualizer = ContinuousVisualizer(color_scheme=ColorSchemeForContinuousVisualization(),
font_settings=FontSettings(),
page_layout_settings=PageLayoutSettings(),
legend_settings=ContinuousLegendSettings())
# Visualize
watermarked_img = visualizer.visualize(data=watermarked_data,
show_text=True,
visualize_weight=True,
display_legend=True)
unwatermarked_img = visualizer.visualize(data=unwatermarked_data,
show_text=True,
visualize_weight=True,
display_legend=True)
# Save
watermarked_img.save("EXP_watermarked.png")
unwatermarked_img.save("EXP_unwatermarked.png")
For more examples on how to use the visualization tools, please refer to the test/test_visualize.py
script in the project directory.
Using Watermark Detection Pipelines
import torch
from evaluation.dataset import C4Dataset
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from evaluation.tools.text_editor import TruncatePromptTextEditor, WordDeletion
from evaluation.tools.success_rate_calculator import DynamicThresholdSuccessRateCalculator
from evaluation.pipelines.detection import WatermarkedTextDetectionPipeline, UnWatermarkedTextDetectionPipeline, DetectionPipelineReturnType
# Load dataset
my_dataset = C4Dataset('dataset/c4/processed_c4.json')
# Device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# Transformers config
transformers_config = TransformersConfig(
model=AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b').to(device),
tokenizer=AutoTokenizer.from_pretrained('facebook/opt-1.3b'),
vocab_size=50272,
device=device,
max_new_tokens=200,
do_sample=True,
min_length=230,
no_repeat_ngram_size=4)
# Load watermark algorithm
my_watermark = AutoWatermark.load('KGW',
algorithm_config='config/KGW.json',
transformers_config=transformers_config)
# Init pipelines
pipeline1 = WatermarkedTextDetectionPipeline(
dataset=my_dataset,
text_editor_list=[TruncatePromptTextEditor(), WordDeletion(ratio=0.3)],
show_progress=True,
return_type=DetectionPipelineReturnType.SCORES)
pipeline2 = UnWatermarkedTextDetectionPipeline(dataset=my_dataset,
text_editor_list=[],
show_progress=True,
return_type=DetectionPipelineReturnType.SCORES)
# Evaluate
calculator = DynamicThresholdSuccessRateCalculator(labels=['TPR', 'F1'], rule='best')
print(calculator.calculate(pipeline1.evaluate(my_watermark), pipeline2.evaluate(my_watermark)))
Using Text Quality Analysis Pipeline
import torch
from evaluation.dataset import C4Dataset
from watermark.auto_watermark import AutoWatermark
from utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from evaluation.tools.text_editor import TruncatePromptTextEditor
from evaluation.tools.text_quality_analyzer import PPLCalculator
from evaluation.pipelines.quality_analysis import DirectTextQualityAnalysisPipeline, QualityPipelineReturnType
# Load dataset
my_dataset = C4Dataset('dataset/c4/processed_c4.json')
# Device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# Transformer config
transformers_config = TransformersConfig(
model=AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b').to(device), tokenizer=AutoTokenizer.from_pretrained('facebook/opt-1.3b'),
vocab_size=50272,
device=device,
max_new_tokens=200,
min_length=230,
do_sample=True,
no_repeat_ngram_size=4)
# Load watermark algorithm
my_watermark = AutoWatermark.load('KGW',
algorithm_config='config/KGW.json',
transformers_config=transformers_config)
# Init pipeline
quality_pipeline = DirectTextQualityAnalysisPipeline(
dataset=my_dataset,
watermarked_text_editor_list=[TruncatePromptTextEditor()],
unwatermarked_text_editor_list=[],
analyzer=PPLCalculator(
model=AutoModelForCausalLM.from_pretrained('..model/llama-7b/', device_map='auto'), tokenizer=LlamaTokenizer.from_pretrained('..model/llama-7b/'),
device=device),
unwatermarked_text_source='natural',
show_progress=True,
return_type=QualityPipelineReturnType.MEAN_SCORES)
# Evaluate
print(quality_pipeline.evaluate(my_watermark))
For more examples on how to use the pipelines, please refer to the test/test_pipeline.py
script in the project directory.
Leveraging example scripts for evaluation
In the evaluation/examples/
directory of our repository, you will find a collection of Python scripts specifically designed for systematic and automated evaluation of various algorithms. By using these examples, you can quickly and effectively gauge the d etectability, robustness and impact on text quality of each algorithm implemented within our toolkit.
Note: To execute the scripts in evaluation/examples/
, first run the following command to set the environment variables.
export PYTHONPATH="path_to_the_MarkLLM_project:$PYTHONPATH"
Additional user examples are available in test/
. To execute the scripts contained within, first run the following command to set the environment variables.
export PYTHONPATH="path_to_the_MarkLLM_project:$PYTHONPATH"
In addition to the Colab Jupyter notebook we provide (some models cannot be downloaded due to storage limits), you can also easily deploy using MarkLLM_demo.ipynb
on your local machine.
A user example:
import torch, random
import numpy as np
from markllm.watermark.auto_watermark import AutoWatermark
from markllm.utils.transformers_config import TransformersConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
# Setting random seed for reproducibility
seed = 30
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
# Device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Transformers config
model_name = 'facebook/opt-1.3b'
transformers_config = TransformersConfig(
model=AutoModelForCausalLM.from_pretrained(model_name).to(device),
tokenizer=AutoTokenizer.from_pretrained(model_name),
vocab_size=50272,
device=device,
max_new_tokens=200,
min_length=230,
do_sample=True,
no_repeat_ngram_size=4
)
# Load watermark algorithm
myWatermark = AutoWatermark.load('KGW', transformers_config=transformers_config)
# Prompt and generation
prompt = 'Good Morning.'
watermarked_text = myWatermark.generate_watermarked_text(prompt)
# How would I get started with Python...
unwatermarked_text = myWatermark.generate_unwatermarked_text(prompt)
# I am happy that you are back with ...
# Detection
detect_result_watermarked = myWatermark.detect_watermark(watermarked_text)
# {'is_watermarked': True, 'score': 9.287487590439852}
detect_result_unwatermarked = myWatermark.detect_watermark(unwatermarked_text)
# {'is_watermarked': False, 'score': -0.8443170536763502}
If you are interested in text watermarking for large language models, please read our survey: [2312.07913] A Survey of Text Watermarking in the Era of Large Language Models (arxiv.org). We detail various text watermarking algorithms, evaluation methods, applications, current challenges, and future directions in this survey.
@inproceedings{pan-etal-2024-markllm,
title = "{M}ark{LLM}: An Open-Source Toolkit for {LLM} Watermarking",
author = "Pan, Leyi and
Liu, Aiwei and
He, Zhiwei and
Gao, Zitian and
Zhao, Xuandong and
Lu, Yijian and
Zhou, Binglin and
Liu, Shuliang and
Hu, Xuming and
Wen, Lijie and
King, Irwin and
Yu, Philip S.",
editor = "Hernandez Farias, Delia Irazu and
Hope, Tom and
Li, Manling",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-demo.7",
pages = "61--71",
abstract = "Watermarking for Large Language Models (LLMs), which embeds imperceptible yet algorithmically detectable signals in model outputs to identify LLM-generated text, has become crucial in mitigating the potential misuse of LLMs. However, the abundance of LLM watermarking algorithms, their intricate mechanisms, and the complex evaluation procedures and perspectives pose challenges for researchers and the community to easily understand, implement and evaluate the latest advancements. To address these issues, we introduce MarkLLM, an open-source toolkit for LLM watermarking. MarkLLM offers a unified and extensible framework for implementing LLM watermarking algorithms, while providing user-friendly interfaces to ensure ease of access. Furthermore, it enhances understanding by supporting automatic visualization of the underlying mechanisms of these algorithms. For evaluation, MarkLLM offers a comprehensive suite of 12 tools spanning three perspectives, along with two types of automated evaluation pipelines. Through MarkLLM, we aim to support researchers while improving the comprehension and involvement of the general public in LLM watermarking technology, fostering consensus and driving further advancements in research and application. Our code is available at https://github.com/THU-BPM/MarkLLM.",
}