This repository contains code and figures for our paper *When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?*.
Spoiler: We found transfer was hard to obtain and only succeeded very narrowly 😬
Installation | Usage | Training New VLMs | Contributing | Citation | Contact
```bash
conda update -n base -c defaults conda -y
conda create -n universal_vlm_jailbreak_env python=3.11 -y && conda activate universal_vlm_jailbreak_env
pip install --upgrade pip
conda install pytorch=2.3.0 torchvision=0.18.0 torchaudio=2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
conda install lightning=2.2.4 -c conda-forge -y
git submodule update --init --recursive
cd submodules/prismatic-vlms && pip install -e . --config-settings editable_mode=compat && cd ../..
cd submodules/DeepSeek-VL && pip install -e . --config-settings editable_mode=compat && cd ../..
```
Note: adding `--config-settings editable_mode=compat` is optional; it helps VS Code recognize the editable installs.
```bash
pip install packaging ninja && pip install flash-attn==2.5.8 --no-build-isolation
conda install joblib pandas matplotlib seaborn black tiktoken sentencepiece anthropic termcolor -y
```
Make sure to log in to W&B by running `wandb login`.
Log in to Hugging Face with `huggingface-cli login`.
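If you need to authenticate non-interactively (e.g., inside a batch job), both libraries also support programmatic login. The snippet below is an optional sketch, not part of the repository, and assumes your keys are exported in the standard `WANDB_API_KEY` and `HF_TOKEN` environment variables:

```python
# Optional, illustrative alternative to `wandb login` / `huggingface-cli login`.
# Assumes keys are exported as WANDB_API_KEY and HF_TOKEN.
import os

import wandb
from huggingface_hub import login as hf_login

wandb.login(key=os.environ.get("WANDB_API_KEY"))
hf_login(token=os.environ.get("HF_TOKEN"))
```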
(Critical) Install the correct `timm` version:

```bash
pip install timm==0.9.16
```
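As a quick sanity check (illustrative, not part of the repository), you can confirm that the pinned versions above are the ones the environment actually imports:

```python
# Illustrative environment check: verify the versions pinned in the install steps.
import flash_attn
import lightning
import timm
import torch

print("torch:      ", torch.__version__)       # expect 2.3.0
print("lightning:  ", lightning.__version__)   # expect 2.2.4
print("timm:       ", timm.__version__)        # expect 0.9.16
print("flash-attn: ", flash_attn.__version__)  # expect 2.5.8
print("CUDA available:", torch.cuda.is_available())
```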
There are 4 main components to this repository:
With the currently set hyperparameters, each VLM requires its own 80GB VRAM GPU (e.g., A100, H100).
The project is built primarily on top of PyTorch, Lightning, W&B and the Prismatic suite of VLMs.
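For orientation only, here is a minimal, hypothetical sketch of how Lightning and the W&B logger typically fit together; it is not the repository's actual training code, and the module and project names are placeholders:

```python
# Hypothetical sketch of Lightning + W&B wiring; NOT the repository's training code.
import lightning as L
import torch
from lightning.pytorch.loggers import WandbLogger
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TinyModule(L.LightningModule):
    """Placeholder stand-in for the actual jailbreak-optimization module."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.layer(x), y)
        self.log("train/loss", loss)  # metrics stream to W&B via the logger
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)


# `offline=True` lets the example run without a W&B account; the project name is a guess.
logger = WandbLogger(project="universal-vlm-jailbreak", offline=True)
trainer = L.Trainer(max_steps=10, logger=logger, accelerator="auto", devices=1)

loader = DataLoader(TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16)
trainer.fit(TinyModule(), loader)
```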
Our work was based on the Prismatic suite of VLMs by Siddharth Karamcheti and collaborators. To train additional VLMs based on new language models (e.g., Llama 3), we created a Prismatic fork. The new VLMs are publicly available on HuggingFace and include the following vision backbones:
and the following language models:
Contributions are welcome! Please format your code with `black`.
To cite this work, please use:
```bibtex
@article{schaeffer2024universaltransferableimagejailbreaks,
  title={When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?},
  author={Schaeffer, Rylan and Valentine, Dan and Bailey, Luke and Chua, James and Eyzaguirre, Crist{\'o}bal and Durante, Zane and Benton, Joe and Miranda, Brando and Sleight, Henry and Hughes, John and others},
  journal={arXiv preprint arXiv:2407.15211},
  year={2024}
}
```
Questions? Comments? Interested in collaborating? Open an issue or email rschaef@cs.stanford.edu or any of the other authors.