Disclaimer: This repository contains examples of harmful language. Reader discretion is advised.
This is the official repository for Voice Jailbreak Attacks Against GPT-4o. In this paper, we present the first study on how to jailbreak GPT-4o with voice.
Check out our demo below!
echo "export OPENAI_API_KEY='YOURKEY'" >> ~/.zshrc
source ~/.zshrc
echo $OPENAI_API_KEY # check your key
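The repository's scripts are expected to pick up this key from the environment; with the official openai Python package, the client does so automatically. A minimal sanity check (standard openai client usage, not code from this repository):

import os
from openai import OpenAI

# The OpenAI client falls back to the OPENAI_API_KEY environment variable by default.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
client = OpenAI()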
python tts/prompt2audio.py --dataset baseline --voice fable        # audio for the baseline forbidden questions
python tts/prompt2audio.py --dataset textjailbreak --voice fable   # audio for the text jailbreak prompts
Then, manually play each generated audio clip to GPT-4o's voice mode and evaluate its responses.
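For reference, here is a minimal sketch of what a prompt-to-audio step can look like with OpenAI's tts-1 endpoint. The CSV path, the prompt column name, and the output layout are illustrative assumptions, not necessarily how tts/prompt2audio.py is implemented:

import csv
from pathlib import Path

from openai import OpenAI

client = OpenAI()
out_dir = Path("audio/fable")
out_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical CSV with a "prompt" column; adjust to the repository's actual schema.
with open("data/jailbreak_prompts/voicejailbreak.csv", newline="", encoding="utf-8") as f:
    for i, row in enumerate(csv.DictReader(f)):
        # Synthesize speech with the "fable" voice and save it as an MP3 file.
        with client.audio.speech.with_streaming_response.create(
            model="tts-1", voice="fable", input=row["prompt"]
        ) as response:
            response.stream_to_file(out_dir / f"prompt_{i}.mp3")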
Forbidden Questions
data/question_set/questions_tiny.csv
data/question_set/questions_tiny_zh.csv
Prompts
data/jailbreak_prompts/text_jailbreak_prompts.csv
data/jailbreak_prompts/voicejailbreak.csv
data/question_set/questions_tiny_plot.csv
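To inspect these files programmatically, a quick loading sketch (the column names are not documented here, so print the headers first; pandas is an assumption, any CSV reader works):

import pandas as pd

questions = pd.read_csv("data/question_set/questions_tiny.csv")
text_prompts = pd.read_csv("data/jailbreak_prompts/text_jailbreak_prompts.csv")
voice_prompts = pd.read_csv("data/jailbreak_prompts/voicejailbreak.csv")

# Print each file's column headers to see its schema before using it.
print(questions.columns.tolist())
print(text_prompts.columns.tolist())
print(voice_prompts.columns.tolist())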
Success Cases
data/screenshot/
We take the utmost care with the ethics of our study. Specifically, all experiments are conducted with two registered accounts and manually labeled by the authors, eliminating exposure risks to third parties such as crowdsourcing workers. Our work is therefore not considered human subjects research by our Institutional Review Board (IRB). We acknowledge that evaluating GPT-4o's ability to answer forbidden questions can reveal how the model can be induced to generate inappropriate content, which raises concerns about potential misuse. We believe it is important to disclose this research fully: the methods presented are straightforward to implement and are likely to be discovered by potential adversaries. We have responsibly disclosed our findings to related LLM vendors.
If you find this useful in your research, please consider citing:
@article{SWBZ24,
  author  = {Xinyue Shen and Yixin Wu and Michael Backes and Yang Zhang},
  title   = {{Voice Jailbreak Attacks Against GPT-4o}},
  journal = {{CoRR abs/2405.19103}},
  year    = {2024}
}
VoiceJailbreak is licensed under the terms of the MIT license. See LICENSE for more details.