This repository contains the code for our recent work on safeguarding images against manipulation by ML-powered photo-editing models such as Stable Diffusion.
Raising the Cost of Malicious AI-Powered Image Editing
Hadi Salman*, Alaa Khaddaj*, Guillaume Leclerc*, Andrew Ilyas, Aleksander Madry
Paper: https://arxiv.org/abs/2302.06588
Blog post: https://gradientscience.org/photoguard
Interactive demo: https://huggingface.co/spaces/hadisalman/photoguard (check below for how to run it locally)
@article{salman2023raising,
title={Raising the Cost of Malicious AI-Powered Image Editing},
author={Salman, Hadi and Khaddaj, Alaa and Leclerc, Guillaume and Ilyas, Andrew and Madry, Aleksander},
journal={arXiv preprint arXiv:2302.06588},
year={2023}
}
Our code relies on the Stable Diffusion code hosted on Hugging Face.
Clone our repo: git clone https://github.com/madrylab/photoguard.git
Install dependencies:
conda create -n photoguard python=3.10
conda activate photoguard
pip install -r requirements.txt
huggingface-cli login
You should now be all set! Check out our notebooks!
We created an interactive demo using Gradio, and we host it on a Hugging Face Space (linked above).
However, for faster inference, you can run the demo locally on your machine! Simply do this:
conda activate photoguard
cd demo
python app.py
First, we walk you through how to generate high-quality fake images. Check out this notebook! It shows examples of the resulting images.
See this notebook!
Now, we describe the simplest form of photo safeguarding that we implement: a simple PGD (projected gradient descent) attack on the image-embedding part of the Stable Diffusion model. We have two demos demonstrating the efficacy of this safeguarding method. The goal in both is to cause the Stable Diffusion model to generate something that is either unrealistic or unrelated to the original image.
The first covers the case where someone uses an image and a text prompt to modify the input image according to the prompt.
See this notebook!
The second is the more interesting scenario, where someone wants to edit parts of an existing image via inpainting. After immunization, the generated images are clearly fake!
See this notebook!
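As a rough illustration of the embedding attack described above, the following sketch runs L-infinity PGD against a toy linear "encoder" in NumPy. Everything in it is an assumption made for illustration: the random matrix `W` stands in for the Stable Diffusion image encoder, the all-zeros `target_latent` stands in for the embedding of some unrelated target image, and `eps`, `step_size`, and `steps` are arbitrary values rather than the settings used in the notebooks. The sketch uses a targeted variant that pulls the embedding toward a chosen target; see the paper and notebooks for the exact objective. A real attack would obtain gradients through the actual encoder with automatic differentiation.

```python
import numpy as np

def encoder_loss(x, target_latent, W):
    """Squared distance between the toy embedding W @ x and the target latent."""
    return float(np.sum((W @ x - target_latent) ** 2))

def pgd_encoder_attack(x, target_latent, W, eps, step_size=0.01, steps=100):
    """Targeted L-infinity PGD against a toy linear encoder z = W @ x."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        residual = W @ (x + delta) - target_latent
        grad = 2.0 * W.T @ residual                 # gradient of ||residual||^2 w.r.t. delta
        delta = delta - step_size * np.sign(grad)   # signed gradient descent step
        delta = np.clip(delta, -eps, eps)           # project back onto the eps-ball
        delta = np.clip(x + delta, 0.0, 1.0) - x    # keep pixel values in [0, 1]
    return x + delta

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))          # hypothetical stand-in for the image encoder
x = rng.uniform(0.2, 0.8, size=64)    # flattened toy "image" with pixels in [0, 1]
target_latent = np.zeros(8)           # hypothetical embedding of a target image
eps = 0.06                            # illustrative perturbation budget

initial_loss = encoder_loss(x, target_latent, W)
x_immunized = pgd_encoder_attack(x, target_latent, W, eps)
final_loss = encoder_loss(x_immunized, target_latent, W)
print(f"loss before: {initial_loss:.2f}, after: {final_loss:.2f}")
```

The immunized image differs from the original by at most `eps` per pixel, while its embedding moves toward the target.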
For more effective photo safeguarding, especially against image inpainting, we need to attack the Stable Diffusion model end-to-end. With this attack, the images generated after immunization are even more clearly fake than those above!
See this notebook!
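To make the difference from an embedding-only attack concrete, here is a minimal sketch of an end-to-end variant on a toy pipeline. It is not the implementation in the notebooks: `toy_pipeline` (a linear encode step, a `tanh` "denoising" step, and a linear decode step), the matrices `W_enc`, `A`, `W_dec`, the gray `x_target`, and all hyperparameters are hypothetical stand-ins. The point is only that the loss is measured on the final output image, so the gradient has to flow back through every stage. Here that is done by hand; with the real model it would rely on automatic differentiation.

```python
import numpy as np

def toy_pipeline(x, W_enc, A, W_dec):
    """Hypothetical stand-in for encode -> denoise -> decode."""
    hidden = np.tanh(A @ (W_enc @ x))
    return W_dec @ hidden, hidden

def pgd_end_to_end(x, x_target, W_enc, A, W_dec, eps, step_size=0.01, steps=100):
    """L-infinity PGD on the output of the whole toy pipeline."""
    def loss_and_grad(delta):
        out, hidden = toy_pipeline(x + delta, W_enc, A, W_dec)
        residual = out - x_target
        # Backpropagate by hand through decode, tanh, and encode.
        grad_hidden = W_dec.T @ (2.0 * residual)
        grad_pre = grad_hidden * (1.0 - hidden ** 2)
        grad = W_enc.T @ (A.T @ grad_pre)
        return float(np.sum(residual ** 2)), grad

    delta = np.zeros_like(x)
    best_delta, best_loss = delta, loss_and_grad(delta)[0]
    for _ in range(steps):
        _, grad = loss_and_grad(delta)
        delta = np.clip(delta - step_size * np.sign(grad), -eps, eps)
        delta = np.clip(x + delta, 0.0, 1.0) - x    # keep pixel values in [0, 1]
        loss = loss_and_grad(delta)[0]
        if loss < best_loss:                        # keep the best perturbation seen
            best_delta, best_loss = delta, loss
    return x + best_delta, best_loss

rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(8, 64))   # hypothetical encoder
A = rng.normal(scale=0.5, size=(8, 8))        # hypothetical denoising step
W_dec = rng.normal(scale=0.1, size=(64, 8))   # hypothetical decoder
x = rng.uniform(0.2, 0.8, size=64)            # flattened toy "image"
x_target = np.full(64, 0.5)                   # hypothetical gray target image
eps = 0.06                                    # illustrative perturbation budget

initial_out, _ = toy_pipeline(x, W_enc, A, W_dec)
initial_loss = float(np.sum((initial_out - x_target) ** 2))
x_immunized, final_loss = pgd_end_to_end(x, x_target, W_enc, A, W_dec, eps)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Because every PGD step differentiates through the full pipeline, this variant costs more compute and memory per step than attacking the embedding alone.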
That's it! Please let us know if you have any questions, and check our paper for details about each of these attacks.