MadryLab / photoguard

Raising the Cost of Malicious AI-Powered Image Editing
https://gradientscience.org/photoguard/
MIT License
adversarial-attacks adversarial-examples computer-vision deep-learning deepfakes robustness stable-diffusion


This repository contains the code for our recent work on safeguarding images against manipulation by ML-powered photo-editing models such as Stable Diffusion.

Raising the Cost of Malicious AI-Powered Image Editing
Hadi Salman*, Alaa Khaddaj*, Guillaume Leclerc*, Andrew Ilyas, Aleksander Madry
Paper: https://arxiv.org/abs/2302.06588
Blog post: https://gradientscience.org/photoguard
Interactive demo: https://huggingface.co/spaces/hadisalman/photoguard (see below for how to run it locally)

    @article{salman2023raising,
      title={Raising the Cost of Malicious AI-Powered Image Editing},
      author={Salman, Hadi and Khaddaj, Alaa and Leclerc, Guillaume and Ilyas, Andrew and Madry, Aleksander},
      journal={arXiv preprint arXiv:2302.06588},
      year={2023}
    }

Getting started

Our code relies on the Stable Diffusion implementation from Hugging Face (the diffusers library).

  1. Clone our repo: git clone https://github.com/madrylab/photoguard.git

  2. Install dependencies:

      conda create -n photoguard python=3.10
      conda activate photoguard
      pip install -r requirements.txt
      huggingface-cli login
  3. You should now be all set! Check out our notebooks!

[New] Interactive demo

We created an interactive demo using Gradio, and we are hosting it on this Hugging Face Space.


For faster inference, you can also run the demo locally on your machine:

    conda activate photoguard
    cd demo
    python app.py

Generating high-quality fake images

First, we show how to generate high-quality fake images with Stable Diffusion.

See this notebook! (Also available in Colab.)

Simple photo-guarding (Encoder attack)

Now, we describe the simplest form of photo safeguarding that we implement: a PGD (projected gradient descent) attack on the image encoder of the Stable Diffusion model. It adds a small, imperceptible perturbation to the image so that the image's latent representation moves toward that of a chosen target. We include two demos showing the efficacy of this safeguarding method. The goal of both is to make the Stable Diffusion model generate something that is either unrealistic or unrelated to the original image.
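As a minimal sketch of the encoder attack, assuming a toy linear map `W` in place of the Stable Diffusion VAE encoder, the code below runs L∞-bounded PGD that pushes the latent of `x + delta` toward a target latent `z_target` (here assumed to be the encoding of a flat gray image), while keeping every pixel change within `eps` and every pixel in [0, 1]. The function names and the values of `eps`, `step_size`, and `iters` are hypothetical illustrations, not the code or settings used in our notebooks. It is written in pure Python so it runs without PyTorch.

```python
import random


def encode(W, x):
    """Toy linear 'encoder' z = W x, a hypothetical stand-in for the Stable Diffusion VAE encoder."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def loss_and_grad(W, x, z_target):
    """Return ||W x - z_target||^2 and its gradient w.r.t. x, which is 2 W^T (W x - z_target)."""
    r = [z - t for z, t in zip(encode(W, x), z_target)]
    grad = [2.0 * sum(W[i][j] * r[i] for i in range(len(W))) for j in range(len(x))]
    return sum(ri * ri for ri in r), grad


def pgd_encoder_attack(W, x, z_target, eps=0.06, step_size=0.01, iters=100):
    """L-infinity PGD: minimize ||E(x + delta) - z_target||^2 s.t. |delta_j| <= eps and 0 <= x_j + delta_j <= 1."""
    x_adv = list(x)
    for _ in range(iters):
        _, g = loss_and_grad(W, x_adv, z_target)
        # Signed-gradient *descent* step: move the latent toward the target.
        x_adv = [xa - step_size * ((gj > 0) - (gj < 0)) for xa, gj in zip(x_adv, g)]
        # Project onto the eps-ball around x, then onto the valid pixel range [0, 1].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


# Toy usage: 16 "pixels", a 4-dimensional latent, target = latent of a flat gray image.
random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
x = [random.random() for _ in range(16)]
z_target = encode(W, [0.5] * 16)
x_adv = pgd_encoder_attack(W, x, z_target)
loss_clean = loss_and_grad(W, x, z_target)[0]
loss_adv = loss_and_grad(W, x_adv, z_target)[0]
print(f"squared distance to target latent: clean={loss_clean:.3f}, immunized={loss_adv:.3f}")
```

In the real setting the gradient would come from autograd through the VAE encoder rather than from the closed form `2 W^T (W x - z_target)` used here; the PGD loop itself (signed step, then projection) would look much the same.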

Photo-guarding against Image-to-Image pipelines

The first covers the case where someone uses an image and a text prompt to modify the input image according to the prompt.

See this notebook! (Also available in Colab.)
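One way to see why attacking the encoder affects image-to-image editing: such pipelines encode the input image and start denoising from a noised copy of that latent, so a latent that has been pushed toward a target carries over into the edit. The sketch below illustrates only this forward-noising step, `z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * noise`, under stated assumptions: a 1000-step "scaled linear" beta schedule from 0.00085 to 0.012, and a simplified mapping from the edit `strength` to the starting timestep. The function names are hypothetical, and real pipelines differ in detail.

```python
import math
import random


def alphas_cumprod(T=1000, beta_start=0.00085, beta_end=0.012):
    """abar_t = prod_{s<=t} (1 - beta_s) for an assumed 'scaled linear' beta schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    out, prod = [], 1.0
    for t in range(T):
        beta = (lo + (hi - lo) * t / (T - 1)) ** 2
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noised_start_latent(z0, strength, rng, T=1000):
    """Forward-noise the encoded image z0 to the timestep implied by `strength` in (0, 1].

    Returns (z_t, abar_t) with z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * noise.
    Simplified mapping (assumption): t = floor(strength * T), clamped to [1, T], then 0-indexed.
    """
    abar = alphas_cumprod(T)
    t = max(1, min(int(strength * T), T)) - 1
    a = abar[t]
    z_t = [math.sqrt(a) * zi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for zi in z0]
    return z_t, a


rng = random.Random(0)
z0 = [1.0, -0.5, 0.25, 0.0]  # hypothetical stand-in for E(x), the encoded (possibly immunized) image
for strength in (0.3, 0.75):
    z_t, a = noised_start_latent(z0, strength, rng)
    print(f"strength={strength}: weight on z0 = sqrt(abar_t) = {math.sqrt(a):.3f}")
```

Lower strength keeps more of `z0` in the starting latent, which is why corrupting the encoder output can derail an image-to-image edit.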

Photo-guarding against Inpainting pipelines

The second is the more interesting scenario where someone wants to edit parts of an existing image via inpainting. The generated images after immunization are clearly fake!

See this notebook! (Also available in Colab.)
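Why the immunizing perturbation matters most in the part of the photo the editor keeps can be seen from how inpainting inputs are assembled. The sketch below is modeled on how the Hugging Face diffusers Stable Diffusion inpainting pipeline prepares its inputs, as we understand it: the mask is binarized at 0.5, 1 marks the region to repaint, and that region is blanked out of the conditioning image. The function name and the toy 3x3 grayscale image are hypothetical, and real pipelines additionally normalize and encode these tensors.

```python
def prepare_masked_image(image, mask, threshold=0.5):
    """Binarize `mask` (1 = region to repaint) and blank that region out of `image`.

    Only pixels outside the repaint region survive into the masked image,
    i.e. the context the inpainting model conditions on.
    """
    binary = [[1.0 if m >= threshold else 0.0 for m in row] for row in mask]
    masked = [
        [px * (1.0 - b) for px, b in zip(img_row, bin_row)]
        for img_row, bin_row in zip(image, binary)
    ]
    return binary, masked


# Toy 3x3 grayscale "photo"; the editor wants to repaint only the center pixel.
image = [[0.2, 0.4, 0.6],
         [0.8, 0.9, 0.1],
         [0.3, 0.5, 0.7]]
mask = [[0.0, 0.1, 0.0],
        [0.2, 1.0, 0.0],
        [0.0, 0.0, 0.4]]
binary, masked = prepare_masked_image(image, mask)
for row in masked:
    print(row)
```

In this simplified view, any perturbation inside the repaint region is zeroed out before the model sees it, while a perturbation on the kept pixels (for example, a person's face) reaches the model as context.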

Complex photo-guarding (Diffusion attack)

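For more effective photo-guarding, especially against image inpainting, we attack the Stable Diffusion model end-to-end, i.e., we optimize the perturbation through the full image-generation process rather than through the encoder alone. The generated images after immunization are now even more clearly fake than above!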

See this notebook!
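The difference from the encoder attack is where the loss is measured: the diffusion attack backpropagates through the whole generation process and compares the final output to a target image. The toy below, a minimal sketch, replaces that process with a hypothetical linear encoder `E` followed by a hypothetical linear decoder `D`, so the chain-rule gradient `2 E^T D^T r` can be written by hand. The real attack differentiates through the encoder, denoising steps, and decoder with autograd, which is considerably more expensive. Names and hyperparameters are illustrative only.

```python
import random


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def pipeline(E, D, x):
    """Toy end-to-end 'editing process' D(E(x)), standing in for encoder + denoising + decoder."""
    return matvec(D, matvec(E, x))


def loss_and_grad(E, D, x, x_target):
    """Return ||D E x - x_target||^2 and its gradient 2 E^T D^T r (chain rule through both stages)."""
    r = [o - t for o, t in zip(pipeline(E, D, x), x_target)]
    grad = [2.0 * g for g in matvec(transpose(E), matvec(transpose(D), r))]
    return sum(ri * ri for ri in r), grad


def pgd_end_to_end(E, D, x, x_target, eps=0.06, step_size=0.01, iters=100):
    """L-infinity PGD on the final output: pull D(E(x + delta)) toward x_target with |delta_j| <= eps."""
    x_adv = list(x)
    for _ in range(iters):
        _, g = loss_and_grad(E, D, x_adv, x_target)
        x_adv = [xa - step_size * ((gj > 0) - (gj < 0)) for xa, gj in zip(x_adv, g)]
        # Clip to the intersection of the eps-ball around x and the pixel range [0, 1].
        x_adv = [min(max(xa, xi - eps, 0.0), xi + eps, 1.0) for xa, xi in zip(x_adv, x)]
    return x_adv


random.seed(1)
E = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]  # 16 pixels -> 4 latents
D = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]  # 4 latents -> 16 pixels
x = [random.random() for _ in range(16)]
x_target = [0.5] * 16  # target output: a flat gray image
x_adv = pgd_end_to_end(E, D, x, x_target)
loss_clean = loss_and_grad(E, D, x, x_target)[0]
loss_adv = loss_and_grad(E, D, x_adv, x_target)[0]
print(f"output distance to target: clean={loss_clean:.1f}, immunized={loss_adv:.1f}")
```

Because the loss is taken on the pipeline's output, the perturbation is shaped by every stage it passes through, not just the encoder; that is the intuition behind the end-to-end attack being more effective.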

That's it! Please let us know if you have any questions, and check out our paper for details about each of these attacks.