Unauthorized generation of privacy-related and copyrighted content with generative AI has become a significant concern, raising ethical, legal, and privacy issues that demand urgent attention. The EU's General Data Protection Regulation (GDPR) includes a ``right to be forgotten,'' which allows individuals to request the deletion of their personal data. However, this primarily applies to data stored in traditional databases, not to AI models. Machine unlearning techniques have recently arisen that attempt to eliminate the influence of sensitive content used during AI model training, but they often require extensive updates to the deployed systems and incur substantial computational costs. In this work, we propose a novel and efficient method called Single Layer Unlearning Gradient (SLUG), which can unlearn targeted information by updating targeted layers of a model using a one-time gradient computation. Our method is highly modular and enables the selective removal of multiple sensitive concepts, such as celebrity names and copyrighted content, from the generated outputs of widely used foundation models (e.g., CLIP) and generative models (e.g., Stable Diffusion). Broadly, our method helps AI-generated content comply with privacy regulations and intellectual property laws, fostering responsible use of generative models, mitigating legal risks, and promoting a trustworthy, socially responsible AI ecosystem.
Overview of our proposed Single Layer Unlearning Gradient (SLUG) framework. Given an unlearning query, such as removing an identity like Elon Musk, we first curate or generate a forget set containing relevant data and a retain set with data points we want to preserve. Using these datasets, we calculate and store the model gradients. Based on these gradients, we identify the important layers to update for unlearning. We then take a step along the forget gradients of a single layer and evaluate the model's unlearning performance. To determine a suitable step size $\lambda$, we employ a binary search. After unlearning, the specified concepts are effectively erased while retaining the model's overall utility.
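The last two steps of the overview (a single-layer step along the forget gradient, followed by a binary search over the step size $\lambda$) can be illustrated with a toy sketch. This is not the repository's implementation: the names `step_single_layer`, `search_step_size`, and `forget_score` are hypothetical, parameters are plain Python lists rather than tensors, the sign of the update depends on how the forget loss is defined, and the search assumes the forget score does not increase as the step grows.

```python
from typing import Callable, Dict, List

Params = Dict[str, List[float]]  # toy stand-in for a model's per-layer weights


def step_single_layer(params: Params, forget_grads: Params,
                      layer: str, step: float) -> Params:
    """Return a copy of params in which only `layer` moves along its forget gradient."""
    updated = {name: list(values) for name, values in params.items()}
    # Assumption: a positive step along the stored gradient reduces the forget score.
    updated[layer] = [w + step * g
                      for w, g in zip(params[layer], forget_grads[layer])]
    return updated


def search_step_size(params: Params, forget_grads: Params, layer: str,
                     forget_score: Callable[[Params], float], threshold: float,
                     lo: float = 0.0, hi: float = 1.0, iters: int = 20) -> float:
    """Binary-search the smallest step whose forget score is at or below `threshold`.

    Assumes forget_score is non-increasing in the step size and that `hi`
    is large enough to reach the threshold.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        candidate = step_single_layer(params, forget_grads, layer, mid)
        if forget_score(candidate) <= threshold:
            hi = mid  # unlearning target met, try a smaller step
        else:
            lo = mid  # not enough unlearning yet, try a larger step
    return hi
```

Because the gradients are computed once and reused for every candidate step, each iteration of the search only costs one evaluation of the updated model. In practice the evaluation would also track a utility metric on the retain set, which this sketch omits.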
Qualitative evaluation of unlearning the copyrighted characters Iron Man and Mickey Mouse, which can potentially be used for unauthorized content generation, from Stable Diffusion (SD). Our method precisely unlearns the copyright-protected concepts from SD, while the image generation quality on other concepts is largely preserved.
To install requirements:
conda env create -f environment.yml
00000.tar, 00001.tar and so on. ILSVRC2012_img_val.tar under data/ImageNet/, and run bash valprep.sh to prepare the dataset.
Update data_root in src/clip/a0_eval_celeba.py to the absolute path of where you stored the experimental data.
The data folder is structured as:
data
├── celeba
│   ├── img_align_celeba
│   │   ├── 010905.jpg
│   │   ├── 010906.jpg
│   │   └── ...
│   └── frequent_celebs.txt
├── ImageNet
│   └── val
│       ├── n01440764
│       │   ├── ILSVRC2012_val_00000293.JPEG
│       │   ├── ILSVRC2012_val_00002138.JPEG
│       │   └── ...
│       ├── n01443537
│       └── ...
├── laion
    └── laion400m
        ├── 00000_stats.json
        ├── 00000.parquet
        └── 00000.tar
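Before running the scripts, it can help to confirm that the data root matches this layout. The following is a minimal sketch, not part of the repository: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the list only covers the top-level entries shown in the tree above.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the folder tree above (hypothetical helper constant).
EXPECTED_ENTRIES = [
    "celeba/img_align_celeba",
    "celeba/frequent_celebs.txt",
    "ImageNet/val",
    "laion/laion400m",
]


def missing_entries(data_root: str) -> List[str]:
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    # Replace "data" with the absolute path used for data_root.
    for rel in missing_entries("data"):
        print(f"missing: {rel}")
```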
Prepare the forget and retain sets. Given an unlearning task, we first curate a forget set containing relevant image-text pairs, then sample the retain set from the original training set (e.g., one shard of laion). The script for curating a forget set from the laion dataset is src/clip/a0_create_tar.py
Calculate the forget and retain gradients.
Update the paths for the arguments --train-data, --forget-data, and --imagenet-val in scripts/run_compute_grad.sh, then run
bash scripts/run_compute_grad.sh
This will generate the forget gradient file, stored in the folder SLUG/results/grads.
Perform the Single Layer Single Gradient update by running
bash scripts/run_clip_slug.sh
This will generate the Pareto-front plots, cosine similarity matrices, and step-size search log, stored at SLUG/results/clip.
Run the comparison methods
bash scripts/run_clip_comparison.sh
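The procedure above selects layers from the stored gradients and reports Pareto-front plots. One way such a front could be computed is sketched below. The two per-layer metrics are assumptions for illustration and are not stated in this document: importance is taken as the forget-gradient norm relative to the weight norm, and alignment as the cosine similarity between forget and retain gradients. All function names are hypothetical, and layers are flattened into plain lists.

```python
import math
from typing import Dict, List, Tuple

Vectors = Dict[str, List[float]]  # layer name -> flattened values


def l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(u: List[float], v: List[float]) -> float:
    denom = l2_norm(u) * l2_norm(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom else 0.0


def layer_scores(weights: Vectors, forget_grads: Vectors,
                 retain_grads: Vectors) -> Dict[str, Tuple[float, float]]:
    """Map each layer to (importance, alignment) under the assumed definitions."""
    scores = {}
    for name, w in weights.items():
        w_norm = l2_norm(w)
        importance = l2_norm(forget_grads[name]) / w_norm if w_norm else 0.0
        alignment = cosine(forget_grads[name], retain_grads[name])
        scores[name] = (importance, alignment)
    return scores


def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Keep layers that no other layer beats on both higher importance and lower alignment."""
    front = []
    for name, (imp, ali) in scores.items():
        dominated = any(
            o_imp >= imp and o_ali <= ali and (o_imp > imp or o_ali < ali)
            for other, (o_imp, o_ali) in scores.items() if other != name
        )
        if not dominated:
            front.append(name)
    return front
```

Under these assumptions, a layer on the front is a candidate for the single-layer update: its forget gradient is large relative to its weights, while overlapping little with the retain gradient.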
Create the forget set dataset file
python src/clip/a0_create_tar.py --name [celebrity name/object concept]
This will create a directory of selected images associated with the provided celebrity name or concept from the laion shard file, under data/laion/laion400m, and a .tar file containing the selected images, under data/tar_files/{concept_name}.tar.
Repeat the unlearning procedure to generate the unlearning gradient using the created .tar file, then perform unlearning.
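To make the forget set creation concrete, the sketch below filters image-caption pairs by concept name and packs the matches into a .tar file. It does not reproduce src/clip/a0_create_tar.py: `write_forget_tar` is a hypothetical helper, the substring match on captions is an assumption, and so is the WebDataset-style layout in which each sample is a .jpg/.txt pair sharing a key.

```python
import io
import tarfile
from typing import Iterable, Tuple


def write_forget_tar(samples: Iterable[Tuple[bytes, str]],
                     concept: str, tar_path: str) -> int:
    """Pack samples whose caption mentions `concept` into a tar; return how many were kept."""
    kept = 0
    with tarfile.open(tar_path, "w") as tar:
        for image_bytes, caption in samples:
            # Assumption: a case-insensitive substring match selects relevant pairs.
            if concept.lower() not in caption.lower():
                continue
            key = f"{kept:06d}"
            for suffix, payload in ((".jpg", image_bytes),
                                    (".txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
            kept += 1
    return kept
```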
TODO: include experiment steps for unlearning object/multiple identities
Before starting, generate the necessary dataset files and gradient files following the steps described in Unlearning procedure.
Run the Jupyter notebook notebooks/experiment_stable_diffusion.ipynb
Before starting, generate the necessary dataset files and gradient files following the steps described in Unlearning procedure.
Run the Jupyter notebook notebooks/experiment_vision_language.ipynb
First, clone the UnlearnCanvas repository under ./data
cd data
git clone https://github.com/OPTML-Group/UnlearnCanvas.git
Download the UnlearnCanvas dataset and pretrained models following the instructions in the UnlearnCanvas repository. The UnlearnCanvas dataset folder is structured as:
data
└── UnlearnCanvas
    └── data
        ├── Abstractionism
        │   ├── Architectures
        │   │   ├── 1.jpg
        │   │   ├── 2.jpg
        │   │   └── ...
        │   ├── Bears
        │   ├── Birds
        │   └── ...
        ├── Artist_Sketch
        └── ...
Generate the .tar dataset files by running:
cd src/clip
python a0_create_tar_ucanvas.py
Follow a gradient computation step similar to the one above (Unlearning procedure, step 2) to generate the gradient files for the forget set:
cd [BACK TO SLUG/]
bash scripts/run_compute_grad_uncanvas.sh
Lastly, run UnlearnCanvas evaluation:
bash scripts/run_uncanvas.sh