AURORA (Action Reasoning Object Attribute) enables training an instruction-guided image editing model that can perform action- and reasoning-centric edits, in addition to "simpler" established object, attribute or global edits. Here we release 1) training data, 2) a trained model, 3) a benchmark, and 4) reproducible training and evaluation.
Please reach out to benno.krojer@mila.quebec or raise an issue if anything does not work!
On the data side, we release three artifacts and a Datasheet documentation:
You can also check out our Huggingface dataset.
The edit instructions are stored as data/TASK/train.json for each of the four tasks.
You can download the image pairs from Zenodo:
wget https://zenodo.org/record/11552426/files/ag_images.zip
wget https://zenodo.org/record/11552426/files/kubric_images.zip
wget https://zenodo.org/record/11552426/files/magicbrush_images.zip
Now move each archive into its respective directory data/NAME, rename it to images.zip, and extract it. In the end you should have, for example, data/kubric/images as a directory.
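The extraction step can also be scripted. The following is a minimal sketch, assuming each archive has already been moved and renamed to data/NAME/images.zip and that it contains a top-level images/ folder (this README does not confirm the archive layout). The directory name magicbrush is inferred from the archive name, and `extract_images` is a hypothetical helper, not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_images(data_root, names=("ag", "kubric", "magicbrush")):
    """Extract data_root/NAME/images.zip into data_root/NAME/ for each name.

    Assumption: each archive holds a top-level images/ folder, so extraction
    yields data_root/NAME/images/. Returns the names that were extracted.
    """
    extracted = []
    for name in names:
        archive = Path(data_root) / name / "images.zip"
        if not archive.is_file():
            # Skip sub-datasets whose archive has not been downloaded yet.
            print(f"skipping {name}: {archive} not found")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(name)
    return extracted
```

Calling `extract_images("data")` from the repository root would process whichever of the three archives are present. If an archive turns out not to contain an images/ folder, the files would need to be moved into data/NAME/images by hand.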
For Something-Something-Edit, you need to go to the original source, download all the zip files, and put all the videos in a folder named data/something/videos/.
Then run
cd data/something
python extract_frames.py
python filter_keywords.py
For each sub-dataset of AURORA, an entry would look like this:
[
{
"instruction": "Leave the door while standing closer",
"input": "data/ag/images/1K0SU.mp4_4_left.png",
"output": "data/ag/images/1K0SU.mp4_4_right.png"
},
{"..."}
]
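To check that the downloaded images match the instruction files, one can load a train.json file and verify that every referenced image exists. This is a minimal sketch, assuming the "input" and "output" paths are relative to the repository root (as the example entry suggests); `load_pairs` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def load_pairs(json_path, repo_root="."):
    """Load edit entries and list any referenced images that are missing.

    Each entry is expected to have "instruction", "input" and "output" keys.
    Returns (entries, missing), where missing is a list of path strings.
    """
    with open(json_path, encoding="utf-8") as f:
        entries = json.load(f)
    missing = []
    for entry in entries:
        for key in ("input", "output"):
            path = Path(repo_root) / entry[key]
            if not path.is_file():
                missing.append(str(path))
    return entries, missing
```

For example, `entries, missing = load_pairs("data/ag/train.json")` run from the repository root should leave `missing` empty once the images are in place.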
If you want to generate similar Kubric data yourself, expect some setup effort (Docker and Blender), but we provide some starting code under eq-kubric.
To measure how well models do on various editing skills (action, reasoning, object/attribute, global), we introduce AURORA-Bench, hosted in this repository under test.json, with the respective images under data/TASK/images/.
We also release human ratings of image editing outputs on AURORA-Bench examples, which form the basis of our main evaluation in the paper.
The output images and associated human ratings (task_scores_finegrained.json) can be downloaded from Google Drive: Link
Similar to MagicBrush, we adopt the pix2pix codebase for running and training models.
Please create a Python environment and install the packages listed in requirements.txt (unfortunately, it is important to use Python 3.9 due to taming-transformers):
python3.9 -m venv env
source env/bin/activate
pip3 install -r requirements.txt
You can download our trained checkpoint from Google Drive: Link, place it in the main directory and run our AURORA-trained model on an example image:
python3 edit_cli.py
To reproduce our training, first download an initial checkpoint that is the reproduced MagicBrush model: Google Drive Link
Because of a version incompatibility between libraries and Python, you have to open env/src/taming-transformers/taming/data/utils.py and comment out line 11: from torch._six import string_classes.
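The manual edit to utils.py can also be applied with a small script. This is a minimal sketch, assuming the import appears on a line of its own exactly as quoted above; `comment_out_import` and `patch_file` are hypothetical helpers, not part of the repository or of taming-transformers.

```python
from pathlib import Path

# The import line quoted in the instructions above.
TARGET = "from torch._six import string_classes"


def comment_out_import(source, target=TARGET):
    """Return source with every line equal to target turned into a comment.

    Lines that are already commented out do not match, so running this
    twice gives the same result as running it once.
    """
    out = []
    for line in source.splitlines(keepends=True):
        if line.strip() == target:
            out.append("# " + line)
        else:
            out.append(line)
    return "".join(out)


def patch_file(path):
    """Apply comment_out_import to the file at path, in place."""
    p = Path(path)
    p.write_text(comment_out_import(p.read_text()))
```

With the virtual environment created as `env`, this would be called as `patch_file("env/src/taming-transformers/taming/data/utils.py")`. Matching on the line's text rather than on line number 11 keeps the sketch safe if the file differs slightly between versions.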
Now you can run the training script (hyperparameters can be changed under configs/finetune_magicbrush_ag_something_kubric_15-15-1-1_init-magic.yaml):
python3 main.py --gpus 0,
To use more GPUs, pass a longer list, e.g. --gpus 0,1,2,3.
We primarily rely on human evaluation of model outputs on AURORA-Bench. However, our second proposed evaluation metric is automatic, and here is how to reproduce it.
First, run python3 disc_edit.py --task TASK (e.g. --task whatsup). This generates outputs in a folder called itm_evaluation, which are then evaluated via python3 eval_disc_edit.py.
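When evaluating several tasks, the two commands can be chained from a small driver script. This is a rough sketch under stated assumptions: that eval_disc_edit.py takes no arguments and can be run once after all tasks have written to itm_evaluation (the README does not confirm this), and that the task names you pass are valid values for --task. `build_commands` and `run_all` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys


def build_commands(tasks):
    """Build one disc_edit.py command per task, then a final evaluation command.

    sys.executable is used so the commands run in the active environment.
    """
    cmds = [[sys.executable, "disc_edit.py", "--task", task] for task in tasks]
    # Assumption: a single evaluation pass over itm_evaluation is sufficient.
    cmds.append([sys.executable, "eval_disc_edit.py"])
    return cmds


def run_all(tasks):
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_commands(tasks):
        subprocess.run(cmd, check=True)
```

For instance, `run_all(["whatsup"])` from the repository root is equivalent to typing the two commands above by hand.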
We use the MIT License.
We want to thank several repositories that made our life much easier on this project: