AURORA: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation

AURORA (Action Reasoning Object Attribute) enables training an instruction-guided image editing model that can perform action- and reasoning-centric edits, in addition to "simpler", established object, attribute, or global edits. Here we release 1) training data, 2) a trained model, 3) a benchmark, and 4) code for reproducible training and evaluation.

Overview

Please reach out to benno.krojer@mila.quebec or raise an issue if anything does not work!

TODOs

[x] Training dataset access
[x] Benchmark access
[x] Human ratings
[x] Push code for inference & training
[x] Acknowledgements
[x] Push code for reproducing evaluation
[x] Create a demo of our model
[x] Huggingface ecosystem
[ ] Kubric simulation code

Data

On the data side, we release three artifacts along with Datasheet documentation:

The training dataset (AURORA)
A benchmark for testing diverse editing skills (AURORA-Bench): object-centric, action-centric, reasoning-centric, and global edits
Human ratings on AURORA-Bench, e.g. for other researchers working on image editing metrics

You can also check out our Hugging Face dataset.

Training Data (AURORA)

The edit instructions are stored in data/TASK/train.json for each of the four sub-datasets (ag, kubric, magicbrush, something).

You can download the image pairs from Zenodo:

wget https://zenodo.org/record/11552426/files/ag_images.zip
wget https://zenodo.org/record/11552426/files/kubric_images.zip
wget https://zenodo.org/record/11552426/files/magicbrush_images.zip

Move each archive into its respective directory data/NAME, rename it to images.zip, and unzip it. In the end you should have directories such as data/kubric/images.
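The move, rename, and unzip steps can be scripted. The following is a minimal sketch, assuming the three archives were downloaded into the current directory and that each archive contains a top-level images/ folder (we have not verified the archive layout, so the function prints a warning if data/NAME/images does not appear). The function name organize_archives is hypothetical and not part of the repository.

```python
import shutil
import zipfile
from pathlib import Path

# Archive names from the wget commands above, mapped to their sub-dataset
# directory under data/.
ARCHIVES = {
    "ag_images.zip": "ag",
    "kubric_images.zip": "kubric",
    "magicbrush_images.zip": "magicbrush",
}


def organize_archives(download_dir: Path, data_root: Path) -> list[Path]:
    """Move each archive to data/NAME/images.zip and extract it there.

    Assumption: each archive contains a top-level images/ folder, so that
    extracting into data/NAME yields data/NAME/images.
    """
    image_dirs = []
    for archive_name, subset in ARCHIVES.items():
        src = download_dir / archive_name
        if not src.exists():
            print(f"skipping missing archive: {src}")
            continue
        target_dir = data_root / subset
        target_dir.mkdir(parents=True, exist_ok=True)
        dst = target_dir / "images.zip"
        shutil.move(str(src), str(dst))  # move and rename in one step
        with zipfile.ZipFile(dst) as zf:
            zf.extractall(target_dir)
        image_dir = target_dir / "images"
        if not image_dir.is_dir():
            print(f"warning: {image_dir} not found; check the archive layout")
        image_dirs.append(image_dir)
    return image_dirs
```

Run it from the repository root with organize_archives(Path("."), Path("data")). Archives that are not present are skipped, so the function can be rerun after a partial download.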

For Something-Something-Edit, download all the zip files from the original Something-Something v2 source and put all the videos in a folder named data/something/videos/.

Then run

cd data/something
python extract_frames.py
python filter_keywords.py
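Because the frame extraction step depends on the videos being in place, a quick pre-flight check can save a failed run. This is a hypothetical helper (count_videos is not part of the repository); it only counts files and makes no assumption about their extension, although Something-Something v2 videos are distributed as .webm files.

```python
from pathlib import Path


def count_videos(video_dir: Path = Path("data/something/videos")) -> int:
    """Return the number of files in video_dir, failing early if it is missing or empty."""
    if not video_dir.is_dir():
        raise FileNotFoundError(f"{video_dir} does not exist; download the videos first")
    n = sum(1 for p in video_dir.iterdir() if p.is_file())
    if n == 0:
        raise FileNotFoundError(f"{video_dir} is empty; download the videos first")
    return n
```

Call count_videos() from the repository root before running the commands above.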

For each sub-dataset of AURORA, an entry would look like this:

[
  {
    "instruction": "Leave the door while standing closer",
    "input": "data/ag/images/1K0SU.mp4_4_left.png",
    "output": "data/ag/images/1K0SU.mp4_4_right.png"
  },
  {"..."}
]
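A minimal sketch of how such a file could be loaded and checked, assuming every entry has exactly the instruction, input, and output keys shown above and that image paths are relative to the repository root. The function load_edit_pairs is hypothetical; the training code in this repository has its own data loader.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("instruction", "input", "output")


def load_edit_pairs(json_path: Path, root: Path = Path("."), check_files: bool = True) -> list[dict]:
    """Load edit entries from a data/TASK/train.json file.

    Raises ValueError for entries missing a required key. If check_files is
    True, entries whose input or output image is not on disk are skipped.
    """
    with open(json_path, encoding="utf-8") as f:
        entries = json.load(f)
    valid = []
    for i, entry in enumerate(entries):
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            raise ValueError(f"entry {i} in {json_path} is missing {missing}")
        if check_files and not ((root / entry["input"]).exists() and (root / entry["output"]).exists()):
            continue  # image pair not downloaded or extracted yet
        valid.append(entry)
    return valid
```

For example, len(load_edit_pairs(Path("data/ag/train.json"))) reports how many Action-Genome pairs are usable with the images currently on disk.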

If you are interested in generating your own Kubric data of this kind, it takes some effort (e.g. a Docker and Blender setup), but we provide some starting code under eq-kubric.

Benchmark: AURORA-Bench

To measure how well models perform on various editing skills (action, reasoning, object/attribute, global), we introduce AURORA-Bench, hosted in this repository as test.json, with the corresponding images under data/TASK/images/.

Human Ratings

We also release human ratings of image editing outputs on AURORA-Bench examples, which form the basis of our main evaluation in the paper. The output images and associated human ratings (task_scores_finegrained.json) can be downloaded from Google Drive: Link

Running stuff

Similar to MagicBrush, we adopt the InstructPix2Pix codebase for running and training models.

Inference

Create a Python environment and install the packages in requirements.txt (unfortunately, Python 3.9 is required because of taming-transformers):

python3.9 -m venv env
source env/bin/activate
pip3 install -r requirements.txt
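Since a wrong interpreter version tends to surface only later as an obscure taming-transformers error, a small guard can fail fast. This is a hypothetical helper (check_python_version is not in the repository) that simply compares the running interpreter against 3.9.

```python
import sys


def check_python_version(required=(3, 9)) -> None:
    """Raise RuntimeError if the interpreter is not the expected major.minor version."""
    current = tuple(sys.version_info[:2])
    if current != tuple(required):
        raise RuntimeError(
            f"Python {required[0]}.{required[1]} is required (taming-transformers), "
            f"but this is Python {current[0]}.{current[1]}"
        )
```

Call check_python_version() inside the activated environment, e.g. at the top of a setup script.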

Download our trained checkpoint from Google Drive: Link, place it in the repository's root directory, and run our AURORA-trained model on an example image:

python3 edit_cli.py

Training

To reproduce our training, first download the initial checkpoint, our reproduction of the MagicBrush model: Google Drive Link

Because of library and Python version conflicts (torch._six no longer exists in recent PyTorch releases), open env/src/taming-transformers/taming/data/utils.py and comment out line 11: from torch._six import string_classes.
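The same fix can be applied with a short script that matches the import by its text rather than by line number, which is more robust if the file shifts. A minimal sketch; comment_out_six_import is a hypothetical name, and the default path is the one given above.

```python
from pathlib import Path

TAMING_UTILS = Path("env/src/taming-transformers/taming/data/utils.py")
BAD_IMPORT = "from torch._six import string_classes"


def comment_out_six_import(path: Path = TAMING_UTILS) -> bool:
    """Comment out the torch._six import; return True if the file was changed."""
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    changed = False
    for i, line in enumerate(lines):
        if line.strip() == BAD_IMPORT:
            indent = line[: len(line) - len(line.lstrip())]
            lines[i] = f"{indent}# {line.lstrip()}"  # keep indentation and newline
            changed = True
    if changed:
        path.write_text("".join(lines), encoding="utf-8")
    return changed
```

The function is idempotent: once the line is commented out it no longer matches, so a second call returns False and leaves the file untouched.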

Now you can run the training script (hyperparameters can be changed in configs/finetune_magicbrush_ag_something_kubric_15-15-1-1_init-magic.yaml):

python3 main.py --gpus 0,

To use more GPUs, pass e.g. --gpus 0,1,2,3.
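If you launch training from another script, the --gpus value can be built from a list of device indices. This sketch assumes main.py hands --gpus to PyTorch Lightning, as in the InstructPix2Pix codebase, where a comma-separated string such as "0," is read as a list of device indices rather than as a GPU count. The helper names are hypothetical, and no flags beyond --gpus are passed.

```python
import subprocess
from typing import Sequence


def build_train_command(gpu_ids: Sequence[int]) -> list[str]:
    """Build the main.py command for the given GPU indices.

    A trailing comma is always appended so that a single GPU becomes "0,"
    (a device list), matching the command shown above.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU index is required")
    gpus = ",".join(str(i) for i in gpu_ids) + ","
    return ["python3", "main.py", "--gpus", gpus]


def run_training(gpu_ids: Sequence[int]) -> None:
    """Run training and raise CalledProcessError if main.py fails."""
    subprocess.run(build_train_command(gpu_ids), check=True)
```

For example, build_train_command([0, 1, 2, 3]) yields --gpus 0,1,2,3, with a trailing comma that Lightning's parser ignores.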

Reproduce Evaluation

We primarily rely on human evaluation of model outputs on AURORA-Bench. However, our second proposed evaluation metric is automatic; here is how to reproduce it.

First, run python3 disc_edit.py --task TASK (e.g. --task whatsup). This writes outputs to a folder called itm_evaluation, which you then evaluate with python3 eval_disc_edit.py.
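To evaluate several tasks in one go, the two steps can be chained. A minimal sketch, assuming that a single eval_disc_edit.py run covers everything written to itm_evaluation; only whatsup is confirmed as a task name here, so pass the other task names yourself. The helper names are hypothetical.

```python
import subprocess
from typing import Sequence


def evaluation_commands(tasks: Sequence[str]) -> list[list[str]]:
    """One disc_edit.py run per task, then a single eval_disc_edit.py run."""
    cmds = [["python3", "disc_edit.py", "--task", task] for task in tasks]
    cmds.append(["python3", "eval_disc_edit.py"])
    return cmds


def run_evaluation(tasks: Sequence[str], dry_run: bool = False) -> None:
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in evaluation_commands(tasks):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Use run_evaluation(["whatsup"], dry_run=True) first to inspect the commands before launching them.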

Acknowledgements, License & Citation

We use the MIT License.

We want to thank several repositories that made our life much easier on this project:

The MagicBrush and InstructPix2Pix codebases and datasets; our correspondence with the MagicBrush authors in particular helped us a lot.
The datasets and engines we used to build AURORA: Something-Something v2, Action Genome, and Kubric
Source code from EQBEN for generating images with the Kubric engine
