It seems very hard to make the model edit a picture.

Hey, I was playing with your model to check the image-editing capabilities of the XXL version, using this code:

import torch
from PIL import Image
from uio2.model import UnifiedIOModel
from uio2.preprocessing import UnifiedIOPreprocessor
from uio2.preprocessing import build_batch 

model = UnifiedIOModel.from_pretrained("allenai/uio2-xxl-bfloat16")
model.eval()

preprocessor = UnifiedIOPreprocessor.from_pretrained("allenai/uio2-preprocessor", tokenizer="tokenizer.model")

with torch.inference_mode():
    model.set_modalities(input_modalities=["image", "text"], target_modalities=["image"])

    preprocessed_example = preprocessor(
        text_inputs="Follow instructions in sequence to edit image: {EDIT PROMPT}.",
        image_inputs="{MY_IMAGE}",
        target_modality="image",
    )
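    batch = build_batch([preprocessed_example], device=model.device)
    # with modality="image" the returned tensor is the generated image, not text tokens
    tokens = model.generate(batch, modality="image")
    # drop the batch dimension and move the result to a NumPy array
    img = tokens.detach().cpu().numpy().squeeze()
    # save image to disk using PIL (assumes float pixel values in [0, 1])
    img = Image.fromarray((img * 255).astype("uint8"))
    img.save("{EDITED_IMAGE}.png")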

However, the model seems adamant about returning the same image without any edits (basically just auto-encoding it). After many tries and many prompts you can get it to change the image, but the change does not correspond to what was asked. Am I doing something wrong? Does the editing pipeline require a different configuration?
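
To put a number on "returns the same image", one option is to compare the input and the output pixel by pixel. This is only a rough sketch, not part of the uio2 package: `mean_abs_diff`, `is_near_identity`, and the `0.02` threshold are hypothetical names and values chosen for illustration, and it assumes both images are already float arrays in [0, 1] with the same shape (the input would need to be resized to the output resolution first).

```python
import numpy as np


def mean_abs_diff(original: np.ndarray, edited: np.ndarray) -> float:
    """Mean absolute per-pixel difference between two same-shape images."""
    if original.shape != edited.shape:
        raise ValueError(
            f"shape mismatch: {original.shape} vs {edited.shape}; resize first"
        )
    # cast to float64 so integer or low-precision inputs do not overflow or wrap
    diff = original.astype(np.float64) - edited.astype(np.float64)
    return float(np.mean(np.abs(diff)))


def is_near_identity(
    original: np.ndarray, edited: np.ndarray, threshold: float = 0.02
) -> bool:
    """True if the edit changed the image by less than an (arbitrary) threshold."""
    return mean_abs_diff(original, edited) < threshold


if __name__ == "__main__":
    # synthetic stand-ins for the input image and the model output
    rng = np.random.default_rng(0)
    source = rng.random((64, 64, 3))
    unchanged = source.copy()
    print(is_near_identity(source, unchanged))  # identical arrays, so the diff is 0
```

A small difference across many different edit prompts would back up the auto-encoding impression. A large difference only shows that the image changed, not that the change matches the prompt.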

allenai / unified-io-2.pytorch
