Hey, I was playing with your model and was trying to check the image editing capabilities of the XXL version. I was using this code:
import torch
from PIL import Image
from uio2.model import UnifiedIOModel
from uio2.preprocessing import UnifiedIOPreprocessor
from uio2.preprocessing import build_batch

model = UnifiedIOModel.from_pretrained("allenai/uio2-xxl-bfloat16")
model.eval()
preprocessor = UnifiedIOPreprocessor.from_pretrained("allenai/uio2-preprocessor", tokenizer="tokenizer.model")

with torch.inference_mode():
    model.set_modalities(input_modalities=["image", "text"], target_modalities=["image"])
    preprocessed_example = preprocessor(
        text_inputs="Follow instructions in sequence to edit image: {EDIT PROMPT}.",
        image_inputs="{MY_IMAGE}",
        target_modality="image",
    )
    batch = build_batch([preprocessed_example], device=model.device)
    tokens = model.generate(batch, modality="image")

img = tokens.detach().cpu().numpy().squeeze()
# save image to disk using PIL
img = Image.fromarray((img * 255).astype("uint8"))
img.save("{EDITED_IMAGE}.png")
However, the model seems adamant about returning the same image without any edits (basically just auto-encoding). After many tries and many prompts, I can get it to change the image, but the change does not correspond to what was asked. Am I doing something wrong? Does the editing pipeline require a different configuration?
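In case it helps with reproducing this, one way to put a number on "returns the same image" is a mean absolute pixel difference between the input and the output. This is only a minimal sketch: `mean_abs_diff` is a hypothetical helper of my own (not part of uio2), it assumes both images are already float arrays in [0, 1] with the same shape, and the arrays below are synthetic stand-ins rather than real model outputs.

```python
import numpy as np


def mean_abs_diff(original: np.ndarray, edited: np.ndarray) -> float:
    """Mean absolute per-pixel difference between two same-shaped images.

    Hypothetical helper, not part of uio2. Assumes values are in [0, 1].
    """
    if original.shape != edited.shape:
        raise ValueError(f"shape mismatch: {original.shape} vs {edited.shape}")
    diff = original.astype(np.float64) - edited.astype(np.float64)
    return float(np.mean(np.abs(diff)))


# Synthetic stand-ins for the input image and two possible outputs.
rng = np.random.default_rng(0)
original = rng.random((8, 8, 3))
identical = original.copy()                   # pure auto-encoding case
shifted = np.clip(original + 0.5, 0.0, 1.0)   # a visibly changed image

print(mean_abs_diff(original, identical))
print(mean_abs_diff(original, shifted))
```

A value at or very near zero would mean the output is effectively a copy of the input. With a real output, the input would need to be resized to the output resolution first, since the shapes must match.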