Closed harubaru closed 1 year ago
Hey Haru. Great work on the variation generation!
There seems to be an issue with variable-size images because of the CLIP encoding step. You can fix that by replicating the preprocessing CLIP normally applies (center-crop to the shortest side, then resize to CLIP's input resolution). A loop would look like this:
```python
import io

import torch
from PIL import Image
import torchvision.transforms as T

for name, file_info in file_input.value.items():
    image_pil = Image.open(io.BytesIO(file_info['content']))
    min_side = min(image_pil.size)
    transforms = T.Compose([
        T.ToTensor(),
        # square-crop to the shortest side, then resize to CLIP's input size
        T.CenterCrop(min_side),
        T.Resize(clip.image_size),
    ])
    image_tensor = transforms(image_pil).unsqueeze(0).to(device)
    unbatched_image_embed, _ = clip.embed_image(image_tensor)
    image_embed = torch.zeros(text_rep, unbatched_image_embed.shape[-1])
```
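The crop geometry that `T.CenterCrop(min_side)` relies on can be sanity-checked without any imaging libraries. This is just a sketch of the arithmetic, not part of the notebook; `center_crop_box` is a hypothetical helper:

```python
def center_crop_box(width, height):
    # Compute the (left, top, right, bottom) box of a square center crop
    # to the image's minimum side, mirroring T.CenterCrop(min(width, height)).
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

print(center_crop_box(640, 480))  # -> (80, 0, 560, 480): 80 px trimmed per side
print(center_crop_box(480, 640))  # -> (0, 80, 480, 560): portrait case
```

Because the crop is always square, the subsequent `T.Resize(clip.image_size)` produces a fixed-size tensor regardless of the uploaded image's aspect ratio.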
Also, the way the image grouping works, the number of images must match the number of prompts, so I would ask you to add something like this to enforce that:
```python
num_images = len(file_input.value.items())
num_prompts = len(prompts)
assert num_images == num_prompts, "Each uploaded image must have an associated prompt"
```
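With that invariant enforced, pairing each upload with its prompt is a simple `zip`. A minimal sketch, assuming `file_input.value` is an ordered dict of uploads (the names and prompts below are hypothetical stand-ins):

```python
prompts = ["a painting of a cat", "a photo of a dog"]          # hypothetical
uploads = {"cat.png": b"\x89PNG...", "dog.png": b"\x89PNG..."}  # stand-in for file_input.value

assert len(uploads) == len(prompts), "Each uploaded image must have an associated prompt"

# dicts preserve insertion order (Python 3.7+), so upload order matches prompt order
pairs = list(zip(uploads.items(), prompts))
print([(name, prompt) for (name, _data), prompt in pairs])
# -> [('cat.png', 'a painting of a cat'), ('dog.png', 'a photo of a dog')]
```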
Finally, could you make sure to clear the notebook's output before committing? I forgot to do that the first time, and it makes the diffs blow up so it's difficult to see what has actually changed.
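The usual way to do this is `jupyter nbconvert --clear-output --inplace <notebook>`, but if nbconvert isn't handy the same effect can be had with a few lines of stdlib Python. A sketch, assuming the nbformat-4 JSON layout; `clear_outputs` is a hypothetical helper:

```python
import json

def clear_outputs(nb_json: str) -> str:
    # Remove outputs and execution counts from every code cell
    # of a v4 notebook, leaving cell sources untouched.
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb)

# tiny notebook with one executed code cell
nb = json.dumps({"nbformat": 4, "cells": [
    {"cell_type": "code", "source": "1+1",
     "outputs": [{"output_type": "execute_result"}], "execution_count": 3},
]})
cleared = json.loads(clear_outputs(nb))
print(cleared["cells"][0]["outputs"])  # -> []
```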
Noted! Thanks for the feedback, I'm going to see if I can get to that soon.
This adds support for the notebook to generate variations of images based on image input. It's a little bit hacky right now as I am unfamiliar with using Colab and Notebooks in general, but this is a start! 😅