defgsus / clipig

OpenAI CLIP based image generator with complex config file controlled transformation and training pipelines
https://defgsus.github.io/clipig/
MIT License

Robust CLIP #3

Open realfolkcode opened 2 weeks ago

realfolkcode commented 2 weeks ago

Hi @defgsus! I remember stumbling upon this repo and it blowing my mind! Recently I discovered Robust CLIP, and I immediately thought about revisiting the idea of CLIP image generation. Here are some of my results. It would be great to hear some fresh ideas from you, as you inspired me to start this project! Thanks!

defgsus commented 2 weeks ago

Hi @realfolkcode! Thanks for the hint. I will certainly check out the details of what you did and used.

Recently I started a new CLIP-based image generator, something more like a drawing app. One long-term goal is a prompt-guided "magic brush". Say you have an image of a stone wall (painted, photographed, rendered, ...) and you want to add some moss here and some medieval ivy there. You create a new pen, enter a prompt and keep painting on top of the image until there's enough moss and ivy in the desired places. It's currently called 'CLIPig 2' and is part of my nn-experiments repo.

I don't have the "magic brush" yet, but I have experimented with mixing several image layers and especially with creating some nice old-school game graphics from simple sketches.

E.g. here is a sketch: waterfall-06-raw

Then running CLIPig on top with some prompt like "beautiful waterfall, pixelart, ...": waterfall-06-c

And the really new thing I tried was denoising the result with a small self-trained CNN: waterfall-06-c-denoised-strong

I also did some video experiments, zooming in or rotating the image between the CLIPig updates:

https://github.com/defgsus/clipig/assets/6267997/159ec17d-f52b-4835-a7b4-afd74f25346c
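
For readers who want to try the zoom-feedback idea, here is a minimal sketch of the loop: transform the current image a little, run one optimisation update on it, store the frame, repeat. It is not the actual CLIPig code. `zoom_center`, `render_zoom_video` and the `update_step` callable are hypothetical names, and the image is a plain list of grayscale rows standing in for a real image tensor.

```python
from typing import Callable, List

# Hypothetical stand-in for a real image tensor: rows of grayscale values.
Image = List[List[float]]


def zoom_center(image: Image, factor: float) -> Image:
    """Zoom into the image centre by `factor` (> 1 zooms in), nearest-neighbour."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    out = []
    for y in range(height):
        # Map each output pixel back to a source pixel closer to the centre.
        src_y = min(height - 1, max(0, round(cy + (y - cy) / factor)))
        row = []
        for x in range(width):
            src_x = min(width - 1, max(0, round(cx + (x - cx) / factor)))
            row.append(image[src_y][src_x])
        out.append(row)
    return out


def render_zoom_video(
    image: Image,
    update_step: Callable[[Image], Image],
    num_frames: int,
    factor: float = 1.05,
) -> List[Image]:
    """Alternate a small zoom with one generator update and collect the frames.

    `update_step` is an assumption: any function that nudges the image
    towards the prompt, e.g. a few CLIP-guided gradient steps.
    """
    frames = []
    for _ in range(num_frames):
        image = zoom_center(image, factor)
        image = update_step(image)
        frames.append(image)
    return frames


# Toy usage: the "update" just brightens the image a bit.
start = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
frames = render_zoom_video(
    start, lambda img: [[v + 1.0 for v in row] for row in img], num_frames=3
)
```

Because each frame starts from the zoomed previous frame, the update only has to fill in new detail, which is presumably where the fractal look comes from. A rotation would slot into the same place as `zoom_center`.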

realfolkcode commented 2 weeks ago

This is a really neat idea! More interaction is definitely the way to go! I can actually see it being useful for adding textures (besides concrete objects), though it might be tricky for both CLIP and humans to put the abstraction behind textures into words. The fractal video is trippy and the sense of depth is actually impressive.

P.S. Just a random idea that I will probably never build but that is fun to talk about nevertheless: CLIP critic. Two (or more) people compete on a website with integrated painting tools to impress a CLIP-based critic. It first generates a random text prompt, then the players sketch an image that satisfies the prompt within a time limit (e.g., one minute). The one with the higher CLIP score wins.
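
A rough sketch of how the judging step could work, assuming "CLIP score" means the cosine similarity between the prompt's text embedding and each sketch's image embedding. The embeddings are assumed to come from some CLIP model already. `cosine_similarity` and `judge_round` are hypothetical helper names, and the vectors below are toy values, not real CLIP output.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def judge_round(
    prompt_embedding: Sequence[float],
    sketch_embeddings: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Score every player's sketch against the prompt and pick the best one."""
    if not sketch_embeddings:
        raise ValueError("need at least one player")
    scores = {
        player: cosine_similarity(prompt_embedding, embedding)
        for player, embedding in sketch_embeddings.items()
    }
    winner = max(scores, key=scores.get)
    return winner, scores


# Toy usage with made-up 3-dimensional "embeddings".
prompt = [1.0, 0.0, 0.0]
winner, scores = judge_round(
    prompt,
    {"alice": [0.9, 0.1, 0.0], "bob": [0.1, 0.9, 0.0]},
)
```

In a real game the embeddings would be computed once per submitted sketch after the timer runs out, so the judging itself stays cheap.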