Open Luke2642 opened 1 year ago
Oh, I never realized this repo got even a little attention, thank you for reaching out. I'll be experimenting on this repo much more often from now on, so please expect more from me soon!
The problem is that I don't actually use the A1111 repo much, so I'm not sure what you mean. As for the Colab notebook, I'll make one as soon as I can!
Fantastic, that’s great news, and thanks so much for the reply!
I’m still learning the basics, but I'm excited, I think it'll be so useful!
VQ-VAE reconstruction is relatively trivial and not of much use on its own, but it could be good for image compression, as in the article I linked before.
The CLIP embedding is key to both image variations and image mixing, but both require a fine-tuned model that generates from a CLIP embedding. It'd be great if future diffusion models were all trained this way, but that's not something we can influence much.
As I understand it, the precision of a textual inversion is limited by the second forward pass of classifier-free guidance. And the more vectors are used, the lower the editability: more vectors seem only to constrain the diffusion process more strongly to “paint” the same image across all seeds, rather than creating a semantic description like image variations / mixing.
Null-text inversion does seem like a really interesting approach, solving for one seed, and possibly semantic enough to be useful across many seeds and editable. But it’ll need implementing in the major UIs to be creatively useful outside developer circles!
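For anyone following along, here is a rough sketch of what I mean by the two forward passes. At each denoising step the model is run once with the prompt embedding and once with the unconditional (empty prompt) embedding, and the two noise predictions are blended. The function name `cfg_combine` and the toy numbers are my own for illustration, not code from this repo or from diffusers, and real predictions are tensors rather than short lists.

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float],
                guidance_scale: float) -> List[float]:
    """Blend the two noise predictions used by classifier-free guidance.

    eps_uncond: prediction from the pass with the unconditional embedding
    eps_cond:   prediction from the pass with the prompt embedding
    Hypothetical helper for illustration only.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("both predictions must have the same length")
    # Start from the unconditional prediction and move toward (or past)
    # the conditional one, scaled by guidance_scale.
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Toy values standing in for the outputs of the two forward passes.
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    print(cfg_combine(uncond, cond, 7.5))  # [7.5, 16.0]
```

With a scale of 1 the result is just the conditional prediction, and with 0 it is the unconditional one. As far as I understand it, null-text inversion leaves the prompt embedding alone and instead optimizes the unconditional embedding that feeds that pass.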
Another interesting approach posted recently - closer to the “holy grail” idea of being able to turn any image into a meaningful description in human language, that can then be turned back into the image:
https://arxiv.org/abs/2302.03668
https://github.com/YuxinWenRick/hard-prompts-made-easy
https://huggingface.co/spaces/tomg-group-umd/pez-dispenser
https://colab.research.google.com/drive/1VSFps4siwASXDwhK_o29dKA9COvTnG8A?usp=sharing
Anyway, long ramble, keep up the good work!
Thanks for this repo, it's much easier to follow than Google's original null-text inversion in prompt-to-prompt! I haven't quite got yours working yet though.
For requirements on Colab free: I've been using old versions to get it to work with the original. This combo works on Colab free:
!pip install -U --pre triton torchinfo xformers==0.0.16rc425 diffusers==0.7.2 transformers==4.22.2 accelerate==0.12.0 ftfy
And this combo works too:
!pip install --quiet diffusers==0.8.0
!pip install --quiet https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl
!pip install --quiet --upgrade transformers scipy mediapy accelerate ftfy spacy einops
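To confirm which versions actually ended up installed after running either combo, something like the following could be pasted into a notebook cell. It is only a small sketch using the standard library; the helper name `installed_versions` is made up for illustration, and the package list is simply the ones pinned above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    pinned = ["diffusers", "transformers", "accelerate", "xformers"]
    for name, ver in installed_versions(pinned).items():
        print(f"{name}: {ver or 'not installed'}")
```

If a package shows a different version than the one pinned, the Colab runtime probably needs restarting after the install.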