SHI-Labs / Versatile-Diffusion

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023
https://arxiv.org/abs/2211.08332
MIT License

Release VD-Basic #3

Closed · litevex closed this 1 year ago

litevex commented 1 year ago

For people only interested in the variation part, it would be good to make the VD-Basic checkpoint public.

demirklvc commented 1 year ago

Exactly. I just want to play with the variation tool; it's a lot of fun.

xingqian2018 commented 1 year ago

We will release the basic model in later updates. Please stay tuned.

TabuaTambalam commented 1 year ago

The image-variation part alone is a 3 GB+ UNet, just like justinpinkney's (I previously converted it to TorchScript JITs: https://huggingface.co/Larvik/sd470k_imgemb/tree/main). Loading a 12 GB weight file for that is unnecessary.

Also, the cond image gets resized to 224x224 anyway, so making it a [1,3,512,512] tensor is also unnecessary.

A short script to build the image cond:

import os
from io import BytesIO

import requests
import torch
from PIL import Image
from torchvision import transforms

def load_im(im_path):
    # Load an image from a URL or a local path, always as RGB.
    if im_path.startswith("http"):
        response = requests.get(im_path)
        response.raise_for_status()
        im = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        im = Image.open(im_path).convert("RGB")
    tforms = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop((224, 224)),
        transforms.ToTensor(),
    ])
    inp = tforms(im).unsqueeze(0)  # [1, 3, 224, 224]
    return inp * 2 - 1             # scale [0, 1] -> [-1, 1]

if not os.path.isfile('imgemb.pt'):
    !wget https://huggingface.co/Larvik/imgemb_t1/resolve/main/imgemb.pt
    # https://huggingface.co/Larvik/imgemb/resolve/main/imgemb.pt

imgemb = torch.jit.load('imgemb.pt').float()
c = imgemb.norm(imgemb.proj_all(imgemb(imgemb.preproc(load_im('xipooh.jpg')))))

# for justinpinkney's:
# c = imgemb.proj(imgemb(imgemb.preproc(load_im('xipooh.jpg'))))

Some results: https://imgur.com/a/7YvwUmI The first one is the cond image; the others are variations. It doesn't seem to work well when non-square (the variations are 704x768).

Also, I made a notebook for the free-tier 12 GB system RAM Colab: https://colab.research.google.com/github/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_vrs.ipynb

Basically, without the meta-device trick (https://pytorch.org/torchdistx/latest/fake_tensor.html), PyTorch needs 2x system RAM to load a model and its weights. This one likely even goes to 2.5x, because it loads the VAE weights and the (textual?) CLIP weights a second time, even though they are already inside that 12 GB weight file.
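For reference, a minimal sketch of the meta-device idea, using the device context manager built into recent PyTorch (the torchdistx link above describes the same fake-tensor concept); BigModel and weights.pth are placeholders, and assign=True needs PyTorch 2.1+:

import torch
import torch.nn as nn

class BigModel(nn.Module):
    # Stand-in for a large network; in this thread it would be the 3 GB+ UNet.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096))
    def forward(self, x):
        return self.net(x)

# Build the module on the meta device: parameters carry only shape/dtype,
# so no real memory is allocated for them yet.
with torch.device("meta"):
    model = BigModel()

# Load the checkpoint once and hand its tensors to the module directly
# (assign=True) instead of copying them into a second allocation.
state = torch.load("weights.pth", map_location="cpu")  # placeholder path
model.load_state_dict(state, assign=True)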

The notebook above extracts the weight blobs to disk and loads only the needed ones, which makes it possible to run on a 12 GB system RAM Colab (it can even run without a Colab GPU).
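As a rough illustration of that blob-extraction idea (not the notebook's exact code; the key prefixes here are assumptions based on the usual LDM checkpoint layout):

import torch

# Split a combined checkpoint into per-module blobs on disk, so only the
# needed part (e.g. the UNet) has to be resident in RAM later.
ckpt = torch.load('vd-official.pth', map_location='cpu')
full = ckpt.get('state_dict', ckpt)  # some checkpoints nest under 'state_dict'
prefixes = {
    'unet': 'model.diffusion_model.',  # assumed key prefix
    'vae': 'first_stage_model.',       # assumed key prefix
}
for name, prefix in prefixes.items():
    blob = {k[len(prefix):]: v for k, v in full.items() if k.startswith(prefix)}
    torch.save(blob, f'{name}.pth')
del ckpt, full  # drop the full dict before loading any single blob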

demosch commented 1 year ago

The image-variation part alone is a 3 GB+ UNet, just like justinpinkney's [...]

Hello! Do I just have to run this code to get variations of a given image? Is there any way to contact you (Discord)?

litevex commented 1 year ago

The image-variation part alone is a 3 GB+ UNet

Is it possible to split out the other UNets too (Image+Text guidance, Image2Text, etc.)?

xingqian2018 commented 1 year ago

The image-variation part alone is a 3 GB+ UNet

Is it possible to split out the other UNets too (Image+Text guidance, Image2Text, etc.)?

Yes, it is possible. You can cut out a subnetwork for text-to-image only or image-variation only from either vd-dc or vd-official.

playdasegunda commented 1 year ago

Your Colab is giving an error in cell 3. The error is: NameError: name 'not_txtALL' is not defined

xingqian2018 commented 1 year ago

Your Colab is giving an error in cell 3. The error is: NameError: name 'not_txtALL' is not defined

I haven't had a chance to work on and release a Colab yet. Or am I misunderstanding you?

playdasegunda commented 1 year ago

@xingqian2018 Sorry, I meant this Colab: https://colab.research.google.com/github/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_vrs.ipynb

TabuaTambalam commented 1 year ago

@xingqian2018 Sorry, I meant this Colab: https://colab.research.google.com/github/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_vrs.ipynb

[screenshot: notebook cells marked green, yellow, and red]

The green ones are must-click; replace the yellow one with your own image, and click the red one when it finishes. Also, I haven't tested this on GPU instances yet; if it breaks somewhere on GPU, you'll have to fix it by hand.

I expected anyone who uses Colab to be able to read basic Python, though, and to know that clicking "show code" on a cell reveals the code.

@demosch I don't really use Discord, though I have some posts on the LAION server under the name ThugaterDios.

In my main notebook (https://github.com/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_SR_jited.ipynb), both justinpinkney's [1,768] and this [257,768] image-cond SD are ready to use. It supports custom width/height, concatenating multiple cond images together, all k-samplers, etc., but it's not very user friendly.

For now, you have to add _imgemb_vrs to the model entries and use a pre-compiled binary prompt in order to use the [257,768] image-cond weights from vd-official.pth. Attachments: tbb0, tbb1, binaryprompts.zip

Some results from concatenating two cond images together (a Pooh toy and the Firefox logo), i.e. from a [514,768] image cond: https://imgur.com/a/6KyeyM0
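For reference, concatenating two conds like this is just stacking along the token dimension. A minimal sketch, with dummy tensors standing in for two outputs of the imgemb script above:

import torch

# Stand-ins for two [1, 257, 768] image-cond embeddings
# (in practice, the output of the imgemb script above for each image).
c1 = torch.randn(1, 257, 768)
c2 = torch.randn(1, 257, 768)

c = torch.cat([c1, c2], dim=1)  # -> [1, 514, 768] cross-attention context
print(c.shape)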

xingqian2018 commented 1 year ago

@lucasbr15 @TabuaTambalam Thanks for sharing. This Colab was not created by our team; I think they truncated some code from this GitHub repo. We are working on a Colab demo, in parallel with the HuggingFace demo, with all supported functions. Please stay tuned.

xingqian2018 commented 1 year ago

@litevex A new codebase of Versatile Diffusion has been pushed, and now you can easily segment out the single-flow model you need (i.e. VD-Basic).