Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Would it be possible to use another OpenCLIP arch? #232

Open Mateusmsouza opened 1 year ago

Mateusmsouza commented 1 year ago

I noticed that the OpenCLIP version used by default is ViT-H-14 with laion2b_s32b_b79k. I tried another version (ViT-B-32 with laion2b_s34b_b79k) and got errors when loading the model weights:

RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 512]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight: copying

Here is how I changed the config.yaml:

# configs/stable-diffusion/v2-inference-v.yaml
unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        ...
        context_dim: 512 # only change I made from 1024 to 512
        ...

cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
        # my changes below
        arch: "ViT-B-32" 
        version: "laion2b_s34b_b79k"
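
The mismatch in the traceback can be reproduced with plain shape arithmetic. The sketch below assumes that the cross-attention key/value projections (`attn2.to_k`, `attn2.to_v`) are bias-free linear layers mapping `context_dim` to the block's inner width, so their weight shape is `(inner_dim, context_dim)`. The helper names `kv_weight_shape` and `find_mismatches` are hypothetical and not part of the repository; only the parameter name and the shapes are taken from the error message.

```python
# Sketch only: pure-Python shape check, no torch required.
# Assumption: the key/value projections are bias-free linear layers, whose
# weight shape is (out_features, in_features) = (inner_dim, context_dim).

def kv_weight_shape(inner_dim, context_dim):
    """Weight shape of a linear layer mapping context_dim -> inner_dim."""
    return (inner_dim, context_dim)

def find_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for every differing entry."""
    return {
        name: (ckpt_shape, model_shapes[name])
        for name, ckpt_shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != ckpt_shape
    }

# Parameter name copied from the traceback.
KEY = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"

checkpoint_shapes = {KEY: (320, 1024)}           # shape reported for the checkpoint
model_shapes = {KEY: kv_weight_shape(320, 512)}  # shape built with context_dim: 512

mismatches = find_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt, model) in mismatches.items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

If this reading is right, the released checkpoint's cross-attention weights expect a 1024-wide context, so changing `context_dim` alone cannot make them load. Those layers would presumably have to be retrained, or the 512-wide text embeddings projected up to 1024 before they reach the UNet.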