Stability-AI / generative-models

Generative Models by Stability AI

Autoencoder issues #85

Closed pidajay closed 11 months ago

pidajay commented 11 months ago

The published autoencoder weights do not seem to match the model defined in this repo. Specifically, when I try to load the weights, the following keys are missing: post_quant_conv.bias, post_quant_conv.weight, quant_conv.bias, quant_conv.weight.

Also, it would be really helpful if the actual YAML config used to train the SDXL autoencoder were published (the example in this repo does not seem to correspond to the original).

jenuk commented 11 months ago

I just rechecked the weights, and these keys are contained in the VAE. Can you confirm that you are loading AutoencoderKL, not AutoencodingEngine, and that the output says missing instead of unexpected?
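
For reference, the two cases are easy to tell apart directly from load_state_dict, for example (a rough sketch; the config and checkpoint paths are placeholders):

from omegaconf import OmegaConf
from safetensors.torch import load_file
from sgm.util import instantiate_from_config

# Sketch: build the model from a config, then load the published VAE
# weights non-strictly and inspect both key lists.
config = OmegaConf.load("path/to/autoencoder_config.yaml")  # placeholder
model = instantiate_from_config(config.model)

sd = load_file("path/to/sdxl_vae.safetensors")  # placeholder
missing, unexpected = model.load_state_dict(sd, strict=False)
print("missing keys:", missing)        # defined by the model, absent from the file
print("unexpected keys:", unexpected)  # present in the file, unknown to the model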

pidajay commented 11 months ago

Thanks for the response. You are right: I was using AutoencodingEngine, and AutoencoderKL does have the keys. However, the example autoencoder config in this repo does not seem compatible with AutoencoderKL; I get KeyError: 'ddconfig'. What would really help is the actual config that corresponds to the published weights.

jenuk commented 11 months ago

Yeah, you are right: AutoencoderKL takes slightly different arguments than AutoencodingEngine. The correct settings can be read from the SDXL config:

https://github.com/Stability-AI/generative-models/blob/45c443b316737a4ab6e40413d7794a7f5657c19f/configs/inference/sd_xl_base.yaml#L80-L98

If you use that, it should be loadable (you will need to adjust the whitespace and use model: instead of first_stage_config, as in the training example). I can add a standalone inference config in the near future. The training examples only contain configs for AutoencodingEngine because it is the newer version, with more flexibility, but it has no support for the legacy modules post_quant_conv and quant_conv, as you found out.
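
In the meantime, one way to get a standalone config is to re-root that block programmatically, for example (a sketch, assuming it is run from the repo root):

from omegaconf import OmegaConf
from sgm.util import instantiate_from_config

# Sketch: reuse first_stage_config from the SDXL inference config as a
# standalone autoencoder config by wrapping it under model:.
sdxl = OmegaConf.load("configs/inference/sd_xl_base.yaml")
vae_config = OmegaConf.create({"model": sdxl.model.params.first_stage_config})

model = instantiate_from_config(vae_config.model)
# the weights are not part of this config; load them separately as above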

If you want to train with this, be sure to add a lossconfig (see the training config you already found for how to do that).

Let me know if there are any more problems here.

darshats commented 10 months ago

Hi, I'm trying this exact scenario of retraining the SDXL VAE. I've made the changes indicated and created a config file (see below) that points to the AutoencoderKL class, with the LPIPS loss config and the SDXL VAE checkpoint.

The problem I run into is that the encode methods on AutoencoderKL and its parent AutoencodingEngine are different, and I'm not sure which one is the right one. When training runs, the encode method gets called as part of the sanity check via:

run_sanity_check-->validation_step-->AutoencodingEngine.forward()-->self.encode(x, return_reg_log=True)

At this point, AutoencoderKL does not have an encode method that takes the return_reg_log parameter. What is the right codepath here to match the SDXL VAE training?
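
As a stopgap I am experimenting with a shim like the one below (just a sketch; the class name is mine, and it assumes encode returns a DiagonalGaussianDistribution as in the original LDM code), but I don't know if this matches what the published training actually did:

from sgm.models.autoencoder import AutoencoderKL

class AutoencoderKLForTraining(AutoencoderKL):
    # Sketch: give encode() the (x, return_reg_log) signature that
    # AutoencodingEngine.forward() calls during the sanity check.
    def encode(self, x, return_reg_log=False):
        posterior = super().encode(x)  # assumed: DiagonalGaussianDistribution
        z = posterior.sample()
        if return_reg_log:
            # mimic the log dict a regularizer would produce
            return z, {"kl_loss": posterior.kl().mean()}
        return z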

config file

model:
  base_learning_rate: 4.5e-6
  target: sgm.models.autoencoder.AutoencoderKL
  params:
    input_key: jpg
    monitor: val/rec_loss
    embed_dim: 4 
    ckpt_path: "/mnt/d/generative-models/configs/checkpoints/sdxl_vae.safetensors"

    lossconfig:
      target: sgm.modules.autoencoding.losses.GeneralLPIPSWithDiscriminator
      params:
        perceptual_weight: 0.25
        disc_start: 20001
        disc_weight: 0.5
        learn_logvar: True

        regularization_weights:
          kl_loss: 1.0

    regularizer_config:
      target: sgm.modules.autoencoding.regularizers.DiagonalGaussianRegularizer

    ddconfig:
      attn_type: vanilla-xformers
      double_z: true
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [1, 2, 4, 4]
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0

Thank you!

wtliao commented 8 months ago

@jenuk Hi, thanks for the info. I am trying to test the AutoencoderKL model by encoding a real image and then reconstructing it. However, my code does not work: after encoding, everything is NaN. I cannot figure out what is going wrong.

I am using the config file you posted and the inference code as follows:

import os.path as osp

import numpy as np
import torch
import torchvision.transforms as T
from PIL import Image
from sgm.util import instantiate_from_config

model = instantiate_from_config(config.model)
model.to(device)
model.eval()
transforms = T.Compose([T.ToTensor(),
                        T.Lambda(lambda x: x * 2. - 1.)])
image = Image.open(fname).convert("RGB")  # drop any alpha channel
image = transforms(image)[None, :, :, :]
with torch.no_grad():
    # note: depending on the class, forward may return a tuple such as
    # (z, dec, reg_log); unpack the reconstruction in that case
    xrec = model(image.to(device))
result = torch.clamp((xrec + 1.0) / 2.0, min=0.0, max=1.0)
result = result[0].permute(1, 2, 0).cpu().numpy() * 255.0  # CHW in [0,1] -> HWC in [0,255]
image = Image.fromarray(result.astype(np.uint8))
image.save(osp.join(outpath, osp.basename(fname)))

Is there anything wrong? Thanks in advance!
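
For what it's worth, here is the triage I am trying next (a sketch; it assumes an LDM-style encode that returns a distribution, and the float32 retry is because the SDXL VAE is reported to overflow to NaN in float16):

import torch

# Triage sketch: first check that the loaded parameters are finite,
# then retry the encode in fp32.
bad = [k for k, v in model.state_dict().items() if not torch.isfinite(v).all()]
print("non-finite parameter tensors:", bad)  # non-empty => weights not loaded properly

model = model.float()
with torch.no_grad():
    posterior = model.encode(image.to(device).float())  # LDM-style encode assumed
    print("NaNs in latent:", torch.isnan(posterior.sample()).any().item())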

tuning12 commented 4 months ago

I have met the same problem as @wtliao above. Have you solved it?