pidajay closed this issue 11 months ago
I just rechecked the weights, and these keys are contained in the VAE. Can you confirm that you are loading `AutoencoderKL`, not `AutoencodingEngine`, and that the output says missing instead of unexpected?
Thanks for the response. You are right, I was using the `AutoencodingEngine`; the `AutoencoderKL` has the keys. However, the example config for the autoencoder in this repo does not seem compatible with `AutoencoderKL`: I get `KeyError: 'ddconfig'`. What would really help is the actual config used that corresponds to the published weights.
Yeah, you are right, `AutoencoderKL` takes slightly different arguments than `AutoencodingEngine`. The correct settings can be read from the SDXL config:
If you use that, it should be loadable (you will need to adjust whitespace and use `model:` instead of `first_stage_config:` as in the training example). I can add a standalone inference config in the near future. The reason the training examples only contain configs for `AutoencodingEngine` is that it is the newer, more flexible version, but it has no support for the legacy modules `post_quant_conv` and `quant_conv`, as you found out.
If you want to train with this, be sure to add a `lossconfig` (see the training config you already found for how to do that).
Let me know if there are any more problems here.
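In the meantime, a standalone inference config can be sketched by copying the `first_stage_config` params from the SDXL config and renaming the top-level key. The values below are assumptions carried over from that config, not an official file:

```yaml
# Sketch only: params assumed copied verbatim from the SDXL config's
# first_stage_config, with the top-level key renamed to `model`.
model:
  target: sgm.models.autoencoder.AutoencoderKL
  params:
    embed_dim: 4
    monitor: val/rec_loss
    ddconfig:
      attn_type: vanilla-xformers
      double_z: true
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [1, 2, 4, 4]
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
```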
Hi, I'm trying this exact scenario of retraining the SDXL VAE. I've made the changes indicated to create the config file below, pointing at the `AutoencoderKL` class, with an LPIPS loss config and the SDXL VAE checkpoint.
The problem I run into is that the `encode` methods on `AutoencoderKL` and its parent `AutoencodingEngine` have different signatures, and I'm not sure which one is right. When training runs, the `encode` method gets called during the sanity check via:
`run_sanity_check --> validation_step --> AutoencodingEngine.forward() --> self.encode(x, return_reg_log=True)`
At this point, `AutoencoderKL` does not have an `encode` method that takes a `return_reg_log` param. What is the right codepath here to match SDXL VAE training?
Config file:

```yaml
model:
  base_learning_rate: 4.5e-6
  target: sgm.models.autoencoder.AutoencoderKL
  params:
    input_key: jpg
    monitor: val/rec_loss
    embed_dim: 4
    ckpt_path: "/mnt/d/generative-models/configs/checkpoints/sdxl_vae.safetensors"
    lossconfig:
      target: sgm.modules.autoencoding.losses.GeneralLPIPSWithDiscriminator
      params:
        perceptual_weight: 0.25
        disc_start: 20001
        disc_weight: 0.5
        learn_logvar: True
        regularization_weights:
          kl_loss: 1.0
    regularizer_config:
      target: sgm.modules.autoencoding.regularizers.DiagonalGaussianRegularizer
    ddconfig:
      attn_type: vanilla-xformers
      double_z: true
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [1, 2, 4, 4]
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
```
Thank you!
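For illustration, the signature mismatch described above can be sketched with toy stand-in classes (these are simplifications, not the real sgm implementations, and the `AutoencoderKLCompat` shim is my own suggestion, not a fix from the repo):

```python
class AutoencodingEngine:
    """Stand-in for the newer base class: its forward() calls encode()
    with return_reg_log=True, as in the traceback above."""
    def forward(self, x):
        z, reg_log = self.encode(x, return_reg_log=True)
        return z, self.decode(z), reg_log


class AutoencoderKL(AutoencodingEngine):
    """Stand-in for the legacy subclass: encode() has no return_reg_log
    parameter, so the inherited forward() raises TypeError."""
    def encode(self, x):
        return x  # placeholder for the actual posterior sampling

    def decode(self, z):
        return z


class AutoencoderKLCompat(AutoencoderKL):
    """One possible shim: accept return_reg_log and return an empty log."""
    def encode(self, x, return_reg_log=False):
        z = super().encode(x)
        return (z, {}) if return_reg_log else z
```

With these stand-ins, `AutoencoderKL().forward(x)` raises `TypeError` (unexpected keyword argument `return_reg_log`), while `AutoencoderKLCompat().forward(x)` runs through the base-class `forward`.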
@jenuk Hi, thanks for the info. I am trying to test the AE-KL model by encoding a real image and then reconstructing it. However, my code does not work: after encoding, everything is NaN. I cannot figure out where it is going wrong.
I am using the config file as you posted and the inference code as follows:
```python
import os.path as osp

import numpy as np
import torch
import torchvision.transforms as T
from PIL import Image

model = instantiate_from_config(config.model)
model.to(device)
model.eval()

transforms = T.Compose([
    T.ToTensor(),
    T.Lambda(lambda x: x * 2.0 - 1.0),  # scale to [-1, 1]
])
image = Image.open(fname).convert("RGB")  # drop any alpha channel
image = transforms(image)[None]  # add batch dimension

with torch.no_grad():
    out = model(image.to(device))
    # AutoencodingEngine.forward returns (z, reconstruction, reg_log)
    xrec = out[1] if isinstance(out, tuple) else out

result = torch.clamp((xrec + 1.0) / 2.0, min=0.0, max=1.0)
result = (255.0 * result[0].permute(1, 2, 0).cpu().numpy()).astype(np.uint8)
Image.fromarray(result).save(osp.join(outpath, osp.basename(fname)))
```
Is there anything wrong? Thanks in advance!
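When reconstructions come back as NaN, a generic PyTorch way to locate the first offending layer is a forward hook that checks for non-finite outputs. The helper name `find_nonfinite_modules` is mine, not part of sgm; this is a debugging sketch, not a fix:

```python
import torch


def find_nonfinite_modules(model, x):
    """Run a forward pass and return the class names of modules whose
    output contains NaN/Inf (generic debugging aid, not sgm-specific)."""
    offenders = []

    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
            offenders.append(module.__class__.__name__)

    handles = [m.register_forward_hook(hook) for m in model.modules()]
    try:
        with torch.no_grad():
            model(x)
    finally:
        for h in handles:
            h.remove()
    return offenders


# Tiny demo: a linear layer whose weights are NaN gets flagged,
# as does the container module that returns its output.
net = torch.nn.Sequential(torch.nn.Linear(2, 2))
with torch.no_grad():
    net[0].weight.fill_(float("nan"))
print(find_nonfinite_modules(net, torch.ones(1, 2)))
```

If the first flagged module is already in the encoder, it is worth checking that the checkpoint actually loaded (e.g. a wrong `ckpt_path` silently leaves random or uninitialized weights, depending on how instantiation is configured).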
I have met the same problem. Have you solved it?
The published autoencoder weights do not seem to match the model defined in this repo. Specifically, when I try to load the weights, the following keys are missing: `post_quant_conv.bias`, `post_quant_conv.weight`, `quant_conv.bias`, `quant_conv.weight`.
It would also be really helpful if the actual YAML config used to train the SDXL autoencoder were published (the example in this repo does not seem to correspond to the original).
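For reference, the distinction between "missing" and "unexpected" keys that comes up later in this thread can be sketched in pure Python (torch's `load_state_dict(strict=False)` reports the same two lists; the key names below mirror this issue, the helper `diff_keys` is illustrative):

```python
def diff_keys(model_keys, ckpt_keys):
    """Mimic load_state_dict(strict=False) key reporting."""
    missing = sorted(set(model_keys) - set(ckpt_keys))     # model wants, ckpt lacks
    unexpected = sorted(set(ckpt_keys) - set(model_keys))  # ckpt has, model ignores
    return missing, unexpected


# AutoencoderKL defines the legacy quant_conv modules; if the loaded
# state dict lacks them, they are reported as "missing".
model_keys = [
    "encoder.conv_in.weight",
    "quant_conv.weight", "quant_conv.bias",
    "post_quant_conv.weight", "post_quant_conv.bias",
]
ckpt_keys = ["encoder.conv_in.weight"]
missing, unexpected = diff_keys(model_keys, ckpt_keys)
print(missing)
print(unexpected)  # → []
```

The converse case, loading a checkpoint that contains `quant_conv` keys into an `AutoencodingEngine` that does not define them, would report those same keys as "unexpected" instead.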