kkedich opened this issue 7 months ago
Hi kkedich,
I think the link to the pretrained PyTorch checkpoint is wrong. The correct one is https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt. It's actually the one from the official repo, https://github.com/CompVis/latent-diffusion/tree/main?tab=readme-ov-file#text-to-image, just below "Download the pre-trained weights (5.7GB)".
Hi @chao-ji ,
I was trying to convert the model from the pretrained txt2img checkpoint, but it seems that some shapes are different. I was able to identify the differences for the transformer model and part of the UNet model, but not all of them.
Example in the transformer:
The fix for the transformer model was setting `hidden_size` and `filter_size` to multiples of 640. But the UNet model also has a mismatch with the pre-trained model. I adjusted some parts, but I am having difficulty matching the weights exactly with the UNet model as defined. The initial weights match, but later (at weight 17, for example) the mismatch starts:
```
sd_weight: (192,), unet_current_model_weight (192, 192)
```
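For debugging this kind of drift, a small helper that diffs two ordered name-to-shape mappings (e.g. built on the PyTorch side with `{k: tuple(v.shape) for k, v in ckpt["state_dict"].items()}`) can show exactly where the alignment breaks. A minimal sketch, with hypothetical weight names, not the actual checkpoint layout:

```python
def diff_shapes(sd_shapes, model_shapes):
    """Compare two ordered name->shape dicts positionally, mirroring a
    conversion script that walks both weight lists in order, and return
    every (index, sd_name, sd_shape, model_name, model_shape) mismatch."""
    mismatches = []
    for i, ((sd_name, sd_shape), (m_name, m_shape)) in enumerate(
        zip(sd_shapes.items(), model_shapes.items())
    ):
        if tuple(sd_shape) != tuple(m_shape):
            mismatches.append((i, sd_name, sd_shape, m_name, m_shape))
    return mismatches

# Hypothetical example reproducing the reported pattern: a (192,) tensor
# in the checkpoint lined up against a (192, 192) tensor in the model.
sd = {"block.0.weight": (192, 192), "block.0.bias": (192,)}
model = {"conv0/kernel": (192, 192), "dense0/kernel": (192, 192)}
print(diff_shapes(sd, model))  # the mismatch shows up at index 1
```

Running this over the full checkpoint and the full list of model variables would pinpoint the first index where the positional pairing goes wrong, rather than just the first mismatching shape.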
I was wondering if you have already seen this pattern with the pre-trained model, or if I need to define something else to follow exactly the shapes defined in the `convert_ckpt_pytorch_to_tf2.py` file. I double-checked the pre-trained model, and the weights do in fact come with these other shapes. Thanks!
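One hypothesis for the `(192,)` vs `(192, 192)` pattern is that the positional pairing has drifted: a 1-D bias or norm tensor on one side is being lined up against a 2-D kernel on the other. Grouping by tensor rank before comparing shapes makes that kind of drift visible as an unbalanced group rather than a spurious shape mismatch. A sketch with made-up names, assuming nothing about the real variable naming:

```python
def pair_by_rank(sd_shapes, model_shapes):
    """Split each name->shape mapping into groups by tensor rank and
    report how many tensors each side has per rank.  A missing or extra
    1-D (bias/norm) tensor then shows up as an unbalanced rank-1 group."""
    def by_rank(shapes):
        groups = {}
        for name, shape in shapes.items():
            groups.setdefault(len(shape), []).append((name, shape))
        return groups

    sd_groups, m_groups = by_rank(sd_shapes), by_rank(model_shapes)
    report = {}
    for rank in sorted(set(sd_groups) | set(m_groups)):
        n_sd = len(sd_groups.get(rank, []))
        n_m = len(m_groups.get(rank, []))
        report[rank] = {"sd": n_sd, "model": n_m, "balanced": n_sd == n_m}
    return report

# Hypothetical layout: the checkpoint carries an extra 1-D norm tensor
# that the TF2 model does not expose at the same position.
sd = {"w0": (192, 192), "b0": (192,), "norm0": (192,)}
model = {"k0": (192, 192), "bias0": (192,)}
print(pair_by_rank(sd, model))  # rank-1 group is unbalanced: 2 vs 1
```

If the rank groups are unbalanced, matching weights by name suffix (kernel vs bias) instead of by position may be more robust than adjusting the model's shapes.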