CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/
Other
66.54k stars 9.97k forks source link

semantic-driven image synthesis failed #240

Open HUuxiaobin opened 1 year ago

HUuxiaobin commented 1 year ago

Dear author,

Thanks so much for your great contribution in image synthesis community. Inspired by your semantic-to-image demo in figure 8 of paper, I would like to reshow the figure 8 results to demonstrate the effectiveness of semantic-to-image synthesis. But I failed to get the realistic image from the semantic when I use the zoom-in semantic image of figure 8 and the prompt "A fantasy lake under the sunset'' with the checkpoint of stable-diffusion-v1-1. I am wondering if you could help me to fix this problem and then enlarge your great influence in segmentation domain. Have a nice day!

best TU Munich

KostyaAtarik commented 1 year ago

I'm not sure, but have you tried the models semantic_synthesis512 and semantic_synthesis256 from https://github.com/CompVis/stable-diffusion/blob/main/scripts/download_models.sh ?

HUuxiaobin commented 1 year ago

Thanks for your quick reply. I run the segmentation-to-image script 'python scripts/img2img.py --prompt "A fantasy lake under the sunset" --init-img ./mask/case2.png --strength 0.8' and the models are from semantic_synthesis256 model and a 512 model. The error occurred as follows: m, u = model.load_state_dict(sd, strict=False) File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for LatentDiffusion: size mismatch for model.diffusion_model.time_embed.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([1280, 320]). size mismatch for model.diffusion_model.time_embed.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.time_embed.2.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.time_embed.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([128, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]). size mismatch for model.diffusion_model.input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([320, 1280]). size mismatch for model.diffusion_model.input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([320, 1280]). size mismatch for model.diffusion_model.input_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.3.0.op.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.3.0.op.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([512, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.input_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([512, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 320, 1, 1]). size mismatch for model.diffusion_model.input_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.input_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.6.0.op.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.6.0.op.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 640, 1, 1]). size mismatch for model.diffusion_model.input_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.input_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.input_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.input_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.middle_block.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1536, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([1024, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.1.conv.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.2.1.conv.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1536, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([512, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for model.diffusion_model.output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for model.diffusion_model.output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([512, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 3, 3]). size mismatch for model.diffusion_model.output_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([512, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 1, 1]). size mismatch for model.diffusion_model.output_blocks.5.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for model.diffusion_model.output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for model.diffusion_model.output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([128, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1920, 3, 3]). size mismatch for model.diffusion_model.output_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.output_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.output_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([128, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1920, 1, 1]). size mismatch for model.diffusion_model.output_blocks.6.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.output_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.output_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1280, 1, 1]). size mismatch for model.diffusion_model.output_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([960]). size mismatch for model.diffusion_model.output_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([960]). size mismatch for model.diffusion_model.output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 960, 3, 3]). size mismatch for model.diffusion_model.output_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.output_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.output_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 960, 1, 1]). size mismatch for model.diffusion_model.output_blocks.8.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.out.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.out.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([4, 320, 3, 3]). size mismatch for model.diffusion_model.out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]). size mismatch for first_stage_model.encoder.conv_out.weight: copying a param with shape torch.Size([3, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 512, 3, 3]). size mismatch for first_stage_model.encoder.conv_out.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([8]). size mismatch for first_stage_model.decoder.conv_in.weight: copying a param with shape torch.Size([512, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 4, 3, 3]). size mismatch for first_stage_model.quant_conv.weight: copying a param with shape torch.Size([3, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 8, 1, 1]). size mismatch for first_stage_model.quant_conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([8]). size mismatch for first_stage_model.post_quant_conv.weight: copying a param with shape torch.Size([3, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([4, 4, 1, 1]). size mismatch for first_stage_model.post_quant_conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).

Is this problem of first-stage-model? which model should I use in first stage ?

wget -O models/first_stage_models/kl-f4/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f4.zip wget -O models/first_stage_models/kl-f8/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f8.zip wget -O models/first_stage_models/kl-f16/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f16.zip wget -O models/first_stage_models/kl-f32/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f32.zip wget -O models/first_stage_models/vq-f4/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f4.zip wget -O models/first_stage_models/vq-f4-noattn/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f4-noattn.zip wget -O models/first_stage_models/vq-f8/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f8.zip wget -O models/first_stage_models/vq-f8-n256/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip wget -O models/first_stage_models/vq-f16/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f16.zip

thanks very much for your help. Have a nice day!

best regards

bohnedan commented 1 year ago

@HUuxiaobin I'm running into the very same error. Did you accomplish to run the model checkpoint for semantic synthesis? I'm currently figuring, that the concatenation and torch shape need to be aligned with the UNet structure. I think therefore the semantic map needs to be encoded to match the latent image representation (probably 320²px in the downsized version according to the error, should be downsized by 4). Is this encoding process available anywhere? I only found it within inpaint.py. Besides that by reading the paper I'd say that the vq-f4 should be the correct first stage for the semantic synthesis.

Hey-Sweety commented 1 month ago

Hello, I am also working on the task of semantic image synthesis. ⭐May I ask which file should we run? Is it img2img. py? ⭐And how should the variable "config" be set to achieve the task of semantic synthesis? Should we use the yaml file in the configs folder or the yaml file in the models folder? Thank you so much and looking forward to your reply.❤