kxqt opened this issue 9 months ago
After converting the checkpoint with the preprocess.ipynb script, I get dimension mismatches when loading it:
RuntimeError: Error(s) in loading state_dict for DiffusionEngine:
size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 2048]).
size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_attn.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_attn.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 2048]).
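For reference, here is a minimal sketch (my own diagnostic, not from the repo; the checkpoint path is a placeholder) of how I checked which t_attn key/value projections in the preprocessed checkpoint still have an input dimension of 1024 instead of the 2048 the current model expects:

```python
import torch

# Hypothetical path to the checkpoint produced by preprocess.ipynb.
ckpt_path = "./checkpoints/preprocessed.ckpt"

state = torch.load(ckpt_path, map_location="cpu")
# Some Lightning-style checkpoints nest the weights under "state_dict".
state_dict = state.get("state_dict", state)

# Print every t_attn to_k / to_v projection whose input dim is not 2048.
for name, tensor in state_dict.items():
    if ".t_attn.to_k.weight" in name or ".t_attn.to_v.weight" in name:
        out_dim, in_dim = tensor.shape
        if in_dim != 2048:
            print(f"{name}: {tuple(tensor.shape)}  (expected in_dim 2048)")
```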
Also, is it correct that training for 100k iterations on the SynthText data takes about 48 h on 8 A40 GPUs, and another 48 h for 100k iterations on LAION-OCR? I also noticed that the paper reports a batch size of 64, while config/train.yaml seems to set a per-GPU batch size of 64. If I want to reproduce your results, should I change the config so that the total batch size is 64? Thanks!
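To make the batch-size question concrete, this is the arithmetic I am assuming (my own reading of the config, so please correct me if the launcher already divides by the number of GPUs):

```python
# Illustrative effective-batch-size arithmetic under my assumption that
# train.yaml specifies a per-GPU batch size.
num_gpus = 8             # e.g. 8x A40
per_gpu_batch_size = 64  # what config/train.yaml appears to set per card

effective_batch_size = num_gpus * per_gpu_batch_size
print(effective_batch_size)  # 512, vs. the 64 reported in the paper

# If the paper's 64 is the *total* batch size, the per-GPU value would be:
print(64 // num_gpus)        # 8 per GPU across 8 GPUs
```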
In addition, when loading 512-inpainting-ema.ckpt at the start of training, I found that many of the pretrained weights were not successfully loaded into the model. Is this expected?
Restored from ./checkpoints/pretrained/512-inpainting-ema.ckpt with 508 missing and 420 unexpected keys Missing Keys: ['model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.t_norm.bias', 
'model.diffusion_model.middle_block.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.middle_block.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_attn.to_q.weight', 
'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.t_norm.bias', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_attn.to_q.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_attn.to_k.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_attn.to_v.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_attn.to_out.0.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_attn.to_out.0.bias', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_norm.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.t_norm.bias', 'denoiser.sigmas', 'conditioner.embedders.0.label_embedding.weight', 'conditioner.embedders.0.pos_embedding.pe', 'conditioner.embedders.0.encoder.layers.0.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.0.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.0.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.0.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.0.linear1.weight', 'conditioner.embedders.0.encoder.layers.0.linear1.bias', 'conditioner.embedders.0.encoder.layers.0.linear2.weight', 'conditioner.embedders.0.encoder.layers.0.linear2.bias', 'conditioner.embedders.0.encoder.layers.0.norm1.weight', 'conditioner.embedders.0.encoder.layers.0.norm1.bias', 'conditioner.embedders.0.encoder.layers.0.norm2.weight', 'conditioner.embedders.0.encoder.layers.0.norm2.bias', 'conditioner.embedders.0.encoder.layers.1.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.1.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.1.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.1.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.1.linear1.weight', 'conditioner.embedders.0.encoder.layers.1.linear1.bias', 'conditioner.embedders.0.encoder.layers.1.linear2.weight', 
'conditioner.embedders.0.encoder.layers.1.linear2.bias', 'conditioner.embedders.0.encoder.layers.1.norm1.weight', 'conditioner.embedders.0.encoder.layers.1.norm1.bias', 'conditioner.embedders.0.encoder.layers.1.norm2.weight', 'conditioner.embedders.0.encoder.layers.1.norm2.bias', 'conditioner.embedders.0.encoder.layers.2.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.2.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.2.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.2.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.2.linear1.weight', 'conditioner.embedders.0.encoder.layers.2.linear1.bias', 'conditioner.embedders.0.encoder.layers.2.linear2.weight', 'conditioner.embedders.0.encoder.layers.2.linear2.bias', 'conditioner.embedders.0.encoder.layers.2.norm1.weight', 'conditioner.embedders.0.encoder.layers.2.norm1.bias', 'conditioner.embedders.0.encoder.layers.2.norm2.weight', 'conditioner.embedders.0.encoder.layers.2.norm2.bias', 'conditioner.embedders.0.encoder.layers.3.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.3.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.3.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.3.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.3.linear1.weight', 'conditioner.embedders.0.encoder.layers.3.linear1.bias', 'conditioner.embedders.0.encoder.layers.3.linear2.weight', 'conditioner.embedders.0.encoder.layers.3.linear2.bias', 'conditioner.embedders.0.encoder.layers.3.norm1.weight', 'conditioner.embedders.0.encoder.layers.3.norm1.bias', 'conditioner.embedders.0.encoder.layers.3.norm2.weight', 'conditioner.embedders.0.encoder.layers.3.norm2.bias', 'conditioner.embedders.0.encoder.layers.4.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.4.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.4.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.4.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.4.linear1.weight', 'conditioner.embedders.0.encoder.layers.4.linear1.bias', 'conditioner.embedders.0.encoder.layers.4.linear2.weight', 'conditioner.embedders.0.encoder.layers.4.linear2.bias', 'conditioner.embedders.0.encoder.layers.4.norm1.weight', 'conditioner.embedders.0.encoder.layers.4.norm1.bias', 'conditioner.embedders.0.encoder.layers.4.norm2.weight', 'conditioner.embedders.0.encoder.layers.4.norm2.bias', 'conditioner.embedders.0.encoder.layers.5.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.5.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.5.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.5.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.5.linear1.weight', 'conditioner.embedders.0.encoder.layers.5.linear1.bias', 'conditioner.embedders.0.encoder.layers.5.linear2.weight', 'conditioner.embedders.0.encoder.layers.5.linear2.bias', 'conditioner.embedders.0.encoder.layers.5.norm1.weight', 'conditioner.embedders.0.encoder.layers.5.norm1.bias', 'conditioner.embedders.0.encoder.layers.5.norm2.weight', 'conditioner.embedders.0.encoder.layers.5.norm2.bias', 'conditioner.embedders.0.encoder.layers.6.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.6.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.6.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.6.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.6.linear1.weight', 
'conditioner.embedders.0.encoder.layers.6.linear1.bias', 'conditioner.embedders.0.encoder.layers.6.linear2.weight', 'conditioner.embedders.0.encoder.layers.6.linear2.bias', 'conditioner.embedders.0.encoder.layers.6.norm1.weight', 'conditioner.embedders.0.encoder.layers.6.norm1.bias', 'conditioner.embedders.0.encoder.layers.6.norm2.weight', 'conditioner.embedders.0.encoder.layers.6.norm2.bias', 'conditioner.embedders.0.encoder.layers.7.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.7.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.7.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.7.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.7.linear1.weight', 'conditioner.embedders.0.encoder.layers.7.linear1.bias', 'conditioner.embedders.0.encoder.layers.7.linear2.weight', 'conditioner.embedders.0.encoder.layers.7.linear2.bias', 'conditioner.embedders.0.encoder.layers.7.norm1.weight', 'conditioner.embedders.0.encoder.layers.7.norm1.bias', 'conditioner.embedders.0.encoder.layers.7.norm2.weight', 'conditioner.embedders.0.encoder.layers.7.norm2.bias', 'conditioner.embedders.0.encoder.layers.8.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.8.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.8.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.8.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.8.linear1.weight', 'conditioner.embedders.0.encoder.layers.8.linear1.bias', 'conditioner.embedders.0.encoder.layers.8.linear2.weight', 'conditioner.embedders.0.encoder.layers.8.linear2.bias', 'conditioner.embedders.0.encoder.layers.8.norm1.weight', 'conditioner.embedders.0.encoder.layers.8.norm1.bias', 'conditioner.embedders.0.encoder.layers.8.norm2.weight', 'conditioner.embedders.0.encoder.layers.8.norm2.bias', 'conditioner.embedders.0.encoder.layers.9.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.9.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.9.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.9.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.9.linear1.weight', 'conditioner.embedders.0.encoder.layers.9.linear1.bias', 'conditioner.embedders.0.encoder.layers.9.linear2.weight', 'conditioner.embedders.0.encoder.layers.9.linear2.bias', 'conditioner.embedders.0.encoder.layers.9.norm1.weight', 'conditioner.embedders.0.encoder.layers.9.norm1.bias', 'conditioner.embedders.0.encoder.layers.9.norm2.weight', 'conditioner.embedders.0.encoder.layers.9.norm2.bias', 'conditioner.embedders.0.encoder.layers.10.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.10.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.10.self_attn.out_proj.weight', 'conditioner.embedders.0.encoder.layers.10.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.10.linear1.weight', 'conditioner.embedders.0.encoder.layers.10.linear1.bias', 'conditioner.embedders.0.encoder.layers.10.linear2.weight', 'conditioner.embedders.0.encoder.layers.10.linear2.bias', 'conditioner.embedders.0.encoder.layers.10.norm1.weight', 'conditioner.embedders.0.encoder.layers.10.norm1.bias', 'conditioner.embedders.0.encoder.layers.10.norm2.weight', 'conditioner.embedders.0.encoder.layers.10.norm2.bias', 'conditioner.embedders.0.encoder.layers.11.self_attn.in_proj_weight', 'conditioner.embedders.0.encoder.layers.11.self_attn.in_proj_bias', 'conditioner.embedders.0.encoder.layers.11.self_attn.out_proj.weight', 
'conditioner.embedders.0.encoder.layers.11.self_attn.out_proj.bias', 'conditioner.embedders.0.encoder.layers.11.linear1.weight', 'conditioner.embedders.0.encoder.layers.11.linear1.bias', 'conditioner.embedders.0.encoder.layers.11.linear2.weight', 'conditioner.embedders.0.encoder.layers.11.linear2.bias', 'conditioner.embedders.0.encoder.layers.11.norm1.weight', 'conditioner.embedders.0.encoder.layers.11.norm1.bias', 'conditioner.embedders.0.encoder.layers.11.norm2.weight', 'conditioner.embedders.0.encoder.layers.11.norm2.bias', 'conditioner.embedders.2.model.encoder.conv_in.weight', 'conditioner.embedders.2.model.encoder.conv_in.bias', 'conditioner.embedders.2.model.encoder.down.0.block.0.norm1.weight', 'conditioner.embedders.2.model.encoder.down.0.block.0.norm1.bias', 'conditioner.embedders.2.model.encoder.down.0.block.0.conv1.weight', 'conditioner.embedders.2.model.encoder.down.0.block.0.conv1.bias', 'conditioner.embedders.2.model.encoder.down.0.block.0.norm2.weight', 'conditioner.embedders.2.model.encoder.down.0.block.0.norm2.bias', 'conditioner.embedders.2.model.encoder.down.0.block.0.conv2.weight', 'conditioner.embedders.2.model.encoder.down.0.block.0.conv2.bias', 'conditioner.embedders.2.model.encoder.down.0.block.1.norm1.weight', 'conditioner.embedders.2.model.encoder.down.0.block.1.norm1.bias', 'conditioner.embedders.2.model.encoder.down.0.block.1.conv1.weight', 'conditioner.embedders.2.model.encoder.down.0.block.1.conv1.bias', 'conditioner.embedders.2.model.encoder.down.0.block.1.norm2.weight', 'conditioner.embedders.2.model.encoder.down.0.block.1.norm2.bias', 'conditioner.embedders.2.model.encoder.down.0.block.1.conv2.weight', 'conditioner.embedders.2.model.encoder.down.0.block.1.conv2.bias', 'conditioner.embedders.2.model.encoder.down.0.downsample.conv.weight', 'conditioner.embedders.2.model.encoder.down.0.downsample.conv.bias', 'conditioner.embedders.2.model.encoder.down.1.block.0.norm1.weight', 'conditioner.embedders.2.model.encoder.down.1.block.0.norm1.bias', 'conditioner.embedders.2.model.encoder.down.1.block.0.conv1.weight', 'conditioner.embedders.2.model.encoder.down.1.block.0.conv1.bias', 'conditioner.embedders.2.model.encoder.down.1.block.0.norm2.weight', 'conditioner.embedders.2.model.encoder.down.1.block.0.norm2.bias', 'conditioner.embedders.2.model.encoder.down.1.block.0.conv2.weight', 'conditioner.embedders.2.model.encoder.down.1.block.0.conv2.bias', 'conditioner.embedders.2.model.encoder.down.1.block.0.nin_shortcut.weight', 'conditioner.embedders.2.model.encoder.down.1.block.0.nin_shortcut.bias', 'conditioner.embedders.2.model.encoder.down.1.block.1.norm1.weight', 'conditioner.embedders.2.model.encoder.down.1.block.1.norm1.bias', 'conditioner.embedders.2.model.encoder.down.1.block.1.conv1.weight', 'conditioner.embedders.2.model.encoder.down.1.block.1.conv1.bias', 'conditioner.embedders.2.model.encoder.down.1.block.1.norm2.weight', 'conditioner.embedders.2.model.encoder.down.1.block.1.norm2.bias', 'conditioner.embedders.2.model.encoder.down.1.block.1.conv2.weight', 'conditioner.embedders.2.model.encoder.down.1.block.1.conv2.bias', 'conditioner.embedders.2.model.encoder.down.1.downsample.conv.weight', 'conditioner.embedders.2.model.encoder.down.1.downsample.conv.bias', 'conditioner.embedders.2.model.encoder.down.2.block.0.norm1.weight', 'conditioner.embedders.2.model.encoder.down.2.block.0.norm1.bias', 'conditioner.embedders.2.model.encoder.down.2.block.0.conv1.weight', 'conditioner.embedders.2.model.encoder.down.2.block.0.conv1.bias', 
'conditioner.embedders.2.model.encoder.down.2.block.0.norm2.weight', 'conditioner.embedders.2.model.encoder.down.2.block.0.norm2.bias', 'conditioner.embedders.2.model.encoder.down.2.block.0.conv2.weight', 'conditioner.embedders.2.model.encoder.down.2.block.0.conv2.bias', 'conditioner.embedders.2.model.encoder.down.2.block.0.nin_shortcut.weight', 'conditioner.embedders.2.model.encoder.down.2.block.0.nin_shortcut.bias', 'conditioner.embedders.2.model.encoder.down.2.block.1.norm1.weight', 'conditioner.embedders.2.model.encoder.down.2.block.1.norm1.bias', 'conditioner.embedders.2.model.encoder.down.2.block.1.conv1.weight', 'conditioner.embedders.2.model.encoder.down.2.block.1.conv1.bias', 'conditioner.embedders.2.model.encoder.down.2.block.1.norm2.weight', 'conditioner.embedders.2.model.encoder.down.2.block.1.norm2.bias', 'conditioner.embedders.2.model.encoder.down.2.block.1.conv2.weight', 'conditioner.embedders.2.model.encoder.down.2.block.1.conv2.bias', 'conditioner.embedders.2.model.encoder.down.2.downsample.conv.weight', 'conditioner.embedders.2.model.encoder.down.2.downsample.conv.bias', 'conditioner.embedders.2.model.encoder.down.3.block.0.norm1.weight', 'conditioner.embedders.2.model.encoder.down.3.block.0.norm1.bias', 'conditioner.embedders.2.model.encoder.down.3.block.0.conv1.weight', 'conditioner.embedders.2.model.encoder.down.3.block.0.conv1.bias', 'conditioner.embedders.2.model.encoder.down.3.block.0.norm2.weight', 'conditioner.embedders.2.model.encoder.down.3.block.0.norm2.bias', 'conditioner.embedders.2.model.encoder.down.3.block.0.conv2.weight', 'conditioner.embedders.2.model.encoder.down.3.block.0.conv2.bias', 'conditioner.embedders.2.model.encoder.down.3.block.1.norm1.weight', 'conditioner.embedders.2.model.encoder.down.3.block.1.norm1.bias', 'conditioner.embedders.2.model.encoder.down.3.block.1.conv1.weight', 'conditioner.embedders.2.model.encoder.down.3.block.1.conv1.bias', 'conditioner.embedders.2.model.encoder.down.3.block.1.norm2.weight', 'conditioner.embedders.2.model.encoder.down.3.block.1.norm2.bias', 'conditioner.embedders.2.model.encoder.down.3.block.1.conv2.weight', 'conditioner.embedders.2.model.encoder.down.3.block.1.conv2.bias', 'conditioner.embedders.2.model.encoder.mid.block_1.norm1.weight', 'conditioner.embedders.2.model.encoder.mid.block_1.norm1.bias', 'conditioner.embedders.2.model.encoder.mid.block_1.conv1.weight', 'conditioner.embedders.2.model.encoder.mid.block_1.conv1.bias', 'conditioner.embedders.2.model.encoder.mid.block_1.norm2.weight', 'conditioner.embedders.2.model.encoder.mid.block_1.norm2.bias', 'conditioner.embedders.2.model.encoder.mid.block_1.conv2.weight', 'conditioner.embedders.2.model.encoder.mid.block_1.conv2.bias', 'conditioner.embedders.2.model.encoder.mid.attn_1.norm.weight', 'conditioner.embedders.2.model.encoder.mid.attn_1.norm.bias', 'conditioner.embedders.2.model.encoder.mid.attn_1.q.weight', 'conditioner.embedders.2.model.encoder.mid.attn_1.q.bias', 'conditioner.embedders.2.model.encoder.mid.attn_1.k.weight', 'conditioner.embedders.2.model.encoder.mid.attn_1.k.bias', 'conditioner.embedders.2.model.encoder.mid.attn_1.v.weight', 'conditioner.embedders.2.model.encoder.mid.attn_1.v.bias', 'conditioner.embedders.2.model.encoder.mid.attn_1.proj_out.weight', 'conditioner.embedders.2.model.encoder.mid.attn_1.proj_out.bias', 'conditioner.embedders.2.model.encoder.mid.block_2.norm1.weight', 'conditioner.embedders.2.model.encoder.mid.block_2.norm1.bias', 'conditioner.embedders.2.model.encoder.mid.block_2.conv1.weight', 
'conditioner.embedders.2.model.encoder.mid.block_2.conv1.bias', 'conditioner.embedders.2.model.encoder.mid.block_2.norm2.weight', 'conditioner.embedders.2.model.encoder.mid.block_2.norm2.bias', 'conditioner.embedders.2.model.encoder.mid.block_2.conv2.weight', 'conditioner.embedders.2.model.encoder.mid.block_2.conv2.bias', 'conditioner.embedders.2.model.encoder.norm_out.weight', 'conditioner.embedders.2.model.encoder.norm_out.bias', 'conditioner.embedders.2.model.encoder.conv_out.weight', 'conditioner.embedders.2.model.encoder.conv_out.bias', 'conditioner.embedders.2.model.decoder.conv_in.weight', 'conditioner.embedders.2.model.decoder.conv_in.bias', 'conditioner.embedders.2.model.decoder.mid.block_1.norm1.weight', 'conditioner.embedders.2.model.decoder.mid.block_1.norm1.bias', 'conditioner.embedders.2.model.decoder.mid.block_1.conv1.weight', 'conditioner.embedders.2.model.decoder.mid.block_1.conv1.bias', 'conditioner.embedders.2.model.decoder.mid.block_1.norm2.weight', 'conditioner.embedders.2.model.decoder.mid.block_1.norm2.bias', 'conditioner.embedders.2.model.decoder.mid.block_1.conv2.weight', 'conditioner.embedders.2.model.decoder.mid.block_1.conv2.bias', 'conditioner.embedders.2.model.decoder.mid.attn_1.norm.weight', 'conditioner.embedders.2.model.decoder.mid.attn_1.norm.bias', 'conditioner.embedders.2.model.decoder.mid.attn_1.q.weight', 'conditioner.embedders.2.model.decoder.mid.attn_1.q.bias', 'conditioner.embedders.2.model.decoder.mid.attn_1.k.weight', 'conditioner.embedders.2.model.decoder.mid.attn_1.k.bias', 'conditioner.embedders.2.model.decoder.mid.attn_1.v.weight', 'conditioner.embedders.2.model.decoder.mid.attn_1.v.bias', 'conditioner.embedders.2.model.decoder.mid.attn_1.proj_out.weight', 'conditioner.embedders.2.model.decoder.mid.attn_1.proj_out.bias', 'conditioner.embedders.2.model.decoder.mid.block_2.norm1.weight', 'conditioner.embedders.2.model.decoder.mid.block_2.norm1.bias', 'conditioner.embedders.2.model.decoder.mid.block_2.conv1.weight', 'conditioner.embedders.2.model.decoder.mid.block_2.conv1.bias', 'conditioner.embedders.2.model.decoder.mid.block_2.norm2.weight', 'conditioner.embedders.2.model.decoder.mid.block_2.norm2.bias', 'conditioner.embedders.2.model.decoder.mid.block_2.conv2.weight', 'conditioner.embedders.2.model.decoder.mid.block_2.conv2.bias', 'conditioner.embedders.2.model.decoder.up.0.block.0.norm1.weight', 'conditioner.embedders.2.model.decoder.up.0.block.0.norm1.bias', 'conditioner.embedders.2.model.decoder.up.0.block.0.conv1.weight', 'conditioner.embedders.2.model.decoder.up.0.block.0.conv1.bias', 'conditioner.embedders.2.model.decoder.up.0.block.0.norm2.weight', 'conditioner.embedders.2.model.decoder.up.0.block.0.norm2.bias', 'conditioner.embedders.2.model.decoder.up.0.block.0.conv2.weight', 'conditioner.embedders.2.model.decoder.up.0.block.0.conv2.bias', 'conditioner.embedders.2.model.decoder.up.0.block.0.nin_shortcut.weight', 'conditioner.embedders.2.model.decoder.up.0.block.0.nin_shortcut.bias', 'conditioner.embedders.2.model.decoder.up.0.block.1.norm1.weight', 'conditioner.embedders.2.model.decoder.up.0.block.1.norm1.bias', 'conditioner.embedders.2.model.decoder.up.0.block.1.conv1.weight', 'conditioner.embedders.2.model.decoder.up.0.block.1.conv1.bias', 'conditioner.embedders.2.model.decoder.up.0.block.1.norm2.weight', 'conditioner.embedders.2.model.decoder.up.0.block.1.norm2.bias', 'conditioner.embedders.2.model.decoder.up.0.block.1.conv2.weight', 'conditioner.embedders.2.model.decoder.up.0.block.1.conv2.bias', 
'conditioner.embedders.2.model.decoder.up.0.block.2.norm1.weight', 'conditioner.embedders.2.model.decoder.up.0.block.2.norm1.bias', 'conditioner.embedders.2.model.decoder.up.0.block.2.conv1.weight', 'conditioner.embedders.2.model.decoder.up.0.block.2.conv1.bias', 'conditioner.embedders.2.model.decoder.up.0.block.2.norm2.weight', 'conditioner.embedders.2.model.decoder.up.0.block.2.norm2.bias', 'conditioner.embedders.2.model.decoder.up.0.block.2.conv2.weight', 'conditioner.embedders.2.model.decoder.up.0.block.2.conv2.bias', 'conditioner.embedders.2.model.decoder.up.1.block.0.norm1.weight', 'conditioner.embedders.2.model.decoder.up.1.block.0.norm1.bias', 'conditioner.embedders.2.model.decoder.up.1.block.0.conv1.weight', 'conditioner.embedders.2.model.decoder.up.1.block.0.conv1.bias', 'conditioner.embedders.2.model.decoder.up.1.block.0.norm2.weight', 'conditioner.embedders.2.model.decoder.up.1.block.0.norm2.bias', 'conditioner.embedders.2.model.decoder.up.1.block.0.conv2.weight', 'conditioner.embedders.2.model.decoder.up.1.block.0.conv2.bias', 'conditioner.embedders.2.model.decoder.up.1.block.0.nin_shortcut.weight', 'conditioner.embedders.2.model.decoder.up.1.block.0.nin_shortcut.bias', 'conditioner.embedders.2.model.decoder.up.1.block.1.norm1.weight', 'conditioner.embedders.2.model.decoder.up.1.block.1.norm1.bias', 'conditioner.embedders.2.model.decoder.up.1.block.1.conv1.weight', 'conditioner.embedders.2.model.decoder.up.1.block.1.conv1.bias', 'conditioner.embedders.2.model.decoder.up.1.block.1.norm2.weight', 'conditioner.embedders.2.model.decoder.up.1.block.1.norm2.bias', 'conditioner.embedders.2.model.decoder.up.1.block.1.conv2.weight', 'conditioner.embedders.2.model.decoder.up.1.block.1.conv2.bias', 'conditioner.embedders.2.model.decoder.up.1.block.2.norm1.weight', 'conditioner.embedders.2.model.decoder.up.1.block.2.norm1.bias', 'conditioner.embedders.2.model.decoder.up.1.block.2.conv1.weight', 'conditioner.embedders.2.model.decoder.up.1.block.2.conv1.bias', 'conditioner.embedders.2.model.decoder.up.1.block.2.norm2.weight', 'conditioner.embedders.2.model.decoder.up.1.block.2.norm2.bias', 'conditioner.embedders.2.model.decoder.up.1.block.2.conv2.weight', 'conditioner.embedders.2.model.decoder.up.1.block.2.conv2.bias', 'conditioner.embedders.2.model.decoder.up.1.upsample.conv.weight', 'conditioner.embedders.2.model.decoder.up.1.upsample.conv.bias', 'conditioner.embedders.2.model.decoder.up.2.block.0.norm1.weight', 'conditioner.embedders.2.model.decoder.up.2.block.0.norm1.bias', 'conditioner.embedders.2.model.decoder.up.2.block.0.conv1.weight', 'conditioner.embedders.2.model.decoder.up.2.block.0.conv1.bias', 'conditioner.embedders.2.model.decoder.up.2.block.0.norm2.weight', 'conditioner.embedders.2.model.decoder.up.2.block.0.norm2.bias', 'conditioner.embedders.2.model.decoder.up.2.block.0.conv2.weight', 'conditioner.embedders.2.model.decoder.up.2.block.0.conv2.bias', 'conditioner.embedders.2.model.decoder.up.2.block.1.norm1.weight', 'conditioner.embedders.2.model.decoder.up.2.block.1.norm1.bias', 'conditioner.embedders.2.model.decoder.up.2.block.1.conv1.weight', 'conditioner.embedders.2.model.decoder.up.2.block.1.conv1.bias', 'conditioner.embedders.2.model.decoder.up.2.block.1.norm2.weight', 'conditioner.embedders.2.model.decoder.up.2.block.1.norm2.bias', 'conditioner.embedders.2.model.decoder.up.2.block.1.conv2.weight', 'conditioner.embedders.2.model.decoder.up.2.block.1.conv2.bias', 'conditioner.embedders.2.model.decoder.up.2.block.2.norm1.weight', 
'conditioner.embedders.2.model.decoder.up.2.block.2.norm1.bias', 'conditioner.embedders.2.model.decoder.up.2.block.2.conv1.weight', 'conditioner.embedders.2.model.decoder.up.2.block.2.conv1.bias', 'conditioner.embedders.2.model.decoder.up.2.block.2.norm2.weight', 'conditioner.embedders.2.model.decoder.up.2.block.2.norm2.bias', 'conditioner.embedders.2.model.decoder.up.2.block.2.conv2.weight', 'conditioner.embedders.2.model.decoder.up.2.block.2.conv2.bias', 'conditioner.embedders.2.model.decoder.up.2.upsample.conv.weight', 'conditioner.embedders.2.model.decoder.up.2.upsample.conv.bias', 'conditioner.embedders.2.model.decoder.up.3.block.0.norm1.weight', 'conditioner.embedders.2.model.decoder.up.3.block.0.norm1.bias', 'conditioner.embedders.2.model.decoder.up.3.block.0.conv1.weight', 'conditioner.embedders.2.model.decoder.up.3.block.0.conv1.bias', 'conditioner.embedders.2.model.decoder.up.3.block.0.norm2.weight', 'conditioner.embedders.2.model.decoder.up.3.block.0.norm2.bias', 'conditioner.embedders.2.model.decoder.up.3.block.0.conv2.weight', 'conditioner.embedders.2.model.decoder.up.3.block.0.conv2.bias', 'conditioner.embedders.2.model.decoder.up.3.block.1.norm1.weight', 'conditioner.embedders.2.model.decoder.up.3.block.1.norm1.bias', 'conditioner.embedders.2.model.decoder.up.3.block.1.conv1.weight', 'conditioner.embedders.2.model.decoder.up.3.block.1.conv1.bias', 'conditioner.embedders.2.model.decoder.up.3.block.1.norm2.weight', 'conditioner.embedders.2.model.decoder.up.3.block.1.norm2.bias', 'conditioner.embedders.2.model.decoder.up.3.block.1.conv2.weight', 'conditioner.embedders.2.model.decoder.up.3.block.1.conv2.bias', 'conditioner.embedders.2.model.decoder.up.3.block.2.norm1.weight', 'conditioner.embedders.2.model.decoder.up.3.block.2.norm1.bias', 'conditioner.embedders.2.model.decoder.up.3.block.2.conv1.weight', 'conditioner.embedders.2.model.decoder.up.3.block.2.conv1.bias', 'conditioner.embedders.2.model.decoder.up.3.block.2.norm2.weight', 'conditioner.embedders.2.model.decoder.up.3.block.2.norm2.bias', 'conditioner.embedders.2.model.decoder.up.3.block.2.conv2.weight', 'conditioner.embedders.2.model.decoder.up.3.block.2.conv2.bias', 'conditioner.embedders.2.model.decoder.up.3.upsample.conv.weight', 'conditioner.embedders.2.model.decoder.up.3.upsample.conv.bias', 'conditioner.embedders.2.model.decoder.norm_out.weight', 'conditioner.embedders.2.model.decoder.norm_out.bias', 'conditioner.embedders.2.model.decoder.conv_out.weight', 'conditioner.embedders.2.model.decoder.conv_out.bias', 'conditioner.embedders.2.model.quant_conv.weight', 'conditioner.embedders.2.model.quant_conv.bias', 'conditioner.embedders.2.model.post_quant_conv.weight', 'conditioner.embedders.2.model.post_quant_conv.bias', 'loss_fn.g_kernel'] Unexpected Keys: ['betas', 'alphas_cumprod', 'alphas_cumprod_prev', 'sqrt_alphas_cumprod', 'sqrt_one_minus_alphas_cumprod', 'log_one_minus_alphas_cumprod', 'sqrt_recip_alphas_cumprod', 'sqrt_recipm1_alphas_cumprod', 'posterior_variance', 'posterior_log_variance_clipped', 'posterior_mean_coef1', 'posterior_mean_coef2', 'model_ema.decay', 'model_ema.num_updates', 'cond_stage_model.model.positional_embedding', 'cond_stage_model.model.text_projection', 'cond_stage_model.model.logit_scale', 'cond_stage_model.model.transformer.resblocks.0.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.0.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.0.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.0.attn.in_proj_bias', 
'cond_stage_model.model.transformer.resblocks.0.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.0.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.0.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.0.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.0.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.0.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.0.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.0.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.1.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.1.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.1.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.1.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.1.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.1.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.1.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.1.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.1.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.1.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.1.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.1.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.2.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.2.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.2.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.2.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.2.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.2.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.2.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.2.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.2.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.2.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.2.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.2.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.3.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.3.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.3.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.3.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.3.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.3.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.3.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.3.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.3.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.3.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.3.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.3.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.4.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.4.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.4.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.4.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.4.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.4.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.4.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.4.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.4.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.4.mlp.c_fc.bias', 
'cond_stage_model.model.transformer.resblocks.4.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.4.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.5.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.5.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.5.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.5.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.5.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.5.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.5.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.5.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.5.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.5.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.5.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.5.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.6.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.6.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.6.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.6.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.6.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.6.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.6.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.6.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.6.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.6.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.6.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.6.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.7.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.7.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.7.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.7.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.7.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.7.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.7.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.7.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.7.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.7.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.7.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.7.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.8.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.8.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.8.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.8.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.8.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.8.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.8.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.8.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.8.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.8.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.8.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.8.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.9.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.9.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.9.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.9.attn.in_proj_bias', 
'cond_stage_model.model.transformer.resblocks.9.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.9.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.9.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.9.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.9.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.9.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.9.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.9.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.10.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.10.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.10.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.10.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.10.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.10.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.10.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.10.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.10.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.10.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.10.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.10.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.11.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.11.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.11.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.11.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.11.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.11.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.11.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.11.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.11.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.11.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.11.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.11.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.12.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.12.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.12.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.12.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.12.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.12.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.12.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.12.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.12.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.12.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.12.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.12.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.13.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.13.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.13.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.13.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.13.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.13.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.13.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.13.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.13.mlp.c_fc.weight', 
'cond_stage_model.model.transformer.resblocks.13.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.13.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.13.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.14.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.14.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.14.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.14.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.14.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.14.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.14.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.14.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.14.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.14.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.14.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.14.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.15.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.15.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.15.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.15.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.15.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.15.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.15.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.15.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.15.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.15.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.15.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.15.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.16.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.16.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.16.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.16.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.16.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.16.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.16.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.16.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.16.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.16.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.16.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.16.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.17.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.17.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.17.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.17.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.17.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.17.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.17.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.17.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.17.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.17.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.17.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.17.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.18.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.18.ln_1.bias', 
'cond_stage_model.model.transformer.resblocks.18.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.18.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.18.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.18.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.18.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.18.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.18.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.18.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.18.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.18.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.19.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.19.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.19.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.19.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.19.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.19.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.19.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.19.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.19.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.19.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.19.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.19.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.20.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.20.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.20.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.20.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.20.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.20.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.20.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.20.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.20.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.20.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.20.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.20.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.21.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.21.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.21.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.21.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.21.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.21.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.21.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.21.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.21.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.21.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.21.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.21.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.22.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.22.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.22.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.22.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.22.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.22.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.22.ln_2.weight', 
'cond_stage_model.model.transformer.resblocks.22.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.22.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.22.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.22.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.22.mlp.c_proj.bias', 'cond_stage_model.model.transformer.resblocks.23.ln_1.weight', 'cond_stage_model.model.transformer.resblocks.23.ln_1.bias', 'cond_stage_model.model.transformer.resblocks.23.attn.in_proj_weight', 'cond_stage_model.model.transformer.resblocks.23.attn.in_proj_bias', 'cond_stage_model.model.transformer.resblocks.23.attn.out_proj.weight', 'cond_stage_model.model.transformer.resblocks.23.attn.out_proj.bias', 'cond_stage_model.model.transformer.resblocks.23.ln_2.weight', 'cond_stage_model.model.transformer.resblocks.23.ln_2.bias', 'cond_stage_model.model.transformer.resblocks.23.mlp.c_fc.weight', 'cond_stage_model.model.transformer.resblocks.23.mlp.c_fc.bias', 'cond_stage_model.model.transformer.resblocks.23.mlp.c_proj.weight', 'cond_stage_model.model.transformer.resblocks.23.mlp.c_proj.bias', 'cond_stage_model.model.token_embedding.weight', 'cond_stage_model.model.ln_final.weight', 'cond_stage_model.model.ln_final.bias', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.input_blocks.1.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.input_blocks.2.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.input_blocks.4.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.input_blocks.5.1.transformer_blocks.0.norm2.bias', 
'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.input_blocks.7.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.input_blocks.8.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.middle_block.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.middle_block.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.3.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.4.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.5.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight', 
'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.7.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.8.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.9.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.10.1.transformer_blocks.0.norm2.bias', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_q.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_out.0.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_out.0.bias', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm2.weight', 'model.diffusion_model.output_blocks.11.1.transformer_blocks.0.norm2.bias']
Also, may I ask what GPUs you used for training, and roughly how long training takes? Thanks!
Is there any solution to this pretrained-model problem, or should I train a model on my own?
This is just a warning and can be safely ignored when you fine-tune from the pretrained model.
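For reference, the long key lists above are what PyTorch reports when a checkpoint is loaded non-strictly: keys present in the model but absent from the checkpoint are "missing", and vice versa "unexpected", and neither stops training. Below is a minimal sketch of that behavior, assuming a generic PyTorch module as a stand-in for the real diffusion model and a checkpoint path given only as an example; it is not the repository's actual loading code.

```python
import torch
import torch.nn as nn

# Placeholder module so the snippet is self-contained; in practice this
# would be the full diffusion model being fine-tuned.
model = nn.Linear(4, 4)

# Example checkpoint path (e.g. 512-inpainting-ema.ckpt); adjust as needed.
ckpt = torch.load("512-inpainting-ema.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# strict=False lets loading proceed: mismatched keys are returned and can
# be printed as a warning instead of raising an error.
result = model.load_state_dict(state_dict, strict=False)
print(f"missing keys (in model, not in checkpoint):    {len(result.missing_keys)}")
print(f"unexpected keys (in checkpoint, not in model): {len(result.unexpected_keys)}")
```

With non-strict loading, the unmatched parameters simply keep their freshly initialized values and are then learned during fine-tuning, which is why the warning can be ignored.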
When loading 512-inpainting-ema.ckpt at the start of training, I noticed that many of the pretrained weights were not successfully loaded into the model. Is this normal?
Also, may I ask what GPUs you used for training, and roughly how long training takes? Thanks!