Tencent / HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
https://dit.hunyuan.tencent.com/

The structure of the Tencent-Hunyuan/HunyuanDiT-v1.2 model released on Hugging Face does not match the model structure in the source code #138

Closed wpdong0727 closed 3 months ago

wpdong0727 commented 3 months ago

For the hunyuandit and hunyuandit_v1.1 models, the structure of extra_embedder matches the source code:

self.extra_in_dim = 256 * 6 + hidden_size
self.x_embedder = PatchEmbed(input_size, patch_size, in_channels, hidden_size)
self.t_embedder = TimestepEmbedder(hidden_size)
self.extra_in_dim += 1024
self.extra_embedder = nn.Sequential(
    nn.Linear(self.extra_in_dim, hidden_size * 4),
    FP32_SiLU(),
    nn.Linear(hidden_size * 4, hidden_size, bias=True),
)

However, the extra_embedder structure of the hunyuandit_v1.2 model does not match the source code: in the released weights, extra_in_dim=1024. As a result, loading the model fails with: Error(s) in loading state_dict for ModifiedHunYuanDiT:\n\tMissing key(s) in state_dict: "style_embedder.weight". \n\tsize mismatch for extra_embedder.0.weight: copying a param with shape torch.Size([5632, 1024]) from checkpoint, the shape in current model is torch.Size([5632, 3968]).
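The numbers in the error message can be checked against the quoted snippet with plain arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and the shapes are copied from the error text rather than read from a real checkpoint. It recovers hidden_size from the output width of the first Linear layer (hidden_size * 4 = 5632) and compares the input width the source computes with the one stored in the v1.2 weights.

```python
# Shapes copied from the error message, in nn.Linear's
# (out_features, in_features) weight layout.
CHECKPOINT_SHAPE = (5632, 1024)  # extra_embedder.0.weight in the v1.2 checkpoint
MODEL_SHAPE = (5632, 3968)       # extra_embedder.0.weight built by the current source


def extra_in_dim_from_source(hidden_size: int) -> int:
    """Input width of extra_embedder as computed by the snippet quoted above."""
    extra_in_dim = 256 * 6 + hidden_size
    extra_in_dim += 1024
    return extra_in_dim


# The first Linear outputs hidden_size * 4 features, so hidden_size follows from 5632.
hidden_size = MODEL_SHAPE[0] // 4
source_in_dim = extra_in_dim_from_source(hidden_size)

# Part of the source's input width that the checkpoint does not have.
gap = source_in_dim - CHECKPOINT_SHAPE[1]

print(f"hidden_size            = {hidden_size}")
print(f"source extra_in_dim    = {source_in_dim}")
print(f"checkpoint in_features = {CHECKPOINT_SHAPE[1]}")
print(f"difference             = {gap}")
print(f"256 * 6 + hidden_size  = {256 * 6 + hidden_size}")
```

Under these assumptions the difference equals 256 * 6 + hidden_size exactly, so the checkpoint's input width corresponds to the `+= 1024` term alone. One reading consistent with this, and with the missing style_embedder.weight key, is that the v1.2 weights were saved from a model built without the first two terms; the thread itself does not confirm this.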

cugzhengzhimin commented 3 months ago

It looks like they don't check anything at all.

zml-ai commented 3 months ago

Hi, the training and inference code for version 1.2, including LoRA and ControlNet, will be released soon. Currently, the v1.2 weights on Hugging Face are intended only for loading with Kohya. Please primarily refer to the updates on our GitHub.

benzhangdragonplus commented 2 months ago

It looks like they don't check anything at all.

I think so too.