The Tencent-Hunyuan/HunyuanDiT-v1.2 model released on Hugging Face does not match the model structure in the source code

wpdong0727 commented 3 months ago

For the hunyuandit and hunyuandit_v1.1 models, the structure of extra_embedder matches the source code:

self.extra_in_dim = 256 * 6 + hidden_size
self.x_embedder = PatchEmbed(input_size, patch_size, in_channels, hidden_size)
self.t_embedder = TimestepEmbedder(hidden_size)
self.extra_in_dim += 1024
self.extra_embedder = nn.Sequential(
    nn.Linear(self.extra_in_dim, hidden_size * 4),
    FP32_SiLU(),
    nn.Linear(hidden_size * 4, hidden_size, bias=True),
)

For the hunyuandit_v1.2 model, however, the structure of extra_embedder does not match the source code: the checkpoint has extra_in_dim=1024. As a result, loading the model fails with: Error(s) in loading state_dict for ModifiedHunYuanDiT:\n\tMissing key(s) in state_dict: "style_embedder.weight". \n\tsize mismatch for extra_embedder.0.weight: copying a param with shape torch.Size([5632, 1024]) from checkpoint, the shape in current model is torch.Size([5632, 3968]).
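To make the numbers in the error message easier to follow, here is a minimal sketch in plain Python. It assumes hidden_size = 1408, which is inferred from 5632 = 4 * hidden_size in the error and is not stated in this thread. The helper names extra_in_dim_from_source and shape_mismatches are hypothetical and are not part of the HunyuanDiT code base. The shape of style_embedder.weight is not given in the error, so it is left as None.

```python
# Assumption: hidden_size is inferred from the error message, where the first
# Linear layer has 5632 output features and the source uses hidden_size * 4.
HIDDEN_SIZE = 5632 // 4  # 1408


def extra_in_dim_from_source(hidden_size: int) -> int:
    """Repeat the arithmetic of the constructor snippet quoted above."""
    extra_in_dim = 256 * 6 + hidden_size  # self.extra_in_dim = 256 * 6 + hidden_size
    extra_in_dim += 1024                  # self.extra_in_dim += 1024
    return extra_in_dim


def shape_mismatches(model_shapes: dict, ckpt_shapes: dict):
    """Compare two {parameter name: shape} maps (hypothetical helper).

    Returns (missing, unexpected, mismatched), where mismatched maps a name
    to the pair (checkpoint shape, model shape).
    """
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    mismatched = {
        k: (ckpt_shapes[k], model_shapes[k])
        for k in model_shapes
        if k in ckpt_shapes and ckpt_shapes[k] != model_shapes[k]
    }
    return missing, unexpected, mismatched


# Shapes taken from the error message; only the two reported keys are listed.
model_shapes = {
    "style_embedder.weight": None,  # shape not given in the error message
    "extra_embedder.0.weight": (5632, extra_in_dim_from_source(HIDDEN_SIZE)),
}
ckpt_shapes = {
    "extra_embedder.0.weight": (5632, 1024),
}

missing, unexpected, mismatched = shape_mismatches(model_shapes, ckpt_shapes)
print("expected extra_in_dim:", extra_in_dim_from_source(HIDDEN_SIZE))
print("missing keys:", missing)
print("size mismatches:", mismatched)
```

With a real checkpoint, the two maps could be built from the state dicts, for example {k: tuple(v.shape) for k, v in state_dict.items()}, which lists every differing parameter before load_state_dict is called. The checkpoint's input width of 1024 equals the term added by the `+= 1024` line alone, which suggests the v1.2 weights drop the other inputs, but this thread does not confirm that.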

cugzhengzhimin commented 3 months ago

I suspect they simply don't check anything at all.

zml-ai commented 3 months ago

Hi, the training and inference code for version 1.2, including Lora and ControlNet, will be released soon. Currently, the v1.2 weights on Hugging Face are only for Kohya’s loading. Please primarily refer to the updates on our GitHub.

benzhangdragonplus commented 2 months ago

"I suspect they simply don't check anything at all."

I think so too.

Tencent / HunyuanDiT

The Tencent-Hunyuan/HunyuanDiT-v1.2 model released on Hugging Face does not match the model structure in the source code #138