Stability-AI / StableCascade

Official Code for Stable Cascade
MIT License

mismatch #79

Closed · dushwe closed this issue 4 months ago

dushwe commented 4 months ago

When training with train_c_lora.py, I get this error:

"/codes/StableCascade/core/init.py", line 326, in call models = self.setup_models(extras) File "codes/StableCascade/train_c_lora.py", line 204, in setup_models text_model = CLIPTextModelWithProjection.from_pretrained(self.config.clip_text_model_name,).requiresgrad(False).to(dtype).to(self.device) File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2881, in from_pretrained ) = cls._load_pretrained_model( File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3278, in _load_pretrained_model raise RuntimeError(f"Error(s) in loading state_dict for {model.class.name}:\n\t{error_msg}") RuntimeError: Error(s) in loading state_dict for CLIPTextModelWithProjection: size mismatch for text_projection.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([512, 1280]).

BoxFishLab commented 4 months ago

The same dimension mismatch, hit at runtime inside Stage C:

```
torch._dynamo.exc.TorchRuntimeError: Failed running call_module L__self___clip_txt_pooled_mapper(*(FakeTensor(..., device='cuda:0', size=(2, 1, 512), dtype=torch.bfloat16),), **{}):
a and b must have same reduction dim, but got [2, 512] X [1280, 6144].

from user code:
File "/mnt/Agumon/sdf/workspace/wliu/jojo/ai-aigc/branch/StableCascade/modules/stage_c.py", line 238, in forward
    clip = self.gen_c_embeddings(clip_text, clip_text_pooled, clip_img)
File "/mnt/Agumon/sdf/workspace/wliu/jojo/ai-aigc/branch/StableCascade/modules/stage_c.py", line 161, in gen_c_embeddings
    clip_txt_pool = self.clip_txt_pooled_mapper(clip_txt_pooled).view(clip_txt_pooled.size(0), clip_txt_pooled.size(1) * self.c_clip_seq, -1)
```
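To make the shape failure concrete, here is a minimal sketch (my reconstruction, not from the thread) of the mapper's shape contract: a Linear expecting a 1280-dim pooled embedding, fed the 512-dim one produced by the mis-configured text model. The 6144 in the error is consistent with c_cond * c_clip_seq = 1536 * 4, but treat those values as assumptions:

```python
import torch
import torch.nn as nn

# Reconstruction of clip_txt_pooled_mapper: Linear(1280 -> 6144), whose transposed
# weight is the [1280, 6144] matrix named in the error message.
clip_txt_pooled_mapper = nn.Linear(1280, 6144)

# A 512-dim pooled embedding, as produced when projection_dim defaulted to 512...
clip_txt_pooled = torch.randn(2, 1, 512)

# ...cannot be multiplied against [1280, 6144]: same failure as [2, 512] X [1280, 6144].
clip_txt_pooled_mapper(clip_txt_pooled)  # raises a shape-mismatch RuntimeError
```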

dushwe commented 4 months ago

"/codes/StableCascade/core/init.py", line 326, in call models = self.setup_models(extras) File "codes/StableCascade/train_c_lora.py", line 204, in setup_models text_model = CLIPTextModelWithProjection.from_pretrained(self.config.clip_text_model_name,).requiresgrad(False).to(dtype).to(self.device) File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2881, in from_pretrained ) = cls._load_pretrained_model( File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3278, in _load_pretrained_model raise RuntimeError(f"Error(s) in loading state_dict for {model.class.name}:\n\t{error_msg}") RuntimeError: Error(s) in loading state_dict for CLIPTextModelWithProjection: size mismatch for text_projection.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([512, 1280]).

Setting projection_dim=1280 when loading the text model avoids this problem:

```python
text_model = CLIPTextModelWithProjection.from_pretrained(
    self.config.clip_text_model_name, projection_dim=1280
).requires_grad_(False).to(dtype).to(self.device)
```
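A variant of the same fix that avoids hard-coding 1280 (a sketch, assuming the checkpoint ships a full CLIPConfig carrying the real projection width, as the laion bigG repo appears to):

```python
from transformers import CLIPConfig, CLIPTextModelWithProjection

model_name = "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k"  # assumed value of clip_text_model_name
full_cfg = CLIPConfig.from_pretrained(model_name)

# Read projection_dim from the checkpoint's own top-level config instead of a constant;
# the text sub-config alone would fall back to the 512 default.
text_model = CLIPTextModelWithProjection.from_pretrained(
    model_name, projection_dim=full_cfg.projection_dim
)
```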