I'm hitting the same problem. Windows 11, Python 3.9; both torch 2.0.0 and torch 1.13.1 fail. Transformers compiled from the latest git source. CPU: AMD R7-6800H.
chatglm-6b-int4 ships quantized weights, so it must be loaded with the quantized model structure. If you comment out these lines in the repo's modeling code:

```python
# from .quantization import quantize, QuantizedEmbedding, QuantizedLinear, load_cpu_kernel
# self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)
```

the model structure stays unquantized, so loading the int4 checkpoint fails with the size mismatches shown below. Either keep the quantization code in place, or try loading the full-precision chatglm-6b weights instead.
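For reference, a minimal sketch of the intended loading path, keeping the int4 repo's model code unmodified (the `THUDM/chatglm-6b-int4` model ID and `trust_remote_code` usage follow the repo's documented pattern; `.half().cuda()` assumes a CUDA GPU is available):

```python
from transformers import AutoModel, AutoTokenizer

# Load the int4 checkpoint with its quantized model definition intact --
# do not comment out the quantize() call in the downloaded modeling code.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
```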
### Is there an existing issue for this?

### Current Behavior
```text
size mismatch for transformer.layers.0.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 2048]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.layers.0.attention.dense.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 2048]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).
size mismatch for transformer.layers.0.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 8192]) from checkpoint, the shape in current model is torch.Size([4096, 16384]).
size mismatch for transformer.layers.1.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 2048]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.layers.1.attention.dense.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.layers.1.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 2048]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).
size mismatch for transformer.layers.1.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 8192]) from checkpoint, the shape in current model is torch.Size([4096, 16384]).
```
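The halved checkpoint dimensions are consistent with 4-bit packing: two int4 weights occupy one int8 byte, so the last dimension of each stored weight is half that of the fp16 model. A rough illustration of the shape arithmetic (not the repo's actual packing code):

```python
import torch

# Two 4-bit weights per int8 byte: a [12288, 4096] fp16 weight matrix is
# stored as a [12288, 2048] int8 tensor, matching the checkpoint shapes
# in the error above. (Illustrative only -- not ChatGLM's actual kernel.)
w = torch.randint(-8, 8, (12288, 4096), dtype=torch.int8)  # pretend 4-bit values
hi, lo = w[:, 0::2], w[:, 1::2]                            # pair up adjacent columns
packed = (hi << 4) | (lo & 0x0F)                           # [12288, 2048] int8
print(packed.shape)  # torch.Size([12288, 2048])
```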
### Expected Behavior

No response

### Steps To Reproduce
```python
from transformers import AutoModel, AutoTokenizer

# Quantization disabled in the downloaded modeling code:
# from .quantization import quantize, QuantizedEmbedding, QuantizedLinear, load_cpu_kernel
# self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
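Note for CPU-only setups such as the one reported above (AMD R7-6800H): `.half().cuda()` will fail without a GPU; the repo documents a CPU path using `.float()` instead. A hedged sketch, assuming the int4 checkpoint and its unmodified model code:

```python
from transformers import AutoModel, AutoTokenizer

# CPU inference: keep weights in float32 instead of moving them to CUDA.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
model = model.eval()
```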
### Environment

### Anything else?
No response