I'm hitting the same problem. Windows 11, Python 3.9; both torch 2.0.0 and torch 1.13.1 fail. Transformers compiled from the latest git source. CPU: AMD R7-6800H.
chatglm-6b-int4 ships quantized weights, so it must be loaded with the quantized model structure. If you comment out these lines in the repo's modeling code:

```python
# from .quantization import quantize, QuantizedEmbedding, QuantizedLinear, load_cpu_kernel
# self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)
```

the model structure stays unquantized, so loading the int4 checkpoint fails with the size mismatches shown below. Either keep the quantization code in place, or try loading the full-precision chatglm-6b weights instead.
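For reference, a minimal sketch of the intended loading path, keeping the int4 repo's model code unmodified (the `THUDM/chatglm-6b-int4` model ID and `trust_remote_code` usage follow the repo's documented pattern; `.half().cuda()` assumes a CUDA GPU is available):

```python
from transformers import AutoModel, AutoTokenizer

# Load the int4 checkpoint with its quantized model definition intact --
# do not comment out the quantize() call in the downloaded modeling code.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
```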
### Is there an existing issue for this?

### Current Behavior
```text
size mismatch for transformer.layers.0.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 2048]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.layers.0.attention.dense.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 2048]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).
size mismatch for transformer.layers.0.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 8192]) from checkpoint, the shape in current model is torch.Size([4096, 16384]).
size mismatch for transformer.layers.1.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 2048]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.layers.1.attention.dense.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.layers.1.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 2048]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).
size mismatch for transformer.layers.1.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 8192]) from checkpoint, the shape in current model is torch.Size([4096, 16384]).
```
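The halved checkpoint dimensions are consistent with 4-bit packing: two int4 weights occupy one int8 byte, so the last dimension of each stored weight is half that of the fp16 model. A rough illustration of the shape arithmetic (not the repo's actual packing code):

```python
import torch

# Two 4-bit weights per int8 byte: a [12288, 4096] fp16 weight matrix is
# stored as a [12288, 2048] int8 tensor, matching the checkpoint shapes
# in the error above. (Illustrative only -- not ChatGLM's actual kernel.)
w = torch.randint(-8, 8, (12288, 4096), dtype=torch.int8)  # pretend 4-bit values
hi, lo = w[:, 0::2], w[:, 1::2]                            # pair up adjacent columns
packed = (hi << 4) | (lo & 0x0F)                           # [12288, 2048] int8
print(packed.shape)  # torch.Size([12288, 2048])
```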
### Expected Behavior

No response

### Steps To Reproduce
```python
from transformers import AutoModel, AutoTokenizer

# Quantization disabled in the downloaded modeling code:
# from .quantization import quantize, QuantizedEmbedding, QuantizedLinear, load_cpu_kernel
# self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
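Note for CPU-only setups such as the one reported above (AMD R7-6800H): `.half().cuda()` will fail without a GPU; the repo documents a CPU path using `.float()` instead. A hedged sketch, assuming the int4 checkpoint and its unmodified model code:

```python
from transformers import AutoModel, AutoTokenizer

# CPU inference: keep weights in float32 instead of moving them to CUDA.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
model = model.eval()
```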
### Environment

### Anything else?
No response