THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

How to fine-tune chatglm-6B-int4 on a Mac M2 #977

Open ryzn0518 opened 1 year ago

ryzn0518 commented 1 year ago

### Problems encountered during inference

  1. When loading the model, using float() loads fine, but using to("mps") raises an error:
    
    model = AutoModel.from_pretrained("chatglm-6b-int4", trust_remote_code=True).float()

File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl return forward_call(*args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 392, in forward output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/autograd/function.py", line 506, in apply return super().apply(args, **kwargs) # type: ignore[misc] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 57, in forward weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 275, in extract_weight_to_half func = kernels.int4WeightExtractionHalf ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'


2. Inference works on the M2 with the original model, but when actually running fine-tuning there is no way to use the GPU. I wanted to try torch.device("mps"), but it does not seem to work? (See the device-check sketch after this list.)

3. Steps I have already tried, still without success:
![image](https://github.com/THUDM/ChatGLM-6B/assets/19700467/a3bac13b-2933-4b3c-b5e8-bf7875c2307f)
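
As a quick sanity check for item 2 above, it may help to first confirm that PyTorch can see the MPS backend at all. A minimal sketch using the standard torch.backends.mps API:

    import torch

    # Standard PyTorch checks: was MPS compiled in, and is it usable here?
    print(torch.backends.mps.is_built())      # built with MPS support?
    print(torch.backends.mps.is_available())  # macOS/hardware can use it?

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(device)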

### Problems encountered during fine-tuning

1. Downloaded the chatglm-6B-int4 model and fine-tuned on top of it;
2. Runtime environment:

Apple Mac M2 machine, torch==2.1.0.dev20230507

3. Running bash train.sh raises an error:

Traceback (most recent call last):
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/main.py", line 433, in <module>
    main()
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/main.py", line 372, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/trainer.py", line 1635, in train
    return inner_training_loop(
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/trainer.py", line 1904, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/Users/diaojunxian/Documents/agi/ChatGLM-6B/ptuning/trainer.py", line 2665, in training_step
    loss.backward()
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/autograd/__init__.py", line 204, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 226, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/Users/diaojunxian/anaconda3/envs/py3.11/lib/python3.11/site-packages/torch/autograd/__init__.py", line 204, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: function W8A16LinearCPUBackward returned an incorrect number of gradients (expected 5, got 4)

mrzrx commented 1 year ago

The error RuntimeError: function W8A16LinearCPUBackward returned an incorrect number of gradients (expected 5, got 4) can be fixed by changing line 93 of quantization.py to:

    return grad_input.view(ctx.inp_shape), grad_weight.view(ctx.weight_shape), None, None, None
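
For context on why PyTorch expects five gradients here: a torch.autograd.Function's backward() must return exactly one value, a gradient tensor or None, for every argument that forward() received. A minimal self-contained sketch of that contract (illustrative only, not the actual W8A16LinearCPU implementation):

    import torch

    class ScaledLinear(torch.autograd.Function):
        @staticmethod
        def forward(ctx, inp, weight, scale, bit_width, extra):
            # Five forward inputs, so backward must return five values.
            ctx.save_for_backward(inp, weight, scale)
            return inp @ (weight * scale).t()

        @staticmethod
        def backward(ctx, grad_output):
            inp, weight, scale = ctx.saved_tensors
            grad_input = grad_output @ (weight * scale)
            grad_weight = (grad_output.t() @ inp) * scale
            # One entry per forward input; None for non-differentiable ones.
            # Dropping one of the Nones reproduces "expected 5, got 4".
            return grad_input, grad_weight, None, None, None

    x = torch.randn(2, 4, requires_grad=True)
    w = torch.randn(3, 4, requires_grad=True)
    s = torch.ones(3, 1)
    ScaledLinear.apply(x, w, s, 4, None).sum().backward()
    print(x.grad.shape, w.grad.shape)  # torch.Size([2, 4]) torch.Size([3, 4])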

ryzn0518 commented 1 year ago

Yes, that fixes it; fine-tuning then works on the CPU. But if I change quantization.py as follows, fine-tuning errors out again:

-            if self.device == torch.device("cpu"):
+            if self.device == torch.device("mps"):
 File "/Users/diaojunxian/anaconda3/envs/3.11/lib/python3.11/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 56, in forward
    weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 274, in extract_weight_to_half
    func = kernels.int4WeightExtractionHalf
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
ryzn0518 commented 1 year ago

@duzx16 A model trained with P-tuning on chatglm-6B-int4 purely on the CPU fails with an error after loading:

Traceback (most recent call last):
  File "/Users/diaojunxian/Library/Application Support/JetBrains/PyCharmCE2023.1/scratches/scratch.py", line 21, in <module>
    model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/diaojunxian/anaconda3/envs/3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1630, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'ChatGLMModel' object has no attribute 'prefix_encoder'

I have checked the loading code carefully, so what else could cause this error?

  1. tokenizer = AutoTokenizer.from_pretrained("chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)

  2. transformers version: transformers==4.28.1

Solved: add a config when loading:

    config = AutoConfig.from_pretrained("chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)
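
For reference, a sketch of the full loading sequence along the lines of the repo's P-tuning docs; CHECKPOINT_PATH is a placeholder for your own P-tuning output directory:

    import os
    import torch
    from transformers import AutoConfig, AutoModel, AutoTokenizer

    CHECKPOINT_PATH = "output/your-ptuning-checkpoint"  # placeholder

    # pre_seq_len in the *config* is what makes ChatGLMModel create its
    # prefix_encoder submodule; without it the attribute never exists.
    config = AutoConfig.from_pretrained("chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)
    tokenizer = AutoTokenizer.from_pretrained("chatglm-6b-int4", trust_remote_code=True)
    model = AutoModel.from_pretrained("chatglm-6b-int4", config=config, trust_remote_code=True)

    # Copy only the prefix-encoder weights out of the P-tuning checkpoint.
    prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
    new_prefix_state_dict = {
        k[len("transformer.prefix_encoder."):]: v
        for k, v in prefix_state_dict.items()
        if k.startswith("transformer.prefix_encoder.")
    }
    model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

    model = model.float()  # float32 for CPU use, as in the thread above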

xiaoToby commented 1 year ago

I hit the same problem during P-tuning fine-tuning. Changes I made:

  1. config = AutoConfig.from_pretrained("chatglm-6b-int4", trust_remote_code=True, pre_seq_len=128)
  2. return grad_input.view(ctx.inp_shape), grad_weight.view(ctx.weight_shape), None, None, None

Still getting the same error:

  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: function W8A16LinearCPUBackward returned an incorrect number of gradients (expected 5, got 4)

I am on Windows 10. @mrzrx @diaojunxian

qianxianyang commented 1 year ago

Hello, how did you solve this error: AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'? Thanks.

ryzn0518 commented 1 year ago

@qianxianyang Actually I never solved it. I hit the error because I forced model.to("mps"), which sent execution down a non-CPU (device != cpu) branch. If your local environment has no CUDA, you can keep device == cpu.

Also, when kernels is None you get the 'NoneType' object has no attribute 'int4WeightExtractionHalf' error; execution probably still went down the device == cuda branch, so I suggest checking that part of the code.
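
To make that concrete, a hypothetical sketch of the kind of device guard meant above (names approximate; this is not the actual quantization.py code):

    import torch

    kernels = None  # stands in for the compiled kernel handle, which fails
                    # to load on machines without CUDA (e.g. Apple Silicon)

    def extract_weight_to_half(weight, scale_list, source_bit_width):
        # Guard the device branch so a None `kernels` is never dereferenced.
        if weight.device == torch.device("cpu") or kernels is None:
            # Pure-PyTorch stand-in for the CPU dequantization path; the
            # real code also unpacks the int4 values before scaling.
            return (weight.to(torch.float32) * scale_list[:, None]).half()
        func = kernels.int4WeightExtractionHalf  # AttributeError if kernels is None
        return func(weight, scale_list)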

xiaoToby commented 1 year ago

> Hello, how did you solve this error: AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'? Thanks.

It was solved just by following the original poster's approach; the main thing is to locate your kernel path.

xgsong commented 1 year ago

Is it possible, on an M2 Max with 96 GB, to run P-tuning directly at FP16 precision with PyTorch's mps backend?

ryzn0518 commented 1 year ago

> Is it possible, on an M2 Max with 96 GB, to run P-tuning directly at FP16 precision with PyTorch's mps backend?

I never actually got it running; running with mps kept producing errors, so in the end I verified the whole process on the CPU.