THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] RuntimeError: "bernoulli_scalar_cpu_" not implemented for 'Half' when reproducing P-Tuning fine-tuning #1468

Open ysqfirmament opened 3 months ago

ysqfirmament commented 3 months ago

Is there an existing issue for this?

Current Behavior

While fine-tuning, I tried to reproduce the ADGEN dataset task, and this error appeared while running bash train.sh.

Running

import torch
print(torch.cuda.is_available())

prints True.
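
Note that is_available() only confirms a GPU exists; the kernel name in the traceback below (bernoulli_scalar_cpu_) indicates the dropout is actually executing on the CPU. A quick sanity check (a sketch; the load call mirrors the repo README, adjust the path to the local checkpoint actually used):

import torch
from transformers import AutoModel

# Hypothetical load; point this at the chatglm-6b-int4 checkpoint used by train.sh.
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)

print(torch.cuda.is_available())        # True only says a GPU exists
print(next(model.parameters()).device)  # should be cuda:0 during training, not cpu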

C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\transformers\optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs [Chinese prompt text, garbled by the Windows console's GBK/UTF-8 encoding mismatch]
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
labels [22 leading <image_-100> tokens, the Chinese target text (garbled by the same console encoding mismatch), then 42 trailing <image_-100> tokens]
  0%|          | 0/3000 [00:00<?, ?it/s]03/23/2024 23:23:53 - WARNING - transformers_modules.chatglm-6b-int4.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Traceback (most recent call last):
  File "D:\GLM\ChatGLM-6B-main\ptuning\main.py", line 430, in <module>
    main()
  File "D:\GLM\ChatGLM-6B-main\ptuning\main.py", line 369, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "D:\GLM\ChatGLM-6B-main\ptuning\trainer.py", line 1635, in train
    return inner_training_loop(
  File "D:\GLM\ChatGLM-6B-main\ptuning\trainer.py", line 1904, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "D:\GLM\ChatGLM-6B-main\ptuning\trainer.py", line 2647, in training_step
    loss = self.compute_loss(model, inputs)
  File "D:\GLM\ChatGLM-6B-main\ptuning\trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\firmament/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 1190, in forward
    transformer_outputs = self.transformer(
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\firmament/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 930, in forward
    past_key_values = self.get_prompt(batch_size=input_ids.shape[0], device=input_ids.device,
  File "C:\Users\firmament/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 878, in get_prompt
    past_key_values = self.dropout(past_key_values)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\functional.py", line 1266, in dropout
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: "bernoulli_scalar_cpu_" not implemented for 'Half'
  0%|          | 0/3000 [00:00<?, ?it/s]
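
For what it's worth, the failure can be reproduced outside ChatGLM entirely: on the reported PyTorch build, the CPU backend has no half-precision kernel for the bernoulli sampling that dropout uses. A minimal sketch:

import torch
import torch.nn.functional as F

x = torch.randn(4, 4, dtype=torch.float16)  # half-precision tensor on the CPU

# On the reported PyTorch build, the next line raises:
# RuntimeError: "bernoulli_scalar_cpu_" not implemented for 'Half'
# F.dropout(x, p=0.1, training=True)

# Casting to float32 first (or moving the tensor to a CUDA device) avoids it:
F.dropout(x.float(), p=0.1, training=True)

So the tensors reaching the prefix encoder's dropout are half precision and on the CPU.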

Expected Behavior

No response

Steps To Reproduce

1. Put the ADGEN dataset folder into the ptuning folder.
2. Run bash train.sh in the ptuning folder.
3. The error above appears.

Environment

- OS: Windows 11
- Python: 3.10
- Transformers: 4.27.1
- PyTorch: 2.2.1+cu121
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

ysqfirmament commented 3 months ago

Could it be that my machine just isn't powerful enough to run this?

Zylsjsp commented 1 month ago

> Could it be that my machine just isn't powerful enough to run this?

I think you should first tell us your GPU model and how much VRAM it has, and also check whether your GPU supports model quantization (if I remember right, there's a note about this in the README at the repo root).
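
For example, a minimal check (assuming at least one CUDA device is visible):

import torch

# Prints the GPU model name and total VRAM of the first CUDA device.
print(torch.cuda.get_device_name(0))
print(f"{torch.cuda.get_device_properties(0).total_memory / 2**30:.1f} GiB")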

The default configuration quantizes to INT4, so the VRAM requirement is very low. Besides, your error is not an OOM, so running out of VRAM can pretty much be ruled out (at least not at the point where this error is raised).

My suggestion is to change the quantization parameter to fp16 (or just delete it; see the sketch below). Without quantization the model only takes more VRAM, and it runs much faster: first, loading skips the quantization step; second, fp16 is the fastest for training and inference (in my tests, training time was fp16 << int4 < int8).
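
Concretely, that means editing ptuning/train.sh. A sketch of the relevant part (most flags omitted; the exact contents of your copy may differ):

#!/usr/bin/env bash
# Sketch of ptuning/train.sh with most flags omitted.
# Removing --quantization_bit makes main.py keep the model in fp16.
CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --pre_seq_len 128 \
    --learning_rate 2e-2
    # --quantization_bit 4   <- delete this line from the real script to disable INT4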

By the way, my setup is 4x Tesla T4 with 16 GB VRAM each. It can run every P-Tuning configuration, but full-parameter fine-tuning runs out of VRAM. My software versions are:

- Python: 3.9.19
- Transformers: 4.27.1
- PyTorch: 1.13.1+cu116
- CUDA: 11.6

Because the server can't be upgraded, and another fine-tuning environment required transformers>=4.30, I spent a long time untangling dependency hell, so I remember these dependency versions very well.

~If all else fails, you could try matching my configuration exactly; never mind the details, get it running first~

~By the way, I'm running this on Linux, so maybe try finding a server too~

Zylsjsp commented 1 month ago

Check whether the code you're using is up to date. This error says a scalar op is not implemented for half precision. If the latest code still raises the same error, you could try removing the half-precision conversions (calls like half()) from the code path in the traceback; a sketch follows. If you do modify the code, the VRAM needed will probably go up, and the INT quantization step may be affected too, so I don't recommend editing code you don't understand, nor applying INT quantization after such edits. #462
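
A sketch of that kind of change (hypothetical, based on the README's CPU-deployment advice, not a tested patch): since the failing kernel is the CPU dropout on a half tensor, keeping the model in fp32 whenever it runs on the CPU sidesteps the error, at the cost of more memory.

import torch
from transformers import AutoModel

# Hypothetical load; the un-quantized checkpoint is used here because the
# README recommends float() for CPU inference with it.
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

if torch.cuda.is_available():
    model = model.half().cuda()  # fp16 dropout is implemented on CUDA
else:
    model = model.float()        # CPU has no Half kernel for dropout's bernoulli op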