[BUG/Help] <Multi-GPU error: ValueError: 130004 is not in list>

YSLLYW commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

I have already updated to the latest ChatGLM files, but multi-GPU training fails with ValueError: 130004 is not in list. I'm training with p-tuning. How can I fix this?

Expected Behavior

No response

Steps To Reproduce

ValueError: Caught ValueError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6B-model/modeling_chatglm.py", line 1190, in forward
    transformer_outputs = self.transformer(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6B-model/modeling_chatglm.py", line 936, in forward
    attention_mask = self.get_masks(
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6B-model/modeling_chatglm.py", line 682, in get_masks
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6B-model/modeling_chatglm.py", line 682, in <listcomp>
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
ValueError: 130004 is not in list
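The last frames of the traceback show what actually fails: `get_masks` computes each row's context length as the position of `config.bos_token_id` (130004 here) in `input_ids`, and Python's `list.index` raises `ValueError: 130004 is not in list` when a row has no such token. Because `nn.DataParallel` (the wrapper that `parallel_apply` and "replica 1 on device 1" come from) splits the batch along dimension 0, each replica evaluates this on its own rows, so a single row without the BOS id is enough to make that replica fail. The sketch below is a hypothetical pure-Python reproduction plus a diagnostic helper; the helper names and the sample token ids are made up for illustration and are not part of ChatGLM-6B.

```python
from typing import List, Sequence

# BOS id taken from the error message; in modeling_chatglm.py it is
# self.config.bos_token_id. Assumed value, check your own config.json.
BOS_TOKEN_ID = 130004


def context_lengths_like_get_masks(
    input_ids: Sequence[Sequence[int]], bos_token_id: int = BOS_TOKEN_ID
) -> List[int]:
    """Mirror of the failing expression in get_masks:
        [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
    Returns the index of the first BOS token in each row; list.index raises
    ValueError("<id> is not in list") for any row that lacks it."""
    return [list(seq).index(bos_token_id) for seq in input_ids]


def rows_missing_bos(
    input_ids: Sequence[Sequence[int]], bos_token_id: int = BOS_TOKEN_ID
) -> List[int]:
    """Hypothetical diagnostic: indices of rows that would make get_masks fail."""
    return [i for i, seq in enumerate(input_ids) if bos_token_id not in list(seq)]


if __name__ == "__main__":
    # Made-up token ids: row 0 contains the BOS id, row 1 does not.
    batch = [
        [11, 12, BOS_TOKEN_ID, 13],
        [11, 12, 13, 14],
    ]
    print(rows_missing_bos(batch))  # [1]
    try:
        context_lengths_like_get_masks(batch)
    except ValueError as err:
        print(err)  # 130004 is not in list
```

To inspect a real batch before it reaches the model, pass plain lists (for a `torch.Tensor`, `input_ids.tolist()`) to `rows_missing_bos`; any index it returns points at a sample whose preprocessing dropped or never added the BOS token.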

Environment

OS: Ubuntu 20.04
Python: 3.8
Transformers: 4.28.0
PyTorch: 2.0.0
CUDA Support: True
2*A5000

Anything else?

No response

suparek commented 1 year ago

I get the same error during inference; I'd also appreciate help.

  File "/data/venv3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/cc/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py", line 936, in forward
    attention_mask = self.get_masks(
  File "/data/cc/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py", line 682, in get_masks
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
  File "/data/cc/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py", line 682, in <listcomp>
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
ValueError: 130004 is not in list
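The inference traceback above fails on the same `get_masks` line, which means the encoded prompt reaching the model contains no token equal to `config.bos_token_id`. Since the original report mentions updating the ChatGLM files, one thing worth ruling out is a tokenizer and `config.json` that disagree on the BOS id. The following is a hypothetical pre-flight check, not part of ChatGLM-6B; it assumes the ChatGLM tokenizer and config expose a `bos_token_id` attribute the way standard Hugging Face tokenizers and configs do, and it is demonstrated with stand-in objects and made-up ids.

```python
from types import SimpleNamespace
from typing import Any, Sequence


def check_bos_consistency(tokenizer: Any, config: Any) -> int:
    """Hypothetical check: get_masks searches every row of input_ids for
    config.bos_token_id, so the tokenizer must emit that same id.
    Returns the shared id, or raises RuntimeError on a mismatch."""
    tok_bos = getattr(tokenizer, "bos_token_id", None)
    cfg_bos = getattr(config, "bos_token_id", None)
    if tok_bos is None or cfg_bos is None:
        raise RuntimeError(f"bos_token_id missing: tokenizer={tok_bos}, config={cfg_bos}")
    if tok_bos != cfg_bos:
        raise RuntimeError(
            f"tokenizer BOS id {tok_bos} != config.bos_token_id {cfg_bos}; "
            "check that the tokenizer files and config.json come from the same revision"
        )
    return cfg_bos


def prompt_has_bos(input_ids: Sequence[int], bos_token_id: int) -> bool:
    """True if an encoded prompt contains the BOS id that get_masks will look for."""
    return bos_token_id in list(input_ids)


if __name__ == "__main__":
    # Stand-in objects with made-up ids. With a real checkpoint you would load them via
    # transformers.AutoTokenizer / AutoConfig.from_pretrained(path, trust_remote_code=True).
    config = SimpleNamespace(bos_token_id=130004)
    good_tok = SimpleNamespace(bos_token_id=130004)
    bad_tok = SimpleNamespace(bos_token_id=12345)

    print(check_bos_consistency(good_tok, config))  # 130004
    try:
        check_bos_consistency(bad_tok, config)
    except RuntimeError as err:
        print(err)
    print(prompt_has_bos([5, 6, 130004], config.bos_token_id))  # True
```

If both checks pass for the prompt you send but the error persists, the BOS token is likely being lost somewhere between encoding and the model call (for example through truncation), rather than never being produced.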

254288008 commented 1 year ago

Has anyone managed to solve this?

THUDM / ChatGLM-6B