beyondguo / LLM-Tuning

Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning.

Followed the tutorial step by step; training gets stuck at the PPO stage with "CUDA error: device-side assert triggered" #54

Open karl-tao-zhang opened 1 year ago

karl-tao-zhang commented 1 year ago

```
Using pad_token, but it is not set yet.
Loading base model for ppo training...
Loading base
Loading LoRA
Loading PPO
WARNING:root:A <class 'peft.peft_model.PeftModelForCausalLM'> model is loaded from '/root/autodl-tmp/LLM/weights/sft_lora', and no v_head weight is found. This IS expected if you are not resuming PPO training.
Loading base model for reward model...
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Some weights of BaichuanForSequenceClassification were not initialized from the model checkpoint at baichuan-inc/baichuan-7B and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Starting training
0it [00:00, ?it/s]---------------------
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
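The message's own suggestion is the most useful first step here: device-side asserts are reported asynchronously, so the traceback below may point at the wrong call. A minimal sketch of forcing synchronous kernel launches so the failing kernel is reported at its real call site (the variable must be set before CUDA is initialized):

```python
import os

# Must run before torch initializes CUDA; equivalently, launch with
#   CUDA_LAUNCH_BLOCKING=1 python rl_training.py ...
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402
```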

```
0
0it [00:10, ?it/s]
Traceback (most recent call last):
  File "rl_training.py", line 331, in <module>
    response_tensors = ppo_trainer.generate(
  File "/root/miniconda3/lib/python3.8/site-packages/trl/trainer/ppo_trainer.py", line 446, in generate
    return self._generate_batched(
  File "/root/miniconda3/lib/python3.8/site-packages/trl/trainer/ppo_trainer.py", line 503, in _generate_batched
    generations = self.accelerator.unwrap_model(self.model).generate(padded_inputs, **generation_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/trl/models/modeling_value_head.py", line 198, in generate
    return self.pretrained_model.generate(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/peft/peft_model.py", line 975, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 1648, in generate
    return self.sample(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 2730, in sample
    outputs = self(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    return module._hf_hook.post_forward(module, output)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 305, in post_forward
    output = send_to_device(output, self.input_device, skip_keys=self.skip_keys)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 160, in send_to_device
    {
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 161, in <dictcomp>
    k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 151, in send_to_device
    return honor_type(
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 83, in honor_type
    return type(obj)(generator)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 152, in <genexpr>
    tensor, (send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys) for t in tensor)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 151, in send_to_device
    return honor_type(
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 83, in honor_type
    return type(obj)(generator)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 152, in <genexpr>
    tensor, (send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys) for t in tensor)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/utils/operations.py", line 167, in send_to_device
    return tensor.to(device, non_blocking=non_blocking)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "rl_training.py", line 364, in <module>
    print(question_tensors)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 426, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 636, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 567, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 327, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor_str.py", line 111, in __init__
    value_str = "{}".format(value)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 872, in __format__
    return self.item().__format__(format_spec)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
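A device-side assert during `generate` on this stack is very often an out-of-range token ID hitting an embedding lookup, which would fit the "Using pad_token, but it is not set yet" warning above; once the assert fires, the CUDA context is poisoned and even the later `print(question_tensors)` fails. A small sketch, under that assumption, for checking a batch of query tensors against the vocabulary size before handing them to `ppo_trainer.generate` (the check runs on CPU; the placement comment and the `question_tensors` name are taken from the script, the rest is illustrative):

```python
import torch

def check_token_ids(question_tensors, vocab_size):
    """Print any token IDs that would index past the embedding table."""
    for i, q in enumerate(question_tensors):
        q = q.detach().cpu()
        bad = (q < 0) | (q >= vocab_size)
        if bad.any():
            print(f"sample {i}: out-of-range ids {q[bad].unique().tolist()} "
                  f"(vocab_size={vocab_size})")

# Hypothetical placement in rl_training.py, before ppo_trainer.generate(...):
# check_token_ids(question_tensors, model.pretrained_model.config.vocab_size)

# Standalone demo: id 64001 is out of range for a 64000-token vocabulary.
check_token_ids([torch.tensor([1, 5, 64001])], vocab_size=64000)
```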

karl-tao-zhang commented 1 year ago

```sh
CUDA_VISIBLE_DEVICES=0,1,2,3 python rl_training.py \
    --base_model_name baichuan-inc/baichuan-7B \
    --merged_sft_model_path /root/autodl-tmp/LLM/weights/sft_lora \
    --sft_model_lora_path /root/autodl-tmp/LLM/weights/sft_lora \
    --reward_model_lora_path /root/autodl-tmp/LLM/weights/rm_lora \
    --adafactor False \
    --save_freq 10 \
    --output_max_length 256 \
    --batch_size 2 \
    --gradient_accumulation_steps 2 \
    --batched_gen True \
    --ppo_epochs 4 \
    --seed 0 \
    --learning_rate 1e-5 \
    --early_stopping True \
    --output_dir /root/autodl-tmp/LLM/weights/ppo_lora
```
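One flag here is directly relevant to the crash: `--batched_gen True` routes generation through trl's `_generate_batched` (the `padded_inputs` frame in the traceback above), which pads each batch of queries to a common length using the tokenizer's pad token. A toy illustration of that padding with a stand-in tokenizer (GPT-2, purely for demonstration); if `pad_token_id` were unset or outside the vocabulary, the padded positions would become invalid embedding indices:

```python
from transformers import AutoTokenizer

# Illustrative only: any tokenizer with a valid pad token behaves this way.
tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token   # GPT-2 also ships without a pad token
tok.padding_side = "left"

batch = tok.pad(
    {"input_ids": [[101, 102, 103], [104]]},
    return_tensors="pt",
)
print(batch["input_ids"])       # shorter row is left-padded with pad_token_id
print(batch["attention_mask"])  # padded positions are masked out
```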

karl-tao-zhang commented 1 year ago

Four 3090s didn't have enough VRAM, so I switched to four A40s and hit the error above. After it appeared, I searched the trl issues for related code. Is this how it's supposed to be fixed?

```python
tokenizer.eos_token_id = model.config.eos_token_id
tokenizer.pad_token = tokenizer.eos_token
```
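For what it's worth, that pair of lines is the workaround usually suggested for this symptom: the Baichuan tokenizer ships without a pad token ("Using pad_token, but it is not set yet" in the log above), and batched generation pads queries with `pad_token_id`, so an unset or invalid pad ID can produce exactly this kind of embedding-lookup assert. A sketch of applying it right after the tokenizer is loaded; the `from_pretrained` call and placement are illustrative, not quoted from rl_training.py:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/baichuan-7B", trust_remote_code=True
)

# Baichuan defines no pad token; reuse EOS so batched generation can pad safely.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Keeping the model config consistent with the tokenizer (illustrative):
# model.config.pad_token_id = tokenizer.pad_token_id
```

If the script loads a separate tokenizer for the reward model, the same guard would presumably need to apply there too.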

karl-tao-zhang commented 1 year ago

It only works with a single GPU.
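That observation is consistent with the traceback: it dies inside accelerate's `send_to_device` hooks, which only run when the model is sharded across devices, so a single-GPU run plausibly avoids the cross-device transfer that surfaces the assert. A minimal way to pin the process to one card from Python (the command-line equivalent is simply `CUDA_VISIBLE_DEVICES=0 python rl_training.py ...`):

```python
import os

# Expose only one GPU before torch/accelerate initialize CUDA; the model
# then loads on a single device and no cross-device transfers occur.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```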