CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
MIT License

PPO Summarization issue #311

Closed l1f14bscs0388 closed 1 year ago

l1f14bscs0388 commented 1 year ago

🐛 Describe the bug

0%| | 0/10000 [00:00<?, ?it/s]

Traceback (most recent call last):
  /home/ubuntu/trlx/examples/summarize_rlhf/trlx_gptj_text_summarization.py:126 in <module>
    trainer = trlx.train(
        reward_fn=reward_fn,
        prompts=train_prompts,
        eval_prompts=val_prompts[0:253],  # sampling 1000 validation prompts
  /home/ubuntu/trlx/trlx/trlx.py:119 in train
    trainer.learn()
  /home/ubuntu/trlx/trlx/trainer/accelerate_base_trainer.py:479 in learn
    loss, stats = self.loss(batch)
  /home/ubuntu/trlx/trlx/trainer/accelerate_ppo_trainer.py:181 in loss
    loss, stats = self.config.method.loss(
        logprobs=logprobs,
        values=values_pred,
        old_logprobs=old_logprobs,
  /home/ubuntu/trlx/trlx/trainer/nn/ppo_models.py:220 in loss
    get_tensor_stats(values, mask, n),
  /home/ubuntu/trlx/trlx/utils/modeling.py:242 in get_tensor_stats
    min=torch.where(mask.bool(), xs, np.inf).min(),

RuntimeError: expected scalar type c10::Half but found double

0%| | 0/10000 [00:00<?, ?it/s]
[13:32:57] ERROR failed (exitcode: 1) local_rank: 0 (pid: 74519) torch/distributed/elastic/multiprocessing/api.py:684 of binary: /usr/bin/python3
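For context on where this blows up: `get_tensor_stats` mixes an fp16 tensor with `np.inf`, which is a float64 scalar, and on older PyTorch builds `torch.where` does not promote the scalar to the tensor's dtype, hence "expected scalar type c10::Half but found double". A minimal sketch of the failing pattern and a dtype-consistent workaround (this is an illustration assuming fp16 training, e.g. under DeepSpeed fp16, not the fix that landed in trlx):

```python
import numpy as np
import torch

# fp16 tensors, as produced when the value head runs under fp16 (e.g. DeepSpeed fp16)
xs = torch.randn(4, dtype=torch.half)
mask = torch.tensor([1.0, 1.0, 0.0, 1.0], dtype=torch.half)

# Failing pattern from trlx/utils/modeling.py:242 — np.inf is a float64 scalar,
# and on older PyTorch torch.where refuses to mix it with a Half tensor:
# torch.where(mask.bool(), xs, np.inf).min()   # RuntimeError: expected c10::Half but found double

# Workaround sketch: pass the sentinel as a tensor with the same dtype as xs.
inf = torch.tensor(float("inf"), dtype=xs.dtype)
stats = dict(
    min=torch.where(mask.bool(), xs, inf).min(),
    max=torch.where(mask.bool(), xs, -inf).max(),
)
print(stats)
```

Newer PyTorch releases (1.13+) promote the scalar themselves, which is why the question below about the installed version is relevant.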

Which trlX version are you using?

No response

Additional system and package information

No response

PPO_config

train:
  seq_length: 550
  epochs: 50
  total_steps: 10000
  batch_size: 1
  checkpoint_interval: 1000
  eval_interval: 200
  pipeline: "PromptPipeline"
  trainer: "AcceleratePPOTrainer"

model:
  model_path: "amirasghar/SFT_GPT_J"

tokenizer:
  tokenizer_path: "gpt2"
  truncation_side: "right"

optimizer:
  name: "adamw"
  kwargs:
    lr: 5.0e-6
    betas: [0.9, 0.999]
    eps: 1.0e-8
    weight_decay: 0.01

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 100000
    eta_min: 5.0e-6

method:
  name: "ppoconfig"
  num_rollouts: 128
  chunk_size: 16
  ppo_epochs: 4
  init_kl_coef: 0.1
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: False
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 320
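For reference, a config like the one above is normally loaded into a `TRLConfig` and passed to `trlx.train` alongside the reward function and prompts, roughly as the summarize example does. In the sketch below the YAML path, the placeholder `reward_fn`, and the prompt lists are assumptions standing in for the user's actual script:

```python
import trlx
from trlx.data.configs import TRLConfig

# Hypothetical stand-ins for the reward model and datasets used in the real script.
def reward_fn(samples, **kwargs):
    return [float(len(s)) for s in samples]  # placeholder score, not a real reward model

train_prompts = ["SUBREDDIT: r/test POST: example post TL;DR:"]
val_prompts = ["SUBREDDIT: r/test POST: another example post TL;DR:"]

# The YAML above saved to a file (path is an assumption).
config = TRLConfig.load_yaml("configs/ppo_config_summ_gptj.yml")

trainer = trlx.train(
    reward_fn=reward_fn,
    prompts=train_prompts,
    eval_prompts=val_prompts,
    config=config,
)
```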

default_accelerate

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_config_file: examples/summarize_rlhf/configs/ds_config_trlx_gptj_summarize.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
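An accelerate config like this is saved to a YAML file and handed to the launcher; assuming the file names used in the summarize example directory, the run would look roughly like:

```
accelerate launch --config_file configs/default_accelerate_config.yaml trlx_gptj_text_summarization.py
```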

maxreciprocate commented 1 year ago

Hey, are you using the latest PyTorch version, 1.13.1?

wangruo91 commented 1 year ago

Hi, I also encountered the same bug, and after updating PyTorch to 1.13.1, a new bug occurred.

Traceback (most recent call last):
  File "trlx_gptj_text_summarization.py", line 119, in <module>
    trainer = trlx.train(
  File "/workspace/bin/trlx/trlx.py", line 97, in train
    trainer.make_experience(config.method.num_rollouts)
  File "/workspace/bin/trlx/trainer/accelerate_ppo_trainer.py", line 284, in make_experience
    samples = self.generate(batch)
  File "/workspace/bin/trlx/trainer/accelerate_base_trainer.py", line 230, in generate
    return self.accelerator.unwrap_model(self.model).generate(
  File "/workspace/bin/trlx/trainer/nn/ppo_models.py", line 357, in generate
    return self.base_model.generate(input_ids, **x)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 1518, in generate
    return self.greedy_search(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 2285, in greedy_search
    outputs = self(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py", line 821, in forward
    transformer_outputs = self.transformer(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py", line 676, in forward
    outputs = block(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py", line 310, in forward
    attn_outputs = self.attn(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py", line 211, in forward
    query = self.q_proj(hidden_states)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
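This failure happens inside a plain fp16 linear layer during generation, which usually points at a mismatch between the installed PyTorch/CUDA build and the environment rather than at trlx itself. A quick, hypothetical sanity check (names and shapes are arbitrary) that exercises the same fp16 GEMM path outside the trainer:

```python
import torch

# Report the installed build and the visible GPU.
print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))

# Minimal fp16 linear on the GPU; the failing frame (F.linear) reduces to the same
# cublasGemmEx call, so if this raises CUBLAS_STATUS_INVALID_VALUE the problem is
# in the PyTorch/CUDA install, not in the model code.
x = torch.randn(16, 64, dtype=torch.half, device="cuda")
w = torch.randn(32, 64, dtype=torch.half, device="cuda")
out = torch.nn.functional.linear(x, w)
print(out.shape, out.dtype)
```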

l1f14bscs0388 commented 1 year ago

@wangruo91 use PyTorch version 1.12.1

wangruo91 commented 1 year ago

> @wangruo91 use PyTorch version 1.12.1

Thanks, I changed to version 1.12.1 and it works.

PhungVanDuy commented 1 year ago

@l1f14bscs0388 @wangruo91 did you manage to resolve this issue?