Closed l1f14bscs0388 closed 1 year ago
Hey, do you use the latest PyTorch version, 1.13.1?
Hi, I also encountered the same bug, and after I updated PyTorch to 1.13.1, a new bug occurred:
File "trlx_gptj_text_summarization.py", line 119, in cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
@wangruo91 use pytorch version==1.12.1
Thanks, I changed to version==1.12.1 and it works.
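Since the fix that worked here was simply pinning the PyTorch version, a small runtime guard can surface the mismatch before training starts. This is a sketch: the version numbers come from this thread, not from an official compatibility matrix, and `torch_major_minor` is a hypothetical helper, not part of trlx.

```python
import torch

def torch_major_minor(ver: str) -> tuple:
    """Parse a version string like '1.13.1+cu117' into (1, 13)."""
    return tuple(int(p) for p in ver.split("+")[0].split(".")[:2])

# 1.13.1 reproduced the dtype error in this thread; 1.12.1 worked.
if torch_major_minor(torch.__version__) == (1, 13):
    print(f"torch {torch.__version__}: you may hit the torch.where/np.inf dtype error")
```

`pip install torch==1.12.1` is the downgrade the thread settled on.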
@l1f14bscs0388 @wangruo91 did you manage to resolve this issue?
🐛 Describe the bug
0%| | 0/10000 [00:00<?, ?it/s]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/ubuntu/trlx/examples/summarize_rlhf/trlx_gptj_text_summarization.py:126 │
│ in <module> │
│ │
│ 123 │ for i in range(len(val_prompts)): │
│ 124 │ │ post_summary_dict[val_prompts[i]] = val_summaries[i] │
│ 125 │ │
│ ❱ 126 │ trainer = trlx.train( │
│ 127 │ │ reward_fn=reward_fn, │
│ 128 │ │ prompts=train_prompts, │
│ 129 │ │ eval_prompts=val_prompts[0:253], # sampling 1000 validation p │
│ │
│ /home/ubuntu/trlx/trlx/trlx.py:119 in train │
│ │
│ 116 │ eval_pipeline = get_pipeline(config.train.pipeline)(eval_prompts, │
│ 117 │ trainer.add_eval_pipeline(eval_pipeline) │
│ 118 │ │
│ ❱ 119 │ trainer.learn() │
│ 120 │ return trainer │
│ 121 │
│ │
│ /home/ubuntu/trlx/trlx/trainer/accelerate_base_trainer.py:479 in learn │
│ │
│ 476 │ │ │ │ │ # multiple gradient updates on the same batch of d │
│ 477 │ │ │ │ │ # https://arxiv.org/pdf/1707.06347.pdf │
│ 478 │ │ │ │ │ forward_time = time() │
│ ❱ 479 │ │ │ │ │ loss, stats = self.loss(batch) │
│ 480 │ │ │ │ │ forward_time = time() - forward_time │
│ 481 │ │ │ │ │ backward_time = time() │
│ 482 │ │ │ │ │ self.accelerator.backward(loss) │
│ │
│ /home/ubuntu/trlx/trlx/trainer/accelerate_ppo_trainer.py:181 in loss │
│ │
│ 178 │ │ │ │ attention_mask[:, start:end], │
│ 179 │ │ │ ) │
│ 180 │ │ │
│ ❱ 181 │ │ loss, stats = self.config.method.loss( │
│ 182 │ │ │ logprobs=logprobs, │
│ 183 │ │ │ values=values_pred, │
│ 184 │ │ │ old_logprobs=old_logprobs, │
│ │
│ /home/ubuntu/trlx/trlx/trainer/nn/ppo_models.py:220 in loss │
│ │
│ 217 │ │ │ │ value_loss=vf_loss.item(), │
│ 218 │ │ │ ), │
│ 219 │ │ │ values=dict( │
│ ❱ 220 │ │ │ │ get_tensor_stats(values, mask, n), │
│ 221 │ │ │ │ values_error=torch.sum(((values - returns) * mask) * │
│ 222 │ │ │ │ clipfrac=vf_clipfrac, │
│ 223 │ │ │ ), │
│ │
│ /home/ubuntu/trlx/trlx/utils/modeling.py:242 in get_tensor_stats │
│ │
│ 239 │ mean = (xs * mask).sum() / n │
│ 240 │ return dict( │
│ 241 │ │ mean=mean, │
│ ❱ 242 │ │ min=torch.where(mask.bool(), xs, np.inf).min(), │
│ 243 │ │ max=torch.where(mask.bool(), xs, -np.inf).max(), │
│ 244 │ │ std=torch.sqrt(((xs - mean) * mask).pow(2).sum() / n), │
│ 245 │ ) │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: expected scalar type c10::Half but found double
0%| | 0/10000 [00:00<?, ?it/s]
[13:32:57] ERROR failed (exitcode: 1) local_rank: 0 (pid: 74519) of binary: /usr/bin/python3
(/home/ubuntu/.local/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py:684)
Which trlX version are you using?
No response
Additional system and package information
No response
PPO_config
train:
  seq_length: 550
  epochs: 50
  total_steps: 10000
  batch_size: 1
  checkpoint_interval: 1000
  eval_interval: 200
  pipeline: "PromptPipeline"
  trainer: "AcceleratePPOTrainer"

model:
  model_path: "amirasghar/SFT_GPT_J"

tokenizer:
  tokenizer_path: "gpt2"
  truncation_side: "right"

optimizer:
  name: "adamw"
  kwargs:
    lr: 5.0e-6
    betas: [0.9, 0.999]
    eps: 1.0e-8
    weight_decay: 0.01

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 100000
    eta_min: 5.0e-6

method:
  name: "ppoconfig"
  num_rollouts: 128
  chunk_size: 16
  ppo_epochs: 4
  init_kl_coef: 0.1
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: False
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 320
default_accelerate
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_config_file: examples/summarize_rlhf/configs/ds_config_trlx_gptj_summarize.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false