CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
MIT License

RunTimeError using Accelerate + Zero Stage 3 to launch ppo_sentiments.py #461

alex-athanassakos closed this issue 1 year ago

alex-athanassakos commented 1 year ago

🐛 Describe the bug

Hi!

I have been trying to use Accelerate with DeepSpeed to launch my trlX scripts and have run into this error in several cases:

│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/ppo_sentiments.py:5 │
│ 7 in <module>                                                               │
│                                                                             │
│   54                                                                        │
│   55 if __name__ == "__main__":                                             │
│   56 │   hparams = {} if len(sys.argv) == 1 else json.loads(sys.argv[1])    │
│ ❱ 57 │   main(hparams)                                                      │
│   58                                                                        │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/ppo_sentiments.py:4 │
│ 7 in main                                                                   │
│                                                                             │
│   44 │   imdb = load_dataset("imdb", split="train+test")                    │
│   45 │   prompts = [" ".join(review.split()[:4]) for review in imdb["text"] │
│   46 │                                                                      │
│ ❱ 47 │   trlx.train(                                                        │
│   48 │   │   reward_fn=reward_fn,                                           │
│   49 │   │   prompts=prompts,                                               │
│   50 │   │   eval_prompts=["I don't know much about Hungarian underground"] │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/src/trlx/trlx/ │
│ trlx.py:101 in train                                                        │
│                                                                             │
│    98 │   │   if eval_prompts is None:                                      │
│    99 │   │   │   eval_prompts = prompts[:batch_size]                       │
│   100 │   │                                                                 │
│ ❱ 101 │   │   trainer.make_experience(config.method.num_rollouts)           │
│   102 │                                                                     │
│   103 │   # Offline training from the collected samples (e.g. SFT, ILQL)    │
│   104 │   elif samples:                                                     │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/src/trlx/trlx/ │
│ trainer/accelerate_ppo_trainer.py:409 in make_experience                    │
│                                                                             │
│   406 │   │   │   │   │   )                                                 │
│   407 │   │   │   │   │   # TODO(dahoas): When hydra model works need to al │
│   408 │   │   │   │   │   if hasattr(self.model, "frozen_head"):            │
│ ❱ 409 │   │   │   │   │   │   ref_logits = self.model.forward_hydra(        │
│   410 │   │   │   │   │   │   │   all_tokens,                               │
│   411 │   │   │   │   │   │   │   attention_mask=attention_mask,            │
│   412 │   │   │   │   │   │   │   return_dict=True,                         │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/src/trlx/trlx/ │
│ models/modeling_ppo.py:387 in forward_hydra                                 │
│                                                                             │
│    384 │   │   output_shape = outputs.hidden_states[-1].size()              │
│    385 │   │   forward_kwargs.pop("input_ids", None)  # Ignore `input_ids`  │
│    386 │   │   forward_kwargs.pop("inputs_embeds", None)  # Ignore `inputs_ │
│ ❱  387 │   │   hydra_outputs = self.frozen_head(input_hidden_state, output_ │
│    388 │   │                                                                │
│    389 │   │   if not return_dict:                                          │
│    390 │   │   │   return hydra_outputs.logits                              │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/torch/nn/modules/module.py:1538 in _call_impl                 │
│                                                                             │
│   1535 │   │   │   bw_hook = hooks.BackwardHook(self, full_backward_hooks,  │
│   1536 │   │   │   args = bw_hook.setup_input_hook(args)                    │
│   1537 │   │                                                                │
│ ❱ 1538 │   │   result = forward_call(*args, **kwargs)                       │
│   1539 │   │   if _global_forward_hooks or self._forward_hooks:             │
│   1540 │   │   │   for hook_id, hook in (                                   │
│   1541 │   │   │   │   *_global_forward_hooks.items(),                      │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/src/trlx/trlx/ │
│ models/modeling_ppo.py:504 in forward                                       │
│                                                                             │
│    501 │   │   │   # Assumes we are never training the branch               │
│    502 │   │   │   block_params = inspect.getfullargspec(block.forward).arg │
│    503 │   │   │   if "encoder_hidden_states" in block_params:              │
│ ❱  504 │   │   │   │   outputs = block(                                     │
│    505 │   │   │   │   │   hidden_states,                                   │
│    506 │   │   │   │   │   layer_past=layer_past,                           │
│    507 │   │   │   │   │   attention_mask=attention_mask,                   │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/torch/nn/modules/module.py:1538 in _call_impl                 │
│                                                                             │
│   1535 │   │   │   bw_hook = hooks.BackwardHook(self, full_backward_hooks,  │
│   1536 │   │   │   args = bw_hook.setup_input_hook(args)                    │
│   1537 │   │                                                                │
│ ❱ 1538 │   │   result = forward_call(*args, **kwargs)                       │
│   1539 │   │   if _global_forward_hooks or self._forward_hooks:             │
│   1540 │   │   │   for hook_id, hook in (                                   │
│   1541 │   │   │   │   *_global_forward_hooks.items(),                      │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/transformers/models/gpt2/modeling_gpt2.py:388 in forward      │
│                                                                             │
│    385 │   │   output_attentions: Optional[bool] = False,                   │
│    386 │   ) -> Union[Tuple[torch.Tensor], Optional[Tuple[torch.Tensor, Tup │
│    387 │   │   residual = hidden_states                                     │
│ ❱  388 │   │   hidden_states = self.ln_1(hidden_states)                     │
│    389 │   │   attn_outputs = self.attn(                                    │
│    390 │   │   │   hidden_states,                                           │
│    391 │   │   │   layer_past=layer_past,                                   │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/torch/nn/modules/module.py:1527 in _call_impl                 │
│                                                                             │
│   1524 │   │   │   │   │   │   │   │   f"of (new_args, new_kwargs), but got │
│   1525 │   │   │   │   │   │   │   )                                        │
│   1526 │   │   │   │   else:                                                │
│ ❱ 1527 │   │   │   │   │   result = hook(self, args)                        │
│   1528 │   │   │   │   │   if result is not None:                           │
│   1529 │   │   │   │   │   │   if not isinstance(result, tuple):            │
│   1530 │   │   │   │   │   │   │   result = (result,)                       │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/deepspeed/utils/nvtx.py:11 in wrapped_fn                      │
│                                                                             │
│    8 │   function call."""                                                  │
│    9 │   def wrapped_fn(*args, **kwargs):                                   │
│   10 │   │   get_accelerator().range_push(func.__qualname__)                │
│ ❱ 11 │   │   ret_val = func(*args, **kwargs)                                │
│   12 │   │   get_accelerator().range_pop()                                  │
│   13 │   │   return ret_val                                                 │
│   14                                                                        │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/deepspeed/runtime/zero/parameter_offload.py:348 in            │
│ _pre_forward_module_hook                                                    │
│                                                                             │
│   345 │   │                                                                 │
│   346 │   │   @instrument_w_nvtx                                            │
│   347 │   │   def _pre_forward_module_hook(module, *args):                  │
│ ❱ 348 │   │   │   self.pre_sub_module_forward_function(module)              │
│   349 │   │                                                                 │
│   350 │   │   @instrument_w_nvtx                                            │
│   351 │   │   def _post_forward_module_hook(module, input, output):         │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/torch/utils/_contextlib.py:115 in decorate_context            │
│                                                                             │
│   112 │   @functools.wraps(func)                                            │
│   113 │   def decorate_context(*args, **kwargs):                            │
│   114 │   │   with ctx_factory():                                           │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                              │
│   116 │                                                                     │
│   117 │   return decorate_context                                           │
│   118                                                                       │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/deepspeed/runtime/zero/parameter_offload.py:478 in            │
│ pre_sub_module_forward_function                                             │
│                                                                             │
│   475 │   │   param_coordinator.trace_prologue(sub_module)                  │
│   476 │   │   if param_coordinator.is_record_trace():                       │
│   477 │   │   │   param_coordinator.record_module(sub_module)               │
│ ❱ 478 │   │   param_coordinator.fetch_sub_module(sub_module)                │
│   479 │   │                                                                 │
│   480 │   │   see_memory_usage(                                             │
│   481 │   │   │   f"Before sub module function {sub_module.__class__.__name │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/deepspeed/utils/nvtx.py:11 in wrapped_fn                      │
│                                                                             │
│    8 │   function call."""                                                  │
│    9 │   def wrapped_fn(*args, **kwargs):                                   │
│   10 │   │   get_accelerator().range_push(func.__qualname__)                │
│ ❱ 11 │   │   ret_val = func(*args, **kwargs)                                │
│   12 │   │   get_accelerator().range_pop()                                  │
│   13 │   │   return ret_val                                                 │
│   14                                                                        │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/torch/utils/_contextlib.py:115 in decorate_context            │
│                                                                             │
│   112 │   @functools.wraps(func)                                            │
│   113 │   def decorate_context(*args, **kwargs):                            │
│   114 │   │   with ctx_factory():                                           │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                              │
│   116 │                                                                     │
│   117 │   return decorate_context                                           │
│   118                                                                       │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py:273   │
│ in fetch_sub_module                                                         │
│                                                                             │
│   270 │   │   │   │   │   │      ) > self.__max_ongoing_fetch_events:       │
│   271 │   │   │   │   │   │   self.__ongoing_fetch_events.popleft().synchro │
│   272 │   │   │   │   │                                                     │
│ ❱ 273 │   │   │   │   │   self.__inflight_param_registry.pop(param).wait()  │
│   274 │   │   │   │   │                                                     │
│   275 │   │   │   │   │   event = get_accelerator().Event()                 │
│   276 │   │   │   │   │   event.record()                                    │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/deepspeed/utils/nvtx.py:11 in wrapped_fn                      │
│                                                                             │
│    8 │   function call."""                                                  │
│    9 │   def wrapped_fn(*args, **kwargs):                                   │
│   10 │   │   get_accelerator().range_push(func.__qualname__)                │
│ ❱ 11 │   │   ret_val = func(*args, **kwargs)                                │
│   12 │   │   get_accelerator().range_pop()                                  │
│   13 │   │   return ret_val                                                 │
│   14                                                                        │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/deepspeed/runtime/zero/partition_parameters.py:527 in wait    │
│                                                                             │
│    524 │   │   │   │   │   │   │   param.ds_tensor.ds_numel))               │
│    525 │   │   │   │   │   partitions.append(part_to_copy)                  │
│    526 │   │   │                                                            │
│ ❱  527 │   │   │   param.data = instrument_w_nvtx(torch.cat)(partitions).vi │
│    528 │   │   │   param.ds_status = ZeroParamStatus.AVAILABLE              │
│    529 │   │   │                                                            │
│    530 │   │   │   for part_to_copy in partitions:                          │
│                                                                             │
│ /home/ubuntu/Codes/resolver-ai/prompt_resp_gen_train/RL/venv/lib/python3.8/ │
│ site-packages/deepspeed/utils/nvtx.py:11 in wrapped_fn                      │
│                                                                             │
│    8 │   function call."""                                                  │
│    9 │   def wrapped_fn(*args, **kwargs):                                   │
│   10 │   │   get_accelerator().range_push(func.__qualname__)                │
│ ❱ 11 │   │   ret_val = func(*args, **kwargs)                                │
│   12 │   │   get_accelerator().range_pop()                                  │
│   13 │   │   return ret_val                                                 │
│   14                                                                        │
╰─────────────────────────────────────────────────────────────────────────────╯
RuntimeError: torch.cat(): expected a non-empty list of Tensors

I just reproduced it with ppo_sentiments.py (commit c9ab683).

I used this to launch it:

accelerate launch --config_file accelerate_config_example.yaml ppo_sentiments.py

With accelerate_config_example.yaml consisting of:

deepspeed_config:
  gradient_accumulation_steps: 4
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
num_machines: 1
num_processes: 4
use_cpu: false

I am using a g5.12xlarge with Deep Learning AMI Neuron PyTorch 1.13.0 (Ubuntu 20.04) 20230330.

Let me know if you need more info!

Which trlX version are you using?

0.5.0

Additional system and package information

Python 3.8.10, transformers==4.28.0, Ubuntu 20.04, CUDA 11.7, torch==2.0.0

mbalesni commented 1 year ago

Hi @alex-athanassakos! Did you manage to get this to work?

I am running into another issue with a similar setup. It looks like the size of the weight tensor in the Hydra model (AFAIU - the value head) is 0, which I guess means the weight tensor is uninitialized. It feels different from your issue, where the list of tensors is empty — but maybe the root cause is similar?
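
One way to tell a ZeRO-3 placeholder apart from a truly uninitialized weight is to look for the `ds_*` attributes that DeepSpeed attaches to partitioned parameters (the traceback above references `ds_tensor`, `ds_numel`, and `ds_status`). The sketch below is a hypothetical diagnostic helper, not part of trlx or DeepSpeed. The name `find_partitioned_params` is made up for illustration, and it assumes that a partitioned parameter reports `numel() == 0` locally while keeping its full element count in `ds_numel`. The stand-in objects and the size 768 are illustrative only.

```python
from types import SimpleNamespace


def find_partitioned_params(named_params):
    """Return (name, full_numel) for parameters that look like ZeRO-3 placeholders.

    Assumption: a partitioned parameter has zero local elements but carries
    a positive `ds_numel` attribute with its full, unpartitioned size.
    """
    suspects = []
    for name, param in named_params:
        local_numel = param.numel()
        full_numel = getattr(param, "ds_numel", None)
        if local_numel == 0 and full_numel:
            suspects.append((name, full_numel))
    return suspects


def fake_param(local_numel, ds_numel=None):
    """Build a stand-in object that mimics the attributes inspected above."""
    param = SimpleNamespace(numel=lambda: local_numel)
    if ds_numel is not None:
        param.ds_numel = ds_numel
    return param


# Hypothetical parameters: one partitioned placeholder, one ordinary tensor.
params = [
    ("frozen_head.ln_1.weight", fake_param(0, ds_numel=768)),
    ("v_head.weight", fake_param(768)),
]
report = find_partitioned_params(params)
print(report)
```

With a real model, the same helper could be called on `model.named_parameters()`. Any name it reports has been partitioned by ZeRO-3 and needs to be gathered before use, rather than being genuinely uninitialized.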

mbalesni commented 1 year ago

I replicated the same issue with a different example file (ppo_sentiments_llama.py), configs (below), and model (EleutherAI/pythia-70m).

I use a custom trlx fork from trlx==0.6.0 with a few unrelated changes (treat as identical).

  1. I call the example like so:
accelerate launch trlx/examples/ppo_sentiments_llama.py --model_path EleutherAI/pythia-70m
  2. Here is my Accelerate default_config.yaml:
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_config_file: <path_to_config_dir>/deepspeed.json
  zero3_init_flag: true
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: true
  3. My DeepSpeed config (deepspeed.json):
{
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": 1e9,
    "stage3_prefetch_bucket_size": 5e8,
    "stage3_param_persistence_threshold": 1e6,
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 1.0,
  "steps_per_print": 2000,
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 8,
  "wall_clock_breakdown": false
}
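
As a quick sanity check on a config like the one above, the batch-size fields can be verified against the relation DeepSpeed documents: `train_batch_size` should equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × the number of processes. The sketch below is only an illustration. `check_batch_sizes` is a hypothetical helper name, and the world size of 4 is taken from `num_processes: 4` in the Accelerate config.

```python
import json


def check_batch_sizes(ds_config, world_size):
    """Return (expected_train_batch_size, matches_config) for a DeepSpeed config dict."""
    micro = ds_config["train_micro_batch_size_per_gpu"]
    accum = ds_config.get("gradient_accumulation_steps", 1)
    expected = micro * accum * world_size
    return expected, expected == ds_config["train_batch_size"]


# Only the batch-related fields from the deepspeed.json above are repeated here.
config_text = """
{
  "gradient_accumulation_steps": 1,
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 8
}
"""
ds_config = json.loads(config_text)
expected, ok = check_batch_sizes(ds_config, world_size=4)
print(expected, ok)
```

For these values the check passes (8 × 1 × 4 = 32), so the batch-size settings are at least self-consistent and are unlikely to be the source of the error.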
alex-athanassakos commented 1 year ago

Thanks for the info @nikebless. I never did get this to work. I had a similar issue to yours with the reward model in my custom script: the embedding weights were uninitialized. I ended up using ppo_sentiments.py to debug, and posted the error it gave me because I thought that would be simpler than sharing my own script. But it sounds related to your issue!

maxreciprocate commented 1 year ago

Resolved with https://github.com/CarperAI/trlx/pull/489