Closed: Jiaxin-Wen closed this issue 1 year ago
I see [rollout 134 / 128]: : 134it [08:45, 3.92s/it]
in the logging output, which is kind of strange. Is this reasonable?
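Not something stated in this thread, but one plausible reason the counter can run past 128: if rollouts are collected chunk by chunk and the stopping condition is only re-checked after a whole chunk has been appended (and a chunk may contribute a variable number of samples), the final count overshoots num_rollouts. A minimal, hypothetical sketch of that pattern (gather_rollouts and collect_chunk are stand-ins, not trlX's actual code):

# Hypothetical sketch: a chunked rollout loop that can overshoot its target.
# collect_chunk is a stand-in for whatever produces one batch of rollouts.
from typing import Callable, List


def gather_rollouts(collect_chunk: Callable[[], List[dict]], num_rollouts: int) -> List[dict]:
    rollouts: List[dict] = []
    while len(rollouts) < num_rollouts:
        # The whole chunk is appended before the condition is re-checked,
        # so the final count can exceed num_rollouts (e.g. 134 vs. 128).
        rollouts.extend(collect_chunk())
    return rollouts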
Which accelerate version and config have you used here? I want to reproduce this.
accelerate version: 0.16.0
accelerate config:
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_config_file: configs/ds_config_trlx_gptj_summarize.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
deepspeed config:
{
  "train_micro_batch_size_per_gpu": 2,
  "gradient_accumulation_steps": 4,
  "fp16": {
    "enabled": true,
    "min_loss_scale": 0.5,
    "fp16_scale_tolerance": 0.25,
    "opt_level": "O2"
  },
  "zero_optimization": {
    "stage": 2,
    "offload_param": {
      "device": "cpu"
    },
    "offload_optimizer": {
      "device": "cpu"
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "contiguous_gradients": true
  }
}
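As a side note (my own arithmetic, not taken from the report): DeepSpeed derives the effective train batch size from these fields, so with the num_processes: 4 from the accelerate config above it works out as follows:

# Effective (global) batch size implied by the DeepSpeed config above,
# assuming the num_processes: 4 from the accelerate config.
train_micro_batch_size_per_gpu = 2
gradient_accumulation_steps = 4
num_processes = 4

train_batch_size = (
    train_micro_batch_size_per_gpu
    * gradient_accumulation_steps
    * num_processes
)
print(train_batch_size)  # 32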
ppo config:
train:
  seq_length: 550
  epochs: 50
  total_steps: 100000
  batch_size: 8
  checkpoint_interval: 10000
  eval_interval: 200
  pipeline: "PromptPipeline"
  trainer: "AcceleratePPOTrainer"

model:
  model_path: "sft/gptj-supervised-summarize-checkpoint"
  num_layers_unfrozen: 8

tokenizer:
  tokenizer_path: "gpt2"
  truncation_side: "right"

optimizer:
  name: "adamw"
  kwargs:
    lr: 5.0e-6
    betas: [0.9, 0.999]
    eps: 1.0e-8
    weight_decay: 0.01

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 100000
    eta_min: 5.0e-6

method:
  name: "ppoconfig"
  num_rollouts: 128
  chunk_size: 16
  ppo_epochs: 4
  init_kl_coef: 0.1
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: False
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 50
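For context, a rough sketch of how a config like this is usually fed into trlX in the summarize example; the YAML path, prompts, and reward_fn below are placeholders rather than the repo's actual code:

# Hedged sketch: loading the PPO config and launching training with trlx.train.
# The config path, prompts, and reward_fn are placeholders, not the repo's code.
import trlx
from trlx.data.configs import TRLConfig

config = TRLConfig.load_yaml("configs/ppo_config_summ_gptj.yml")  # assumed path


def reward_fn(samples, **kwargs):
    # Placeholder reward: the real example scores summaries with a reward model.
    return [float(len(sample)) for sample in samples]


trainer = trlx.train(
    reward_fn=reward_fn,
    prompts=["POST: ... TL;DR:"],       # placeholder training prompts
    eval_prompts=["POST: ... TL;DR:"],  # placeholder eval prompts
    config=config,
)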
Oops, I think I found the reason. I updated accelerate_base_trainer.py
to the latest version (according to #315).
🐛 Describe the bug
I am running example/summarize_rlhf. I had successfully run the code a few days ago. However, after syncing with the latest version (main branch), I find that the PPO training hangs and raises a timeout error. I haven't found the root cause of this issue, but one modification I am aware of is that the make_experience function used to live in orchestrator/ppo_orchestrator.
Which trlX version are you using?
main (latest)
Additional system and package information
torch 1.13.1