OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

run train_ppo_llama_ray.sh error #314

Closed: yangzhipeng1108 closed this issue 4 weeks ago

yangzhipeng1108 commented 4 weeks ago

```bash
set -x
export PATH=$HOME/.local/bin/:$PATH

ray job submit --address="http://127.0.0.1:8265" \
    --runtime-env-json='{"working_dir": "/workspace/OpenRLHF", "pip": "/workspace/OpenRLHF/requirements.txt"}' \
    -- python3 /workspace/OpenRLHF/examples/train_ppo_ray.py \
    --ref_num_nodes 1 \
    --ref_num_gpus_per_node 1 \
    --reward_num_nodes 1 \
    --reward_num_gpus_per_node 1 \
    --critic_num_nodes 1 \
    --critic_num_gpus_per_node 2 \
    --actor_num_nodes 1 \
    --actor_num_gpus_per_node 4 \
    --pretrain /workspace/model_ckpts1/Llama-2-7b-hf \
    --reward_pretrain /workspace/model_ckpts1/Llama-2-7b-hf \
    --save_path /workspace/OpenRLHF/examples/scripts/ckpt/7b_llama \
    --micro_train_batch_size 8 \
    --train_batch_size 128 \
    --micro_rollout_batch_size 16 \
    --rollout_batch_size 1024 \
    --max_epochs 1 \
    --prompt_max_len 1024 \
    --generate_max_len 1024 \
    --zero_stage 2 \
    --bf16 \
    --actor_learning_rate 5e-7 \
    --critic_learning_rate 9e-6 \
    --init_kl_coef 0.01 \
    --prompt_data /workspace/data/OpenOrca,/workspace/data/tasksource/oasst1_pairwise_rlhf_reward \
    --prompt_data_probs 0.5,0.5 \
    --max_samples 80000 \
    --normalize_reward \
    --actor_init_on_gpu \
    --adam_offload \
    --flash_attn \
    --gradient_checkpointing \
    --use_wandb {wandb_token}
```
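For context, this placement asks Ray for 8 GPUs in total (1 ref + 1 reward + 2 critic + 4 actor). A minimal sanity check before submitting, assuming a single 8-GPU node, is to start the head node with the dashboard on port 8265 (the address the job is submitted to) and confirm Ray actually registers that many GPUs:

```bash
# Start a Ray head node exposing the GPUs and the job-submission dashboard
# (assumes one machine with 8 GPUs; adjust --num-gpus to your hardware)
ray start --head --num-gpus=8 --dashboard-host=0.0.0.0 --dashboard-port=8265

# Confirm the cluster reports enough GPU resources for all four model groups
ray status
```

If `ray status` shows fewer GPUs than the four `*_num_gpus_per_node` arguments add up to, the submitted job will hang waiting for placement rather than start training.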

[two screenshots of the resulting error were attached]