OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

QLORA model loading error #295

Open karthik-nexusflow opened 1 month ago

karthik-nexusflow commented 1 month ago

Hi team, I'm getting the following error while enabling 4-bit quantization and LoRA:

File "/root/miniconda3/envs/open/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
    self._configure_distributed_model(model)
  File "/root/miniconda3/envs/open/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1112, in _configure_distributed_model
    self.module.to(self.device)
  File "/root/miniconda3/envs/open/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2670, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

The launch command:

ray job submit --address="http://127.0.0.1:8265" \
    --runtime-env-json='{"working_dir": "/workspace/openrlhf/research/openrlhf"}' \
    -- python3 examples/train_ppo_ray.py \
    --ref_num_nodes 1 \
    --ref_num_gpus_per_node 1 \
    --reward_num_nodes 1 \
    --reward_num_gpus_per_node 1 \
    --critic_num_nodes 1 \
    --critic_num_gpus_per_node 2 \
    --actor_num_nodes 1 \
    --actor_num_gpus_per_node 2 \
    --pretrain mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --reward_pretrain mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --critic_pretrain mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --save_path /workspace/NexusNest/research/openrlhf/examples/scripts/ckpt/starling_7b \
    --micro_train_batch_size 8 \
    --train_batch_size 32 \
    --micro_rollout_batch_size 8 \
    --rollout_batch_size 512 \
    --max_epochs 1 \
    --prompt_max_len 1000 \
    --generate_max_len 1000 \
    --zero_stage 3 \
    --bf16 \
    --actor_learning_rate 2e-7 \
    --critic_learning_rate 2e-7 \
    --init_kl_coef 0.01 \
    --prompt_data \
    --prompt_data_probs 1 \
    --max_samples 100000 \
    --actor_init_on_gpu \
    --adam_offload \
    --gradient_checkpointing \
    --vllm_num_engines 2 \
    --vllm_tensor_parallel_size 1 \
    --save_steps 5 \
    --eval_steps 1 \
    --normalize_reward \
    --reward_mean -7.5 \
    --reward_variance 1 \
    --load_in_4bit \
    --lora_rank 64 \
    --lora_alpha 64 \
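
For reference, the .to() restriction in the traceback above can be reproduced outside DeepSpeed. The sketch below is an illustration, not OpenRLHF code; it assumes transformers, bitsandbytes, and accelerate are installed, and uses a smaller Mistral checkpoint as a stand-in for Mixtral:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Load a causal LM in 4-bit via bitsandbytes (stand-in model; any size works for the illustration).
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-Instruct-v0.2",
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
        device_map="auto",
    )

    # DeepSpeed ZeRO-3 calls module.to(device) inside _configure_distributed_model();
    # transformers forbids .to() on any 4-bit/8-bit bitsandbytes model, producing the
    # same ValueError as in the traceback above.
    model.to("cuda")
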
hijkzzz commented 1 month ago

QLoRA only supports ZeRO-2. For ZeRO-3, please use plain LoRA (without 4-bit loading).
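
In terms of the launch flags above: --load_in_4bit pairs with --zero_stage 2, while --zero_stage 3 requires dropping --load_in_4bit and relying on --lora_rank / --lora_alpha alone. A minimal sketch of that constraint (an illustrative guard, not OpenRLHF's actual validation code):

    def check_lora_quant_config(zero_stage: int, load_in_4bit: bool, lora_rank: int) -> None:
        # Illustrative compatibility check based on the maintainer's reply above.
        if load_in_4bit and zero_stage == 3:
            raise ValueError(
                "QLoRA (--load_in_4bit) is only supported with ZeRO-2 (--zero_stage 2); "
                "with --zero_stage 3, remove --load_in_4bit and use plain LoRA (--lora_rank > 0)."
            )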

karthik-nexusflow commented 1 month ago

I tried running with ZeRO-2; it runs fine until the first epoch, but fails during the vLLM weight update:

KeyError: 'base_model.model.model.norm.weight'
(ActorModelRayActor pid=1342974) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::LLMRayActor.update_weight() (pid=1343444, ip=0.0.0.0, actor_id=587ce79997d309074618728202000000, repr=<openrlhf.trainer.ray.vllm_engine.LLMRayActor object at 0x7fc43c3e9bd0>)
(ActorModelRayActor pid=1342974)   File "/tmp/ray/session_2024-05-19_00-15-15_748412_1317547/runtime_resources/working_dir_files/_ray_pkg_113e512fb2c5c1c0/openrlhf/trainer/ray/vllm_engine.py", line 90, in update_weight
(ActorModelRayActor pid=1342974)     self.llm.llm_engine._run_workers("update_weight", name, dtype, shape, empty_cache)
(ActorModelRayActor pid=1342974)   File "/root/miniconda3/envs/open/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
(ActorModelRayActor pid=1342974)     self._run_workers_in_batch(workers, method, *args, **kwargs))
(ActorModelRayActor pid=1342974)   File "/root/miniconda3/envs/open/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 724, in _run_workers_in_batch
(ActorModelRayActor pid=1342974)     output = executor(*args, **kwargs)
(ActorModelRayActor pid=1342974)   File "/tmp/ray/session_2024-05-19_00-15-15_748412_1317547/runtime_resources/working_dir_files/_ray_pkg_113e512fb2c5c1c0/openrlhf/trainer/ray/vllm_engine.py", line 53, in update_weight
(ActorModelRayActor pid=1342974)     self.model_runner.model.load_weights(model_name_or_path={name: weight})
(ActorModelRayActor pid=1342974)   File "/root/miniconda3/envs/open/lib/python3.11/site-packages/vllm/model_executor/models/mistral.py", line 329, in load_weights
(ActorModelRayActor pid=1342974)     param = params_dict[name]
(ActorModelRayActor pid=1342974) KeyError: 'base_model.model.lm_head.weight'
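
The KeyError comes from a parameter-name mismatch: wrapping the actor in a PEFT LoraModel prefixes every parameter name with base_model.model., while vLLM's load_weights() only knows the plain base-model names. A small sketch of the mismatch (illustrative only; assumes peft and transformers, and uses a smaller Mistral checkpoint as a stand-in):

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
    lora = get_peft_model(base, LoraConfig(r=64, lora_alpha=64, target_modules=["q_proj", "v_proj"]))

    first_name, _ = next(iter(lora.named_parameters()))
    print(first_name)
    # e.g. 'base_model.model.model.embed_tokens.weight' -- note the extra 'base_model.model.' prefix.
    # vLLM's params_dict only contains plain names such as 'model.embed_tokens.weight' and
    # 'lm_head.weight', so a lookup of 'base_model.model.lm_head.weight' raises the KeyError above.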

hijkzzz commented 1 month ago

We did not implement vLLM support for LoRA

karthik-nexusflow commented 1 month ago

Probably we need a remote function that inserts LoRA adapters into the vLLM model when sent a command.

We would then execute it the first time we need to update the LoRA weights,

and then update the weights. Is this a viable approach? Without vLLM, generation would be pretty slow; how are you guys handling that?

hijkzzz commented 1 month ago

> Probably we need a remote function that inserts LoRA adapters into the vLLM model when sent a command.
>
> We would then execute it the first time we need to update the LoRA weights,
>
> and then update the weights. Is this a viable approach? Without vLLM, generation would be pretty slow; how are you guys handling that?

The easiest way is to merge the LoRA weights into the base model before syncing the weights to vLLM.
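
A minimal sketch of that workaround, assuming the actor is a PEFT-wrapped model (the broadcast step is a hypothetical placeholder for OpenRLHF's existing full-parameter weight sync to the vLLM engines):

    def sync_merged_weights_to_vllm(peft_actor, broadcast_fn):
        # peft_actor   : a peft.PeftModel wrapping the actor (assumption)
        # broadcast_fn : hypothetical callable(name, tensor) standing in for the existing
        #                update_weight() broadcast path to the vLLM engines
        merged = peft_actor.merge_and_unload()  # PEFT API: folds the LoRA deltas into the base weights
        for name, param in merged.named_parameters():
            # Names are now plain base-model names ('model.layers...', 'lm_head.weight'),
            # which vLLM's load_weights() can find in its params_dict.
            broadcast_fn(name, param.data)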