CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Multi-GPU training errors with peft #581

Open AliengirlLiv opened 11 months ago

AliengirlLiv commented 11 months ago

🐛 Describe the bug

When I try to use multi-GPU training with accelerate, I get an error.

Code:

import trlx
from peft import LoraConfig, TaskType
from trlx.data.configs import (
    ModelConfig,
    OptimizerConfig,
    SchedulerConfig,
    TokenizerConfig,
    TrainConfig,
    TRLConfig,
)
from trlx.models.modeling_ppo import PPOConfig

config = TRLConfig(
    train=TrainConfig(
        seq_length=1024,
        epochs=50,
        total_steps=100000,
        batch_size=1,
        checkpoint_interval=1000,
        eval_interval=200,
        pipeline="PromptPipeline",
        trainer="AcceleratePPOTrainer",
    ),
    model=ModelConfig(
        model_path='gpt2',
        num_layers_unfrozen=1,
        # peft_config={"peft_type": "LORA", "r": 1, "lora_alpha": 32, "lora_dropout": 0.1},
    ),
    tokenizer=TokenizerConfig(tokenizer_path='gpt2', truncation_side="right"),
    optimizer=OptimizerConfig(name="adamw"),
    scheduler=SchedulerConfig(name="cosine_annealing", kwargs={"T_max": 100000, "eta_min": 5.0e-6}),
    method=PPOConfig(
        name="PPOConfig",
        num_rollouts=128,
        chunk_size=16,
        ppo_epochs=4,
        init_kl_coef=0.1,
        target=6,
        horizon=10000,
        gamma=1,
        lam=0.95,
        cliprange=0.2,
        cliprange_value=0.2,
        vf_coef=0.2,
        scale_reward=None,
        ref_mean=None,
        ref_std=None,
        cliprange_reward=10,
        gen_kwargs={
            "max_new_tokens": 50,
        },
    ),
)

if __name__ == "__main__":

    def reward_fn(samples, **kwargs):
        return [0] * len(samples)

    trainer = trlx.train(
        reward_fn=reward_fn,
        prompts=['dummy dataset'],
        config=config,
    )

Launch command:

CUDA_VISIBLE_DEVICES=0,1 debug=true accelerate launch --mixed_precision bf16 trlx_minimal.py

Error:

File "/home/olivia/experiments/cot_reliability/trlx_minimal.py", line 73, in <module>
    trainer = trlx.train(
  File "/home/olivia/miniconda3/envs/exps/lib/python3.9/site-packages/trlx/trlx.py", line 92, in train
    trainer = get_trainer(config.train.trainer)(
  File "/home/olivia/miniconda3/envs/exps/lib/python3.9/site-packages/trlx/trainer/accelerate_ppo_trainer.py", line 74, in __init__
    if not hasattr(self.model, "frozen_head") and not self.model.peft_type:
  File "/home/olivia/miniconda3/envs/exps/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DistributedDataParallel' object has no attribute 'peft_type'

The error comes from these lines in accelerate_ppo_trainer.py:

self.model, self.opt, self.scheduler, rollout_loader = self.accelerator.prepare(
    self.model, self.opt, self.scheduler, rollout_loader
)
self.store.clear_history()  # Clear the rollout store
if not hasattr(self.model, "frozen_head") and not self.model.peft_type:
    self.ref_model = self.get_arch(self.config)

self.model originally has a peft_type attribute set to None, but in multi-GPU mode the self.accelerator.prepare call appears to wrap the model in DistributedDataParallel, which does not expose this attribute.
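
The same behavior can be reproduced outside trlx. A minimal standalone sketch (it spins up a single-process gloo group purely for illustration): with the PyTorch version in the traceback, DistributedDataParallel does not delegate arbitrary attribute lookups to the wrapped module.

import os
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = nn.Linear(4, 4)
model.peft_type = None                       # trlx sets this on the unwrapped model
wrapped = DDP(model)

print(hasattr(wrapped, "peft_type"))         # False -> the AttributeError above
print(hasattr(wrapped.module, "peft_type"))  # True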

We can work around this by saving the peft_type attribute before accelerate.prepare and restoring it on the wrapped model afterwards. With that change, the script runs correctly.
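
A minimal sketch of that workaround, assuming it is patched directly around the existing prepare call in accelerate_ppo_trainer.py:

# Save peft_type before accelerate potentially wraps the model in DDP...
peft_type = getattr(self.model, "peft_type", None)

self.model, self.opt, self.scheduler, rollout_loader = self.accelerator.prepare(
    self.model, self.opt, self.scheduler, rollout_loader
)

# ...and restore it on the (possibly wrapped) model so the later check still works.
self.model.peft_type = peft_type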

However, even with this change, multi-GPU training does not work when using peft to implement LoRA.

If I uncomment the peft_config line in the example script above and set num_layers_unfrozen to 1, training works correctly on a single GPU. However, when I add a second GPU, the script fails with an error saying that DistributedDataParallel has no attribute forward_hydra.

This problem can be fixed by removing all references to peft_type in accelerate_ppo_trainer.py (which also makes the fix above unnecessary). When I do this, LoRA training appears to run correctly on both GPUs. However, I am not familiar enough with this codebase to know whether this fix introduces other, less obvious errors.
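
An alternative sketch that avoids deleting the checks entirely (untested beyond what is described above): resolve the attributes on the unwrapped model via accelerate's unwrap_model, which works whether or not a DDP wrapper was added.

# In accelerate_ppo_trainer.py, after the self.accelerator.prepare(...) call:
unwrapped_model = self.accelerator.unwrap_model(self.model)
if not hasattr(unwrapped_model, "frozen_head") and not unwrapped_model.peft_type:
    self.ref_model = self.get_arch(self.config)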

Which trlX version are you using?

trlx==0.7.0

Additional system and package information

python 3.9, transformers 4.35.0, accelerate 0.24.1, Ubuntu

Jing-L97 commented 2 months ago

Hi, I ran into the same issue with peft_type. Did you solve it in the end?