OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

Qwen2 ppo #333

Open Yusifu opened 6 days ago

Yusifu commented 6 days ago

I use a qwen2 model for both the actor and the reward model, but I get the following exception at `action_log_probs = self.actor(sequences, num_actions, attention_mask)` inside `experience = self.experience_maker.make_experience(rand_prompts, **self.generate_kwargs)`:

ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call `tokenizer.padding_side = 'left'` before tokenizing the input.

Why is this exception raised, and how can I solve it?
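For context, the check exists because a decoder-only model generates from the last position of each sequence; with right padding, a shorter sequence ends in pad tokens, so generation would continue from padding instead of from the prompt. A minimal plain-Python sketch of the difference (the pad id 0 and the helper below are illustrative, not OpenRLHF code):

```python
# Illustration: why decoder-only batched generation wants LEFT padding.
# pad_id = 0 is a placeholder; real tokenizers define their own pad token.
pad_id = 0

def pad_batch(seqs, side="left"):
    """Pad variable-length token-id lists to a rectangle on one side."""
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pads = [pad_id] * (width - len(s))
        out.append(pads + s if side == "left" else s + pads)
    return out

batch = [[5, 6], [7, 8, 9]]

# Left padding: the last column holds a real token for every row,
# so generation can continue from position -1 in each sequence.
print([row[-1] for row in pad_batch(batch, "left")])   # [6, 9]

# Right padding: the short row now ends in a pad token, which is what
# the Qwen2 flash-attention check is warning about.
print([row[-1] for row in pad_batch(batch, "right")])  # [0, 9]
```

Following the error message, the direct fix is `tokenizer.padding_side = "left"` before tokenizing the prompts.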

hijkzzz commented 6 days ago

Just disable flash_attn.
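A sketch of what disabling it means at the `transformers` level, assuming the models are loaded with `from_pretrained` (`attn_implementation` is the standard Hugging Face argument; the exact OpenRLHF flag that maps onto it may differ by version, and the helper below is hypothetical):

```python
# Sketch: pick the attention backend when loading the actor/reward models.
# The flash_attention_2 backend is what enforces the left-padding check
# that raised the ValueError above; "eager" avoids it.

def model_load_kwargs(use_flash_attn: bool) -> dict:
    """Build illustrative `from_pretrained` kwargs (hypothetical helper)."""
    return {
        "attn_implementation": "flash_attention_2" if use_flash_attn else "eager",
        "torch_dtype": "auto",
    }

# With flash attention disabled, Qwen2 no longer requires left padding
# for batched generation, e.g. (not executed here):
# model = AutoModelForCausalLM.from_pretrained(model_name, **model_load_kwargs(False))
print(model_load_kwargs(False)["attn_implementation"])  # eager
```

Alternatively, keeping flash attention enabled and setting `tokenizer.padding_side = "left"` as the error message suggests should also resolve it.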