allenai / open-instruct

Apache License 2.0

Add Rejection Sampling #166

Closed: natolambert closed this issue 1 month ago

natolambert commented 4 months ago

Would be interesting to test this after IFT and before PPO in particular. Thinking about it, rejection sampling steers the model toward the distribution that the reward model prefers, which sounds similar to sharing prompts between the RM and PPO stages.
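
For concreteness, here is a minimal sketch of what rejection sampling could look like, assuming the simple best-of-n form: sample several completions per prompt, score each one with the reward model, and keep only the top-scoring completion as fine-tuning data. Everything named here (`rejection_sample`, `toy_generate`, `toy_reward`) is hypothetical and for illustration only; none of it is an existing open-instruct API, and the toy policy and reward are stand-ins for a real model and RM.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    num_samples: int = 4,
) -> List[Tuple[str, str]]:
    """Best-of-n selection: keep the highest-reward completion per prompt."""
    selected = []
    for prompt in prompts:
        # Draw several candidate completions from the current policy.
        candidates = [generate(prompt) for _ in range(num_samples)]
        # Keep the candidate that the reward model scores highest.
        best = max(candidates, key=lambda c: reward(prompt, c))
        selected.append((prompt, best))
    return selected


# Hypothetical stand-ins: a "policy" that samples canned replies and a
# "reward model" that simply prefers longer replies.
_rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    return _rng.choice(["ok", "no", "sure, here is a detailed answer"])


def toy_reward(prompt: str, completion: str) -> float:
    return float(len(completion))


pairs = rejection_sample(["Explain PPO."], toy_generate, toy_reward, num_samples=8)
```

The resulting `(prompt, completion)` pairs would then be used for another round of supervised fine-tuning, which is what would shift the model toward outputs the reward model scores highly before PPO starts.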