ServiceNow / Fast-LLM

Accelerating your LLM training to full speed
https://servicenow.github.io/Fast-LLM/

[WIP] GRPO #20

Open rafapi opened 4 weeks ago

rafapi commented 4 weeks ago

✨ Description

Adds an implementation of GRPO (Group Relative Policy Optimization) for LLM reinforcement learning. GRPO delivers PPO-level performance gains on mathematical reasoning while using significantly less memory, since it needs no critic model, and it can achieve substantial accuracy improvements using only existing instruction-tuning data.
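For readers unfamiliar with why GRPO needs no critic, here is a minimal sketch of the group-relative advantage step: several completions are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group. This is an illustration only, not the code in this PR; the function name `group_relative_advantages`, the `eps` term, the choice of population standard deviation, and the example rewards are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group sampled for the same prompt.

    The group mean acts as the baseline, so no learned value (critic)
    model is required. `eps` (an assumed constant) avoids division by
    zero when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean get a positive advantage and those below get a negative one; a group with identical rewards yields all-zero advantages and so contributes no learning signal.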

๐Ÿ” Type of change

Select all that apply:

๐Ÿ“ Changes

List the key changes introduced in this PR:

  1. Change A
  2. Change B

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General:

Dependencies and Configuration:

Testing:

Performance Impact:

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


๐Ÿ“ Additional Notes

Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.