reinforcement learning improvement

deepseek-ai / DeepSeek-Coder-V2

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

MIT License


reinforcement learning improvement #31

Status: Closed (Ski-ing closed this issue 1 month ago)

Ski-ing commented 1 month ago

How much of the improvement in code generation metrics can be attributed to Group Relative Policy Optimization (GRPO) in the reinforcement learning stage?

DeepSeekPH commented 1 month ago

The gain from GRPO varies by test set. On code generation test sets, GRPO generally yields an improvement of approximately 0.5 points. The gains on math-related benchmarks are more substantial.
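For readers unfamiliar with the "group relative" part of GRPO, here is a minimal sketch of the advantage computation. It is not the DeepSeek-Coder-V2 training code. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the pass/fail reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled completion relative to its own group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. Each reward is centered on the group mean and scaled
    by the group standard deviation, so no learned value function is
    needed as a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    # eps keeps the division finite when all rewards in the group are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 if a completion passes the unit tests, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get a positive advantage and those below get a negative one. When every completion in a group receives the same reward, all advantages are zero and that group contributes no policy gradient signal.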

Ski-ing commented 1 month ago

Thanks for your reply