Open: trias702 opened this pull request 5 days ago.
What does this PR do?
Adds support for the Self-Rewarding and Meta-Rewarding algorithms from the following two papers:
Self-Rewarding Language Models: https://arxiv.org/abs/2401.10020
Meta-Rewarding Language Models: https://arxiv.org/abs/2407.19594
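For reviewers unfamiliar with the first paper, its core data step is: the model samples several responses per prompt, scores each one itself in an LLM-as-a-Judge role, and the highest- and lowest-scoring responses become the chosen/rejected pair for DPO training, with tied pairs discarded. The sketch below illustrates only that pairing step. It is not the code in this PR: `PreferencePair`, `build_self_rewarding_pair` and `toy_judge` are hypothetical names, and `toy_judge` is a word-count stand-in for the real judge prompt.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class PreferencePair:
    # Hypothetical container for one DPO training example.
    prompt: str
    chosen: str
    rejected: str


def build_self_rewarding_pair(
    prompt: str,
    responses: Sequence[str],
    judge: Callable[[str, str], float],
) -> Optional[PreferencePair]:
    """Hypothetical helper: score every candidate with the judge and pair
    the best against the worst. Returns None if there are fewer than two
    candidates or if best and worst tie (no preference signal)."""
    if len(responses) < 2:
        return None
    scores = [judge(prompt, r) for r in responses]
    best = max(range(len(responses)), key=lambda i: scores[i])
    worst = min(range(len(responses)), key=lambda i: scores[i])
    if scores[best] == scores[worst]:
        return None
    return PreferencePair(prompt, responses[best], responses[worst])


def toy_judge(prompt: str, response: str) -> float:
    # Stand-in for the LLM-as-a-Judge prompt (NOT what the papers use):
    # one point per word, capped at 5 to mimic a 5-point scale.
    return float(min(len(response.split()), 5))


pair = build_self_rewarding_pair(
    "What is the capital of France?",
    ["Paris.", "The capital of France is Paris.", "I don't know"],
    toy_judge,
)
print(pair)
```

Discarding ties keeps the DPO loss from seeing pairs where chosen and rejected are equally good. Meta-Rewarding adds a further step in which the model also judges its own judgments, which this sketch does not cover.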
Changelog
Usage
Please see the new tutorial document at docs/user-guide/self_rewarding.rst.
Before your PR is "Ready for review"
Pre checks:
Checklist when contributing a new algorithm
max_steps=-1 and validation?
Additional Information