NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment
Apache License 2.0

Self-Rewarding Algorithm with TRT Support #321

Open trias702 opened 5 days ago


What does this PR do?

Adds support for the Self-Rewarding and Meta-Rewarding algorithms from the following two papers:

https://arxiv.org/abs/2401.10020
https://arxiv.org/abs/2407.19594
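For reviewers unfamiliar with the first paper, the core data-construction step of Self-Rewarding can be sketched roughly as follows: sample several candidate responses per prompt, have the same model score each one as a judge, and keep the highest- and lowest-scored responses as a preference pair for DPO-style training. This is only an illustrative sketch and not the code in this PR; `build_preference_pair`, `PreferencePair`, and the `generate` / `judge` callables are hypothetical names standing in for the policy model's sampling and LLM-as-a-judge scoring.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_preference_pair(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: samples one response from the current model
    judge: Callable[[str, str], float],  # hypothetical: same model scoring (prompt, response) as a judge
    num_candidates: int = 4,
    num_judgements: int = 3,
) -> Optional[PreferencePair]:
    """Build one (chosen, rejected) pair from self-judged candidates."""
    # Sample several candidate responses for the same prompt.
    candidates = [generate(prompt) for _ in range(num_candidates)]

    # Score each candidate several times and average, to reduce judge noise.
    scores = [
        mean([judge(prompt, c) for _ in range(num_judgements)])
        for c in candidates
    ]

    best = max(range(len(candidates)), key=scores.__getitem__)
    worst = min(range(len(candidates)), key=scores.__getitem__)

    # If the judge cannot separate the candidates, the prompt yields no pair.
    if scores[best] == scores[worst]:
        return None

    return PreferencePair(prompt, candidates[best], candidates[worst])
```

The resulting pairs would then be used as preference data to train the next iteration of the model, which in turn serves as both generator and judge for the following round.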

Changelog

Usage

Please see the new tutorial document at: docs/user-guide/self_rewarding.rst

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

Additional Information