NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment
Apache License 2.0

RPO on multiple responses #311

Open Davood-M opened 1 week ago

Davood-M commented 1 week ago

What does this PR do?

Adds RPO on multiple responses for alignment. With this change, RPO can take a dataset with a variable number of responses per prompt.
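To make "a variable number of responses per prompt" concrete, here is a minimal sketch of how such records might be padded into a rectangular batch with a validity mask. This is not taken from the PR: the record layout (`prompt`, `responses`, `rewards`) and the helper `pad_responses` are hypothetical names chosen for illustration, and the actual dataset format in NeMo-Aligner may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout (an assumption, not the PR's format): each
# prompt has its own list of responses and one scalar reward per response.
Record = Dict[str, object]


def pad_responses(
    records: List[Record],
    pad_response: str = "",
    pad_reward: float = 0.0,
) -> Tuple[List[List[str]], List[List[float]], List[List[bool]]]:
    """Pad every record to the largest response count in the batch.

    Returns padded responses, padded rewards, and a mask that is True for
    real responses and False for padding.
    """
    if not records:
        raise ValueError("records must not be empty")

    max_k = max(len(r["responses"]) for r in records)
    responses, rewards, mask = [], [], []
    for r in records:
        k = len(r["responses"])
        if len(r["rewards"]) != k:
            raise ValueError("each response needs exactly one reward")
        pad = max_k - k
        responses.append(list(r["responses"]) + [pad_response] * pad)
        rewards.append(list(r["rewards"]) + [pad_reward] * pad)
        mask.append([True] * k + [False] * pad)
    return responses, rewards, mask


# Illustrative data: one prompt with three responses, one with two.
batch = [
    {"prompt": "p1", "responses": ["a", "b", "c"], "rewards": [0.9, 0.1, 0.4]},
    {"prompt": "p2", "responses": ["d", "e"], "rewards": [0.7, 0.2]},
]
responses, rewards, mask = pad_responses(batch)
```

A loss computed over such a batch would presumably use the mask to ignore padded slots, so that prompts with fewer responses are not penalized for the padding.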

Changelog

Usage

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

Additional Information