feat: Add Reward Model Training

Kipok / NeMo-Skills

A pipeline to improve skills of large language models

https://kipok.github.io/NeMo-Skills/

Apache License 2.0


feat: Add Reward Model Training #207

Closed: gwarmstrong closed this 1 week ago

gwarmstrong commented 1 week ago

Augments the training pipeline to support reward model training with NeMo-Aligner.

gwarmstrong commented 1 week ago

@Kipok the GPU test failure appears to come from an unrelated test. Any idea what that's about?

Kipok commented 1 week ago

The tests are failing because of an issue introduced in another PR. Let me run the training tests locally to double-check that there are no issues there, and we can merge after that.