Reward modeling support

OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

https://optimalscale.github.io/LMFlow/

Apache License 2.0


Closed by wheresmyhair 4 months ago

wheresmyhair commented 4 months ago

[Ready for review] Reward modeling support

Tested on:

research4pan commented 4 months ago

Several additional fixes in this PR:

Suppress warnings for samples that exceed the maximum length during tokenization and grouping.
Remove --conversation_template disable