wheresmyhair closed 4 months ago
[Ready for review] Reward modeling support
Tested on:
Full finetuning
LoRA
LISA
Several additional fixes in this PR:
--conversation_template disable