Open TrevorAshby opened 9 months ago
Perform RLHF on a model that is not fine-tuned prior to the reinforcement learning.