TrevorAshby / CodeRLHF

RLHF #8

Open TrevorAshby opened 9 months ago

TrevorAshby commented 9 months ago

Perform RLHF on a model that has not been fine-tuned before the reinforcement learning step.