Suggested refactor of the reward-modeling likelihood; let me know if I missed something. Advantages:
Simplicity and readability: we get rid of self.reward_modeling and self._fitting (self._fitting was actually never used before; did you have a future use case in mind?).
When a user calls la.likelihood, it always returns "reward_modeling", which should be the expected behavior.
(Unrelated: I also removed a superfluous loss_with_var argument.)
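
To illustrate the intended behavior, here is a minimal sketch, not the actual diff. `ToyLaplace` and `_effective_likelihood` are hypothetical names, and the mapping (classification-style loss while fitting, regression-style output when predicting) is my assumption about how reward modeling is handled. The point is only that the public likelihood stays fixed and the mode is derived where needed, instead of being tracked in extra flags.

```python
from enum import Enum


class Likelihood(str, Enum):
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    REWARD_MODELING = "reward_modeling"


class ToyLaplace:
    """Hypothetical stand-in for illustration, not the class changed in this PR."""

    def __init__(self, likelihood):
        # Stored once and never mutated, so users always see what they passed in.
        self.likelihood = Likelihood(likelihood)

    def _effective_likelihood(self, fitting):
        # Assumption: reward modeling fits like classification and predicts
        # like regression. The mode is derived per call, with no extra state.
        if self.likelihood is Likelihood.REWARD_MODELING:
            return Likelihood.CLASSIFICATION if fitting else Likelihood.REGRESSION
        return self.likelihood


la = ToyLaplace("reward_modeling")
fit_mode = la._effective_likelihood(fitting=True)
pred_mode = la._effective_likelihood(fitting=False)
```

With this shape there is no flag that can get out of sync, and la.likelihood compares equal to "reward_modeling" both before and after fitting.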