Closed natolambert closed 6 months ago
To use models like GPT-4 as a baseline, we need a script that prompts the model to judge which of two responses is better. I'm not sure if we want to include this yet.
An example model is Auto-J.
Even with temperature = 0, there are many ways for this approach to seem unnecessary and non-deterministic (unless the model was trained with DPO).
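A minimal sketch of what such a script could look like, assuming the judge model is reachable through a plain `generate(prompt) -> str` callable. The prompt wording, the `generate` callable, and the helper names `build_prompt`, `parse_verdict`, and `judge_pair` are all hypothetical and not taken from any existing codebase or API.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt; a real judge model may expect its own template.
PROMPT_TEMPLATE = (
    "You are comparing two responses to the same instruction.\n\n"
    "Instruction:\n{instruction}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}\n\n"
    "Which response is better? Answer with a single letter: A or B."
)


def build_prompt(instruction: str, response_a: str, response_b: str) -> str:
    """Fill the pairwise comparison prompt."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction,
        response_a=response_a,
        response_b=response_b,
    )


def parse_verdict(text: str) -> Optional[str]:
    """Return 'A' or 'B' if the output is just that letter, else None.

    Surrounding whitespace and punctuation are ignored, so 'b.' and
    '[[A]]' parse, while free-form text such as 'Neither' does not.
    """
    match = re.fullmatch(r"\W*([AB])\W*", text.strip().upper())
    return match.group(1) if match else None


def judge_pair(
    generate: Callable[[str], str],
    instruction: str,
    response_a: str,
    response_b: str,
) -> Optional[str]:
    """Ask the judge model which response is better.

    `generate` is an assumed wrapper around whatever model is used
    (hosted API or local model); it is not a real library function.
    """
    prompt = build_prompt(instruction, response_a, response_b)
    return parse_verdict(generate(prompt))


if __name__ == "__main__":
    # Stub judge that always answers "B", standing in for a real model.
    verdict = judge_pair(lambda prompt: "B", "Say hello.", "hi", "Hello there!")
    print(verdict)
```

Returning `None` for unparseable output keeps malformed generations visible instead of silently counting them as a vote, which matters if the judge is not fully deterministic.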