dyyzhmm opened 1 year ago
To train RRHF with my own GPT-2 model, do I first need to generate responses to the queries with my own model and then have ChatGPT score them? If so, isn't wombat_train.json no longer useful?
You can also use our data to train RRHF. However, we don't know how well that performs when the responses are not sampled from the current policy model (i.e., GPT-2).
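A minimal sketch of the workflow described in the question: sample several responses per query from the current policy (e.g. GPT-2), score each one, and collect the results as training records. The function names `build_rrhf_records` and `write_jsonl`, the callables `sample_fn` and `score_fn`, and the record fields `query`, `responses`, and `scores` are all hypothetical. They are an assumed layout, not a confirmed description of wombat_train.json, so check the real file's format before relying on them.

```python
import json
from typing import Callable, Dict, List


def build_rrhf_records(
    queries: List[str],
    sample_fn: Callable[[str, int], List[str]],  # hypothetical: policy sampler
    score_fn: Callable[[str, str], float],       # hypothetical: ChatGPT / reward scorer
    num_samples: int = 4,
) -> List[Dict]:
    """Sample responses from the current policy and attach a score to each.

    The {"query", "responses", "scores"} layout is an ASSUMPTION about the
    RRHF data format; compare it with wombat_train.json before training.
    """
    records = []
    for query in queries:
        # Draw candidate responses from the current policy model.
        responses = sample_fn(query, num_samples)
        # Score every (query, response) pair; RRHF ranks responses by these scores.
        scores = [score_fn(query, r) for r in responses]
        records.append({"query": query, "responses": responses, "scores": scores})
    return records


def write_jsonl(records: List[Dict], path: str) -> None:
    """Write one JSON object per line (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

For GPT-2, `sample_fn` could wrap a Hugging Face `transformers` text-generation pipeline called with `do_sample=True` and `num_return_sequences=n` (stripping the prompt from each `generated_text`), and `score_fn` would wrap whatever ChatGPT prompt or reward model you use. Keeping both as injected callables lets you swap in responses from the existing data without changing the record-building code.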