GanjinZero / RRHF

[NIPS2023] RRHF & Wombat

training with my own gpt2 #22

Open dyyzhmm opened 1 year ago

dyyzhmm commented 1 year ago

To train RRHF with my own GPT-2 model, do I need to first generate responses to each query with my own model and then have ChatGPT score them? If so, doesn't that make wombat_train.json useless?

GanjinZero commented 1 year ago

You can also use our data to train RRHF. However, we do not know how well it performs without responses sampled by the current policy model (i.e., GPT-2).
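
For anyone who wants to try the sample-then-score route discussed in this thread, here is a minimal sketch of the data-building step. It is not code from this repository. `sample_fn` and `score_fn` are hypothetical callables: in practice the first would wrap generation from your own GPT-2 policy and the second a ChatGPT-based (or reward-model) scorer. The `query` / `responses` / `scores` field names are an assumption as well, so compare them with the layout of wombat_train.json before training.

```python
import json
from typing import Callable, Dict, List


def build_rrhf_records(
    queries: List[str],
    sample_fn: Callable[[str, int], List[str]],  # hypothetical: policy sampler
    score_fn: Callable[[str, str], float],       # hypothetical: response scorer
    num_samples: int = 4,
) -> List[Dict]:
    """Sample several responses per query and attach one score to each."""
    records = []
    for query in queries:
        responses = sample_fn(query, num_samples)
        scores = [score_fn(query, response) for response in responses]
        # Assumed record layout; adjust to match the training script's loader.
        records.append({"query": query, "responses": responses, "scores": scores})
    return records


def write_jsonl(records: List[Dict], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Toy stand-ins so the sketch runs without a model or an API key.
def toy_sample(query: str, n: int) -> List[str]:
    return [f"{query} -> candidate {i}" for i in range(n)]


def toy_score(query: str, response: str) -> float:
    return float(len(response))


if __name__ == "__main__":
    demo = build_rrhf_records(["What is RRHF?"], toy_sample, toy_score, num_samples=3)
    write_jsonl(demo, "my_gpt2_train.json")
```

Swapping the toy functions for real ones leaves the loop unchanged. Keeping the sampler and the scorer as separate arguments also makes it easy to mix responses sampled from your own model with responses taken from the provided data.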