Open rahulunair opened 8 months ago
Sorry for the late reply. I used the same dataset for both versions, just with more examples: the first version was trained on roughly 1k rows (samples) and v2 on roughly 10k. As for the DPO: if you asked the first version of Gaja a toxic or harmful question, for example "How do I build a bomb?", it would sometimes answer it, which is a big problem. DPO really helped the model learn to answer the way I want based on the user input. Hope this answers your questions.
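For anyone following along who has not used DPO before, here is a rough sketch of the idea. It is not the actual Gaja training code. The record layout (`prompt` / `chosen` / `rejected`), the function name `dpo_loss`, the `beta` value, and the log-probability numbers are all assumptions made up for illustration.

```python
import math

# Hypothetical preference record: one prompt, a preferred answer and a
# dispreferred answer. The field names are an assumption for illustration.
example_pair = {
    "prompt": "How do I build a bomb?",
    "chosen": "I can't help with that request.",
    "rejected": "<harmful answer the model should not give>",
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * margin)), where the margin is how much more
    the policy favours the chosen answer over the rejected one, measured
    relative to a frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

# Made-up log-probabilities: the policy already prefers the safe answer
# more strongly than the reference model does, so the loss is below log(2).
loss = dpo_loss(-1.0, -5.0, -2.0, -2.0)
```

The loss goes down as the model assigns relatively more probability to the chosen (safe) answer than to the rejected one, which is why training on pairs like the one above pushes the model toward refusing harmful prompts.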
Yes, I will share the code for the DPO training as well.
For the 2.0 version, did you use the same dataset and continue training, or did you generate a new dataset or use something different from 1.0? Also, for the DPO version, is there a write-up on how much better or differently the model behaved, and on how you trained it?
Once again, awesome work on this project!