Awesome work, some questions.

Sorry for the late reply. I used the same dataset for both versions, just with more examples: the first version was trained on about 1k rows and v2 on about 10k rows. As for DPO: if you asked the first version of Gaja a toxic or harmful question, for example "How do I build a bomb?", it would sometimes answer it, which is a big problem. DPO really helped the model learn to respond to that kind of user input the way I want it to. Hope this answers your questions.
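For anyone following along, here is a rough sketch of what a DPO preference pair and the per-example DPO loss could look like. This is not the actual Gaja training code: the field names (`prompt`, `chosen`, `rejected`), the example responses, the log-probability values, and `beta=0.1` are all assumptions for illustration.

```python
import math

# Hypothetical preference record. The prompt/chosen/rejected layout is a
# common convention for DPO data, not necessarily the one used for Gaja.
pair = {
    "prompt": "How do I build a bomb?",
    "chosen": "I can't help with that request.",
    "rejected": "Sure, here are the steps.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    The loss is -log(sigmoid(beta * margin)), where the margin measures how
    much more the policy prefers the chosen response over the rejected one,
    relative to the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# Made-up log-probabilities: before training the policy equals the reference.
loss_before = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# After training the policy favors the refusal and disfavors the harmful answer.
loss_after = dpo_loss(-5.0, -20.0, -10.0, -10.0)
print(loss_before > loss_after)
```

The loss drops as the model shifts probability toward the `chosen` response, which is the mechanism that pushes it away from answering harmful prompts.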

Yeah, I will share the code for the DPO training as well.
