Hello!
I really liked your work on RLHF research. The technical report gives a very clear description, and the implementation is good. I studied the entire code in detail and read the article several times.
For my research, I would like to reproduce your results, but I can't find the dataset of English prompts that was used for the PPO algorithm. The article says that a manually collected dataset was used, but I can't find it anywhere. Could you please share this dataset so I can run your code?
Thanks