LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

OA dataset consistent splits #1661

Closed sanagno closed 1 year ago

sanagno commented 1 year ago

We need to make sure that different splits of the dataset are used for sft, reward and rl training.

Basically, the sft_dataset and reward datasets need to use the same splits.

theblackcat102 commented 1 year ago

Set the same global seed, maybe?

sanagno commented 1 year ago

I have a random generator with a specific seed that is used for the dataset only: https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/prompt_dialogue.py#L31
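To illustrate the difference between a global seed and a dataset-only generator, here is a minimal sketch. It is not the code at the linked line; `shuffled_indices` and the seed value are hypothetical names chosen for illustration. The point is that a dedicated `random.Random` instance keeps its own state, so the shuffle is unaffected by whatever the rest of the training code does with the global generator.

```python
import random


def shuffled_indices(n, seed=42):
    """Return a reproducible shuffle of range(n) (hypothetical helper)."""
    # A dedicated generator: its state is separate from the global `random` module.
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


first = shuffled_indices(10)
random.seed(0)   # reseeding the global generator ...
random.random()  # ... or drawing from it does not change the dataset order
second = shuffled_indices(10)
assert first == second
```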

bethanyconnolly commented 1 year ago

Hi! I am an industry research scientist and I am interested in contributing to these problems. Am I correct in my understanding that we want the dataset splits 'sft' and 'reward_model' to be identical, while 'rl' is a different split? If so, I think I can solve this problem by creating a fixed order of indices using the random state and then using the same slice of indices for 'sft' and 'reward_model' but a different slice for 'rl'. Since this update makes 'sft' and 'reward_model' the same split, the split sizes will need to be changed too so that all the data is used.
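The proposal above could look roughly like the following sketch. It is only an illustration under stated assumptions: the function name `split_indices`, the `shared_fraction` parameter, and its default of 0.8 are hypothetical and do not come from the repository. A fixed permutation is drawn from a seeded random state, 'sft' and 'reward_model' take the same slice of it, and 'rl' takes the remainder, so all the data is used.

```python
import numpy as np


def split_indices(n_samples, mode, seed=42, shared_fraction=0.8):
    """Return the sample indices for one training stage (hypothetical helper).

    'sft' and 'reward_model' share one slice of a fixed permutation;
    'rl' receives the remaining, disjoint slice.
    """
    if mode not in ("sft", "reward_model", "rl"):
        raise ValueError(f"unknown mode: {mode}")
    # The same seed always yields the same order, regardless of the mode requested.
    rng = np.random.RandomState(seed)
    order = rng.permutation(n_samples)
    cut = int(n_samples * shared_fraction)  # assumed split size, adjust as needed
    if mode == "rl":
        return order[cut:]
    return order[:cut]


sft = split_indices(100, "sft")
reward = split_indices(100, "reward_model")
rl = split_indices(100, "rl")
assert np.array_equal(sft, reward)       # identical split for sft and reward_model
assert set(sft).isdisjoint(set(rl))      # rl never overlaps with them
```

Because every call recomputes the permutation from the seed, the three training scripts can obtain consistent splits without sharing any state.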

sanagno commented 1 year ago

Hey @bethanyconnolly, yes, feel free to do this. Btw @theblackcat102, if you are doing any special preprocessing, please push the script; I am missing the history tags.

olliestanley commented 1 year ago

I think this was not meant to be closed by #1776, so I am reopening it.

theblackcat102 commented 1 year ago

@sanagno by history tag, are you referring to the reward model or the RLHF training?

sanagno commented 1 year ago

I basically mean this: https://github.com/LAION-AI/Open-Assistant/blob/main/model/reward/instructor/rank_datasets.py#L323

theblackcat102 commented 1 year ago

@sanagno sorry, that's my fault. It's a quick hack to include the conversation history in the question section (first sentence). Should we use the question and answer tags instead?

sanagno commented 1 year ago

No, no, it's fine. We can just add it to the dataset init, or just to the README, so that the instructions to recreate it are easy to follow.

bethanyconnolly commented 1 year ago

Hi, I am feeling a bit confused about how to handle this issue now. Perhaps it can be unassigned and given to someone else with better context on the problem? Thanks