Support for tuning on chat data

allenai / open-instruct

Apache License 2.0

1.1k stars 145 forks source link

Support for tuning on chat data #93

Closed xufangzhi closed 7 months ago

xufangzhi commented 7 months ago

Hello， thanks for sharing the code base.

Can the open-instruct code base support for tuning on conversational data (e.g., ShareGPT) ?

hamishivi commented 7 months ago

Hi, yes, we support training on multi-turn chat data and in fact used shareGPT for training tulu! You can have multiple turns in the conversation by just having multiple entries in the 'messages' field in the data (see the uploaded tulu v2 mix for an example).