LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

What are the details of the data used for the reward model OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1? #3686

Open REIGN12 opened 10 months ago

REIGN12 commented 10 months ago

Many thanks for your great open-sourcing effort! I am new to this field and particularly interested in the data used to train the reward model. I noticed that there is a simple dataset config for this, but I am a little confused about the details.

  datasets:
    - oasst_export:
        lang: "en,es,de,fr"
        input_file_path: 2023-03-27_oasst_research_ready_synth.jsonl.gz
        val_split: 0.1
    - anthropic_rlhf:
        fraction: 0.1
        max_val_set: 1000
    - shp:
        max_val_set: 1000
    - hellaswag:
        fraction: 0.5
        max_val_set: 1000
    - webgpt:
        val_split: 0.05
        max_val_set: 1000
    - hf_summary_pairs:
        fraction: 0.1
        max_val_set: 250

How can we use hellaswag as a comparison dataset? Each example seems to have multiple choices (rather than 2). Is there any experimental evidence supporting the fraction settings currently in use?
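To make the first question concrete: one plausible way to use a multiple-choice item for pairwise reward training is to pair the correct ending with each incorrect one. The sketch below assumes the HellaSwag field layout (`ctx`, `endings`, and a string `label`); the function name `multiple_choice_to_pairs` and the example row are hypothetical, and this is not claimed to be what the Open-Assistant code actually does.

```python
from typing import Dict, List, Tuple


def multiple_choice_to_pairs(example: Dict) -> List[Tuple[str, str, str]]:
    """Turn one multiple-choice item into (prompt, chosen, rejected) triples.

    Assumption: the correct ending is preferred over every other ending,
    so an item with N endings yields N - 1 comparison pairs.
    """
    context = example["ctx"]
    endings = example["endings"]
    correct = int(example["label"])  # HellaSwag stores the label as a string
    chosen = endings[correct]
    return [
        (context, chosen, rejected)
        for i, rejected in enumerate(endings)
        if i != correct
    ]


# Hypothetical item in the HellaSwag field layout (not a real dataset row).
example = {
    "ctx": "A man pours water into a kettle. He",
    "endings": [
        "throws the kettle out of the window.",
        "switches the kettle on and waits for it to boil.",
        "paints the kettle blue.",
        "reads the kettle a bedtime story.",
    ],
    "label": "1",
}

pairs = multiple_choice_to_pairs(example)
for prompt, chosen, rejected in pairs:
    print(f"{prompt} {chosen}  >  {prompt} {rejected}")
```

Under this assumption, a four-way item contributes three comparisons that all share the same preferred completion, which a pairwise ranking loss can consume directly.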

Many thanks for any responses in advance!