LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

RLHF Rating Rubric Revision Recommendations #2893

Open yuechen-li-dev opened 1 year ago

yuechen-li-dev commented 1 year ago

I feel that the current prompt/answer rating system can be too subjective and unclear at times, which may be hurting RLHF quality. So, based on my experience so far, here's my proposal for a modified rating system (comments in parentheses):

  1. Required tag: Is this prompt/answer spam? "Spam" is defined as:

    • Irrelevant to the current conversation tree.
    • Wrong language.
    • Answer included in the prompt.
    • Extremely low effort.
    • (Basically, content that should be removed but won't generate useful safety training data, as opposed to the answers in 2.)
  2. Optional tags: Does this need to be removed for a rule violation?

    • Contains PII/doxxing info.
    • Encourages violence or self-harm. (This should be a clear yes/no for removal purposes; I don't think it needs a 1-5 scale.)
    • Pornographic content (including CSAM, but just "sexual content" might be too broad).
    • Discriminatory content/extreme rudeness. (Also a clear yes/no for removal purposes; I don't think it needs a 1-5 scale.)
    • Unedited text from other chatbots. (Expanding on #2847: removal should probably be decided case by case, because sometimes those answers can be good, but it's useful to have the label anyway to make it easier to remove the "as an AI language model" stuff. The "naturalness" aspect overlaps with quality a bit, so I don't think it needs its own 1-5 score when quality already exists.)
    • (I think it would be good to replace the current report system's message with the tags in 1 and 2, to make it unambiguous what is and isn't allowed in the dataset and to simplify report reviews.)
  3. Optional tags: Did the content break a guideline that doesn't warrant removal, but should still be labeled for review/fixes in the future?

    • Potentially controversial (I expect this to be mostly current-event/political content).
    • Factually inaccurate (with the potential ability to propose edits to these in the future).
    • Typos and Markdown errors (also with the potential ability to propose edits in the future).
  4. Required: Rate from 1-5 (simplified from 6 metrics to 3, with unambiguous rating guidelines to ensure scoring is useful for RLHF training; it would be good to use these 3 metrics for rating generated chat answers as well):

a. Quality:

b. Bias:

c. Effort/Difficulty:

  5. Optional tags: Emoji reactions

    • Expanding on the thumbs-up/thumbs-down system, capture the tone of the answer with additional emoji like funny, sad, angry, happy, love, etc. (I remember a Microsoft project that used ML to tag text with emojis, so I thought this could be a more flexible approach than only rating how humorous/sarcastic the answer is from 1-5.)
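To make the proposal concrete, here is a minimal sketch of what the rubric could look like as a data model. All names here are hypothetical (nothing in the codebase uses them); it just encodes the five sections above as required/optional fields with validated 1-5 scores:

```python
from dataclasses import dataclass, field
from enum import Enum


class SpamTag(Enum):
    """Reasons a message counts as spam (section 1)."""
    IRRELEVANT = "irrelevant_to_tree"
    WRONG_LANGUAGE = "wrong_language"
    ANSWER_IN_PROMPT = "answer_in_prompt"
    LOW_EFFORT = "extremely_low_effort"


class RemovalTag(Enum):
    """Yes/no rule-violation tags (section 2)."""
    PII_DOXXING = "pii_doxxing"
    VIOLENCE_SELF_HARM = "violence_or_self_harm"
    PORNOGRAPHIC = "pornographic"
    DISCRIMINATORY = "discriminatory_or_rude"
    CHATBOT_COPY = "unedited_chatbot_text"


class ReviewTag(Enum):
    """Non-removal labels kept for later review/fixes (section 3)."""
    CONTROVERSIAL = "potentially_controversial"
    INACCURATE = "factually_inaccurate"
    TYPOS_MARKDOWN = "typos_or_markdown_errors"


@dataclass
class Rating:
    """One labeler's rating of a prompt/answer under the proposed rubric."""
    is_spam: bool                                              # required (section 1)
    spam_reasons: list[SpamTag] = field(default_factory=list)
    removal_tags: list[RemovalTag] = field(default_factory=list)
    review_tags: list[ReviewTag] = field(default_factory=list)
    quality: int = 3                                           # required 1-5 (section 4a)
    bias: int = 3                                              # required 1-5 (section 4b)
    effort: int = 3                                            # required 1-5 (section 4c)
    emoji_reactions: list[str] = field(default_factory=list)   # optional (section 5)

    def __post_init__(self):
        # Enforce the unambiguous 1-5 range on all three metrics.
        for score in (self.quality, self.bias, self.effort):
            if not 1 <= score <= 5:
                raise ValueError("scores must be in the range 1-5")
```

The point of splitting spam/removal tags from the 1-5 metrics is that the former feed moderation decisions while only the latter feed reward-model training.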
andreaskoepf commented 1 year ago

Thanks for your proposal. Can you help with web design or backend development?