LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0
36.86k stars, 3.22k forks

OA Developer Meeting #3482

Open AbdBarho opened 1 year ago

AbdBarho commented 1 year ago

Last meeting #3321

yuechen-li-dev commented 1 year ago

High Priority:

  1. System prompt prefix and initial prompt categorization tasks: the prefix should include language, task category, and other tags. For example: <|system|> lang:en, task:coding, tag:python <|prompter|> ...
  2. Review system design to clean up existing data: should include an edit proposal and annotation system. In the works: https://github.com/LAION-AI/Open-Assistant/pull/3289
  3. Pause English data collection once the review system is implemented, so that review runs against a static, non-moving target; recent English data contributions contain too much spam. Release the data at the pause point as Oasst1.1.
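
To make the prefix idea in item 1 concrete, here is a minimal sketch that builds and parses a prefix in the format of the example above. The function names (`build_system_prefix`, `format_prompt`, `parse_system_prefix`) and the exact field handling are assumptions for illustration only, not the project's actual implementation.

```python
from __future__ import annotations


def build_system_prefix(lang: str, task: str, tags: list[str]) -> str:
    """Build a prefix like '<|system|> lang:en, task:coding, tag:python'.

    Hypothetical helper: the key names follow the example in this comment.
    """
    fields = [f"lang:{lang}", f"task:{task}"] + [f"tag:{t}" for t in tags]
    return "<|system|> " + ", ".join(fields)


def format_prompt(prefix: str, user_message: str) -> str:
    """Attach the user's message after the <|prompter|> token."""
    return f"{prefix} <|prompter|> {user_message}"


def parse_system_prefix(prompt: str) -> dict:
    """Recover lang, task, and tags from a prompt built by the helpers above."""
    header = prompt.split("<|prompter|>", 1)[0]
    header = header.replace("<|system|>", "", 1).strip()
    parsed = {"lang": None, "task": None, "tags": []}
    for field in header.split(","):
        key, _, value = field.strip().partition(":")
        if key == "tag":
            parsed["tags"].append(value)
        elif key in ("lang", "task"):
            parsed[key] = value
    return parsed


prefix = build_system_prefix("en", "coding", ["python"])
prompt = format_prompt(prefix, "Write a function that reverses a string.")
```

Keeping the fields as plain `key:value` pairs means the same string can be generated at labeling time and parsed back during dataset filtering.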

Medium Priority:

  1. A clear, consistent labeling guideline, as the previous RLHF results weren't ideal. Proposal for review: https://github.com/LAION-AI/Open-Assistant/issues/2893. A "potentially synthetic" tag could be added as well.
  2. Design a regular dataset release cadence for the future. Maybe every two weeks?
  3. Liberapay/Open Collective setup for funding.

Low Priority:

  1. Dataset language localization: zh-hant and zh-hans conversion should be easy, as there are no grammar differences and non-LLM libraries can already do it efficiently.
  2. Set up a Lemmy instance for "Ask Open Assistant" as an alternative to Reddit to get more realistic human/bot interaction data/feedback.
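
As a rough illustration of why the conversion in item 1 does not need an LLM, the sketch below maps a handful of Simplified characters to Traditional ones with a plain lookup table. The table and the function name are made up for this example; dedicated converters such as OpenCC ship full dictionaries and also handle characters with more than one Traditional form (for instance 发, which can be 發 or 髮), which this sketch does not attempt.

```python
# Tiny illustrative Simplified -> Traditional table (assumption: real
# converters use dictionaries with thousands of entries plus phrase rules).
S2T_SAMPLE = str.maketrans({
    "汉": "漢",
    "语": "語",
    "简": "簡",
    "体": "體",
})


def to_traditional_sample(text: str) -> str:
    """Convert only the characters present in the sample table."""
    return text.translate(S2T_SAMPLE)


converted = to_traditional_sample("简体汉语")
```

Characters missing from the table pass through unchanged, so this is only suitable as a demonstration of the character-mapping idea.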

someone13574 commented 1 year ago

Message categorization is also useful for stricter quality ratings, since it would let us direct people who are more knowledgeable about a given subject to label and respond to those messages.

AbdBarho commented 1 year ago

Meeting notes

Inference System:

Preprompt:

Review:

Data quality English: