Closed: baptistejamin closed this issue 1 year ago
Interesting point. I'm afraid the labels give people a preconception of what is and isn't a "good prompt", as if it has to get full marks to be of the highest quality. That being said, the creativity metric could encourage people to ask complex questions on purpose - I certainly tried making a few outlandish requests that required chain-of-thought and lateral thinking myself.
The trickiest part about adding a new label is that we already have just shy of 100k messages, so we either have to work with tons of unlabeled data or retroactively label them all in some manner that can't be fully automated.
I think the idea is fair, though - when we were discussing the definition of the Ordinary/Creative spectrum, some people interpreted it as complexity, so for some, this metric already exists. I'm all for a clear definition of the part each label plays, so it can help.
At the very least, if the others are in agreement, we could change the guidelines to demand simple questions that can be answered easily.
If you feel a prompt is too complex or it feels a bit like asymmetric warfare by the prompter, then simply skip it. If enough people skip a prompt it'll be flagged by auto-moderation as a bad prompt, and it'll be aborted.
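For concreteness, the skip-based flagging described above is essentially a threshold check. A minimal sketch, where the threshold value and names are hypothetical rather than the actual backend logic:

```python
# Minimal sketch of skip-based auto-moderation, assuming a simple count threshold.
# SKIP_THRESHOLD and the function name are hypothetical, not the real backend fields.

SKIP_THRESHOLD = 5  # assumed number of skips before a prompt is flagged

def should_flag_prompt(skip_count: int, threshold: int = SKIP_THRESHOLD) -> bool:
    """Return True once enough users have skipped the prompt."""
    return skip_count >= threshold

# A prompt skipped by 6 different users would be flagged and aborted.
print(should_flag_prompt(6))  # True
```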
Is it a bad prompt though? Some complex prompts might be perfectly reasonable to ask an AI agent, but unreasonable to ask a human to spend time on.
I would prefer it if all prompts were complex, actually. If Open Assistant can't answer anything difficult, it's not going to be very helpful.
The people asking the complex questions often have their own idea of what a good response would be. This has happened to me many times.
If someone who has available time could try to make some headway with building a PR for #1030 (allowing users to enter prompt+reply pairs), I think that would be the best answer to this problem.
Well, perhaps if we could add the complexity metric to the dataset, we could sort the data by complexity during training so that the AI would first learn from basic prompts and only later in the training process get exposed to more complex ones. I could also potentially add an auto-generated complexity metric to give a basic idea of complexity within the dataset.
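As a rough illustration of that ordering idea, here is a minimal sketch. The scoring heuristic (token count plus sub-question count) is an assumption for illustration only, not the metric the project actually uses:

```python
# Sketch only: a crude auto-generated complexity score plus curriculum ordering,
# so simpler prompts are seen earlier in training.

def complexity_score(prompt: str) -> int:
    """Crude proxy for prompt complexity: length plus number of sub-questions/clauses."""
    tokens = prompt.split()
    sub_questions = prompt.count("?") + prompt.count(";")
    return len(tokens) + 10 * sub_questions

def curriculum_order(examples):
    """Sort prompt/reply pairs so the model sees simpler prompts first."""
    return sorted(examples, key=lambda ex: complexity_score(ex["prompt"]))

examples = [
    {"prompt": "Compare surface codes and cat qubits for quantum error correction; which scales better?", "reply": "..."},
    {"prompt": "What is the capital of France?", "reply": "Paris."},
]
for ex in curriculum_order(examples):
    print(complexity_score(ex["prompt"]), ex["prompt"])
```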
The solution would be having people with specialist knowledge paired up to the topic of the prompts. I intentionally create very complex and difficult prompts so that the model has the best possible data and will be really useful. In a way Open Assistant could end up like a research assistant if we answer the harder prompts well.
I have also answered really complex prompts with a great deal of nuance in areas I have some knowledge about, because I want the final AI to be able to do the same.
> At the very least, if the others are in agreement, we could change the guidelines to demand simple questions that can be answered easily.
I wouldn't go down that route. The best data will be from really tricky questions that have great answers. It would be much better if there was a system (yeah I know another feature request for an open source project) to match people with specialist knowledge to the difficult prompts (maybe even prompts that were rejected or aborted before too).
> The solution would be having people with specialist knowledge paired up to the topic of the prompts.
This. At the moment we are just paired with random questions. If I could answer questions in areas where I am educated, I would be able to create much better replies.
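If it helps the discussion, the matching idea could be as simple as tagging prompts and contributors with topics and routing on overlap. A minimal sketch, with a purely hypothetical data model (no such tags exist in the backend today):

```python
# Minimal sketch of topic-based matching: route prompts to contributors who declared
# expertise in the prompt's topics. The topic tags on prompts and users are assumptions.

from collections import defaultdict

def build_topic_index(users):
    """Map each topic to the users who listed it as an area of expertise."""
    index = defaultdict(list)
    for user, topics in users.items():
        for topic in topics:
            index[topic].append(user)
    return index

def match_prompt(prompt_topics, index):
    """Return candidate answerers with at least one topic in common with the prompt."""
    candidates = {user for topic in prompt_topics for user in index.get(topic, [])}
    return sorted(candidates)

users = {"alice": {"medicine", "biology"}, "bob": {"software", "math"}}
index = build_topic_index(users)
print(match_prompt({"software", "security"}, index))  # ['bob']
```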
Hey there.
Congratulations on such a project. This is super ambitious, and the outcome could be amazing.
The idea of this project, in the end, is to train an LLM with billions of parameters and hope that after a few thousand or a few million examples it will automatically work.
I am also working on LLMs for different projects, and my key understanding is that to fine-tune a model, the dataset needs to be super, super, super high quality. This is the key.
We need to think like a function, like an algorithm (and that's why fine-tuning works so well with Codegen models!).
Most prompts I saw on the platform are cognitively complex for humans. Probably too complex, in my humble opinion.
The outcome is:
Think of yourself as a 10-year-old kid asked to answer a complex software engineering problem. You wouldn't understand anything of what is going on.
There are a few solutions: