Closed: baptistejamin closed this issue 1 year ago
Interesting point. I'm afraid the labels give people a preconception of what is and isn't a "good prompt", as if it has to get full marks to be of the highest quality. That being said, the creativity metric could encourage people to ask complex questions on purpose - I certainly tried making a few outlandish requests that required chain-of-thought and lateral thinking myself.
The trickiest part about adding a new label is that we already have just shy of 100k messages, so we either have to work with tons of unlabeled data or retroactively label them all in some manner that can't be fully automated.
I think the idea is fair, though - when we were discussing the definition of the Ordinary/Creative spectrum, some people interpreted it as complexity, so for some, this metric already exists. I'm all for a clear definition of the part each label plays, so it can help.
At the very least, if the others are in agreement, we could change the guidelines to demand simple questions that can be answered easily.
If you feel a prompt is too complex or it feels a bit like asymmetric warfare by the prompter, then simply skip it. If enough people skip a prompt it'll be flagged by auto-moderation as a bad prompt, and it'll be aborted.
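For concreteness, the skip-based flagging described above is essentially a threshold check. A minimal sketch, where the threshold value and names are hypothetical rather than the actual backend logic:

```python
# Minimal sketch of skip-based auto-moderation, assuming a simple count threshold.
# SKIP_THRESHOLD and the function name are hypothetical, not the real backend fields.

SKIP_THRESHOLD = 5  # assumed number of skips before a prompt is flagged

def should_flag_prompt(skip_count: int, threshold: int = SKIP_THRESHOLD) -> bool:
    """Return True once enough users have skipped the prompt."""
    return skip_count >= threshold

# A prompt skipped by 6 different users would be flagged and aborted.
print(should_flag_prompt(6))  # True
```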
Is it a bad prompt though? Some complex prompts might be perfectly reasonable to ask an AI agent, but unreasonable to ask a human to spend time on.
I would prefer it if all prompts were complex, actually. If Open Assistant can't answer anything difficult, it's not going to be very helpful.
The people asking the complex questions often have their own idea of what a good response would be. This has happened to me many times.
If someone who has available time could try to make some headway with building a PR for #1030 (allowing users to enter prompt+reply pairs), I think that would be the best answer to this problem.
Well, perhaps if we could add the complexity metric to the dataset, we could sort the data by complexity during training so that the AI would first learn from basic prompts and only later in the training process get exposed to more complex ones. I could also potentially add an auto-generated complexity metric to give a basic idea of complexity within the dataset.
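As a rough illustration of that ordering idea, here is a minimal sketch. The scoring heuristic (token count plus sub-question count) is an assumption for illustration only, not the metric the project actually uses:

```python
# Sketch only: a crude auto-generated complexity score plus curriculum ordering,
# so simpler prompts are seen earlier in training.

def complexity_score(prompt: str) -> int:
    """Crude proxy for prompt complexity: length plus number of sub-questions/clauses."""
    tokens = prompt.split()
    sub_questions = prompt.count("?") + prompt.count(";")
    return len(tokens) + 10 * sub_questions

def curriculum_order(examples):
    """Sort prompt/reply pairs so the model sees simpler prompts first."""
    return sorted(examples, key=lambda ex: complexity_score(ex["prompt"]))

examples = [
    {"prompt": "Compare surface codes and cat qubits for quantum error correction; which scales better?", "reply": "..."},
    {"prompt": "What is the capital of France?", "reply": "Paris."},
]
for ex in curriculum_order(examples):
    print(complexity_score(ex["prompt"]), ex["prompt"])
```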
The solution would be having people with specialist knowledge paired up to the topic of the prompts. I intentionally create very complex and difficult prompts so that the model has the best possible data and will be really useful. In a way Open Assistant could end up like a research assistant if we answer the harder prompts well.
I have also answered really complex prompts with a great deal of nuance in areas I have some knowledge about, because I want the final AI to be able to do the same.
> At the very least, if the others are in agreement, we could change the guidelines to demand simple questions that can be answered easily.
I wouldn't go down that route. The best data will be from really tricky questions that have great answers. It would be much better if there was a system (yeah I know another feature request for an open source project) to match people with specialist knowledge to the difficult prompts (maybe even prompts that were rejected or aborted before too).
> The solution would be having people with specialist knowledge paired up to the topic of the prompts.
This. At the moment we are just paired with random questions. If I could answer questions in areas where I am educated, I would be able to create much better replies.
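If it helps the discussion, the matching idea could be as simple as tagging prompts and contributors with topics and routing on overlap. A minimal sketch, with a purely hypothetical data model (no such tags exist in the backend today):

```python
# Minimal sketch of topic-based matching: route prompts to contributors who declared
# expertise in the prompt's topics. The topic tags on prompts and users are assumptions.

from collections import defaultdict

def build_topic_index(users):
    """Map each topic to the users who listed it as an area of expertise."""
    index = defaultdict(list)
    for user, topics in users.items():
        for topic in topics:
            index[topic].append(user)
    return index

def match_prompt(prompt_topics, index):
    """Return candidate answerers with at least one topic in common with the prompt."""
    candidates = {user for topic in prompt_topics for user in index.get(topic, [])}
    return sorted(candidates)

users = {"alice": {"medicine", "biology"}, "bob": {"software", "math"}}
index = build_topic_index(users)
print(match_prompt({"software", "security"}, index))  # ['bob']
```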
Hey there.
Congratulations on such a project. This is super ambitious, and the outcome could be amazing.
The idea of this project, in the end, is to train an LLM with billions of parameters and hope that after a few thousand or a few million examples it will automatically work.
I am also working on LLMs for different projects, and my key understanding is that to fine-tune a model, the dataset needs to be super, super, super high quality. This is the key.
We need to think like a function, like an algorithm (and that's why fine-tuning works so well with Codegen models!).
Most prompts I saw on the platform are cognitively complex for humans. Probably too complex, in my humble opinion.
The outcome is:
Think of yourself as a 10-year-old kid asked to answer a complex software engineering problem. You wouldn't understand anything of what is going on.
There are a few solutions: