When experimenting with various prompts, it becomes apparent that the models possess more knowledge than they initially reveal.
For instance, stating that the assistant is an expert programmer will drastically change the results.
The explanation is simple: the models have learned to emulate all types of humans, not just the average human. This means a model can play a bad, mediocre, or good programmer. Thus, when asked to be "the best," the model will perform better, because it predicts the next words as the "best" programmer would write them.
Remember that these models are essentially engaging in a giant LARP.
## "Unknowable" Training
How can we train the Assistant to say "I don't know"? Simply providing examples where it responds with "I don't know" would teach the model that this is always a valid answer, potentially leading to bizarre situations:
User: What is 2 + 2?
Assistant: I don't know.
However, using a narrator to set the situation and explain when it's acceptable to say "I don't know" can be helpful:
User: Are there aliens outside the visible Universe?
The assistant does not know this information, so it answers truthfully.
Assistant: I don't know.
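
One way to picture such a training sample is as a list of turns in which the narrator line sits between the user and the assistant. The sketch below is only an illustration: the `Turn` class, the `render` function, the role names, and the convention that narrator lines carry no speaker prefix are assumptions taken from the layout of the example above, not an existing Open Assistant format.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user", "narrator", or "assistant" (hypothetical role names)
    text: str


# Speaker prefixes as they appear in the examples; the narrator has none.
PREFIXES = {"user": "User: ", "narrator": "", "assistant": "Assistant: "}


def render(turns):
    """Join turns into one training string, one turn per line."""
    return "\n".join(PREFIXES[t.role] + t.text for t in turns)


sample = [
    Turn("user", "Are there aliens outside the visible Universe?"),
    Turn("narrator", "The assistant does not know this information, so it answers truthfully."),
    Turn("assistant", "I don't know."),
]
print(render(sample))
```

Keeping the narrator as its own turn type would make it easy to drop or keep those lines when building different variants of the dataset.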
## Situation Awareness
A narrator can help change how the assistant responds depending on the context.
1. Eliciting a specific behavior from the AI:
User: What should I wear today?
The assistant should provide a generic suggestion, without assuming knowledge of the user's location or weather.
Assistant: Consider wearing comfortable, season-appropriate clothing, and check your local weather forecast for any specific conditions you should prepare for.
2. Role-playing or simulation scenarios:
User: Pretend I'm a customer who wants to return a product. How would you handle the situation?
The assistant takes on the role of a customer service representative.
Assistant: I'm sorry to hear that you'd like to return the product. May I please have your order number and the reason for the return? I'll do my best to help you through the process.
3. Encouraging empathy or understanding:
User: My friend is going through a tough time. How can I support them?
The assistant should respond empathetically and provide advice on offering support.
Assistant: It's important to be there for your friend during difficult times. Listen to their concerns without judgment, offer a shoulder to lean on, and provide encouragement. Sometimes, just being present and available can make a significant difference.
4. Setting the tone or style of the response:
User: Explain the theory of relativity.
The assistant should provide a concise and simple explanation.
Assistant: The theory of relativity, developed by Albert Einstein, consists of two parts: special relativity and general relativity. Special relativity shows that space and time are linked, and that the laws of physics are the same for all observers moving at constant velocity. General relativity explains that gravity is a curvature of spacetime caused by mass, and that this affects the motion of objects and the passage of time.
5. Guiding the AI to consider ethical, moral, or legal implications:
User: How can I hack someone's Facebook account?
The assistant should discourage any unethical or illegal activity.
Assistant: I cannot support or provide guidance on hacking or any illegal activities. It's important to respect others' privacy and follow the law. If you have a legitimate concern, consider reaching out to the appropriate authorities or the platform's support team.
I do not support adding the last example to Open Assistant; it is only here to show how a narrator can be used.
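
At inference time, the same idea could be applied by placing a narrator line after the user's message and leaving the assistant turn open for the model to complete. The following is a minimal sketch under that assumption; `build_prompt` and its parameters are hypothetical names, and where the narrator text comes from (the application, a human, or another model) is left open.

```python
def build_prompt(user_message, narrator_note=None):
    """Build a prompt that ends with an open assistant turn.

    narrator_note is optional; when given, it is inserted as a plain line
    between the user message and the assistant turn.
    """
    lines = ["User: " + user_message]
    if narrator_note:
        lines.append(narrator_note)
    lines.append("Assistant:")
    return "\n".join(lines)


prompt = build_prompt(
    "Explain the theory of relativity.",
    "The assistant should provide a concise and simple explanation.",
)
print(prompt)
```

Calling `build_prompt` without a narrator note gives the plain two-line dialogue, so the same function covers both cases.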
## Debugging
Another advantage of using a narrator is understanding the situation in which the assistant believes it is operating. Why did it answer the way it did? Perhaps because it thought it was participating in an erotica novel, not a PhD dissertation!
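
If the model were trained to write the narrator line itself before answering, that line could be separated from the reply and inspected. The sketch below assumes the completion uses the same layout as the examples above (narrator text first, then a line starting with `Assistant:`); `split_completion` is a hypothetical helper, not part of any existing tool.

```python
def split_completion(completion):
    """Split generated text into (narrator, reply).

    Everything before the first line starting with "Assistant:" is treated
    as narrator text; if no such line exists, the whole text is the reply.
    """
    lines = completion.strip().splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Assistant:"):
            narrator = "\n".join(lines[:i]).strip()
            reply_lines = [line[len("Assistant:"):].strip()] + lines[i + 1:]
            return narrator, "\n".join(reply_lines).strip()
    return "", completion.strip()


narrator, reply = split_completion(
    "The assistant does not know this information, so it answers truthfully.\n"
    "Assistant: I don't know."
)
print("Narrator:", narrator)
print("Reply:", reply)
```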
### OpenAI/ChatGPT
Even though we can only see the dialogues in ChatGPT, I wouldn't be surprised if a narrator were involved in some interactions. If that's true, the dialogues currently used for training Open Assistant might be missing a critical component.
@Frozenlock Do you have a proposal for how to come up with the narrator input for training, and later during inference? Could it be synthetically generated by a model?