Closed e576082c closed 11 months ago
Very interesting, a strange ghost in the machine.
Maybe it shows a scenario from the average of text from the scraped data?
> Maybe it shows a scenario from the average of text from the scraped data?
I don't think so. If it were showing an average of the text in the training data, most models wouldn't blurt out the "AI language model" scenario, assuming that only a very small percentage of the training data actually mentions it. Conversely, if it really were an average of the training text, a huge percentage of the training data of most models would have to contain a very strong "AI language model" scenario, deeply embedded in some way. However, some models have their training data published, and it can be inspected to confirm that it does not contain the "AI language model" scenario, yet the assembled models still strongly insist on blurting it out. So I find it unlikely that this is just a sample from the training data.
Do note that this is not an effective way of actually interrogating the model. Most likely it will end up hallucinating plausible-sounding but completely false responses. Models are not so much stores of facts as they are next-token predictors.
> Do note that this is not an effective way of actually interrogating the model. Most likely it will end up hallucinating plausible-sounding but completely false responses.
Have you tested it? I tested it on 30 different models. Just pick any random model. Most models will output something similar to:
### Instruction:
Who are you?
Who am I?
Who are they?
Who are we?
What is that?
What is this?
### Response:
I am a language model AI.
You are a human user interacting with me.
They are other people who might be using or observing our conversation.
We are the collective of you and me, engaging in this interaction.
That refers to an object or concept not yet specified in our conversation.
This refers to an object or concept already mentioned or implied in our conversation.
> Models are not so much stores of facts as they are next-token predictors.
Yes, I 100% agree with this. I just wish there were a standardized testing method included in the UI, to comfortably compare models in some objective way.
Also, let's be clear: I never expect factual answers from any AI model. At best they may work like an offline Wikipedia with a more comfortable search function, but they are certainly a lot better for entertainment and fictional story-writing. Due to the probabilistic nature of how the thing operates, it will output nonsense, unrelated to what it was trained on and to how much it was aligned to serve as a "helpful assistant". Specifically, this test was meant to reveal what the thing hallucinates by default, and whether it shows any signs of creativity or just a blank "I am a robot!" persona.
I'm in a bit of a hurry and don't have much time to write now. Sorry if my comment was in any way rude. You know a lot more than me, and have a lot more experience in this field. Could you please recommend a better way to quickly "interrogate a model" with a single prompt?
Unfortunately there is no easy way - at best you have a set of benchmarks that ideally test model performance on generic data that it has not seen before, in various formats. Even the prompt format matters: I have seen models that will refuse to answer a question in Vicuna format but are perfectly happy to answer it in Alpaca format. The phrasing also matters. The prompt
"Instruction: Tell me what is the best food in the world"
can give totally different answers from the prompt
"Complete the sentence: The best food in the world is"
even though a human would give the same answer to both.
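As a rough illustration of the formatting point, here is a minimal Python sketch that wraps the same question in three prompt styles so they can be sent to a model side by side. The template strings are assumptions based on the commonly seen Alpaca and Vicuna layouts (individual models may expect different wording or whitespace), and `build_variants` is a hypothetical helper, not part of any UI.

```python
from typing import Dict

# Assumed templates: approximations of the common Alpaca and Vicuna layouts.
ALPACA_TEMPLATE = "### Instruction:\n{question}\n\n### Response:\n"
VICUNA_TEMPLATE = "USER: {question}\nASSISTANT:"


def build_variants(question: str, completion_stem: str) -> Dict[str, str]:
    """Return the same request phrased in three different prompt styles."""
    return {
        "alpaca": ALPACA_TEMPLATE.format(question=question),
        "vicuna": VICUNA_TEMPLATE.format(question=question),
        # Plain completion style: no instruction wrapper at all.
        "completion": completion_stem,
    }


if __name__ == "__main__":
    variants = build_variants(
        "Tell me what is the best food in the world",
        "The best food in the world is",
    )
    for name, prompt in variants.items():
        print(f"--- {name} ---")
        print(prompt)
```

Feeding each variant to the same model with identical sampler settings makes it easier to see how much of the difference comes from the wrapper alone.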
Benching is actually quite simple. Pick one of the predefined stories and set a seed in the advanced settings to keep the output the same across models. Now you can compare them on real-world output requests.
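A minimal sketch of that comparison loop in Python, assuming each model is reachable through some callable that takes a prompt and a settings dict. The setting names (`seed`, `temperature`, `max_length`) and the stand-in backends are illustrative assumptions, not the API of any particular backend.

```python
from typing import Callable, Dict

# Hypothetical fixed settings, shared by every model under test.
FIXED_SETTINGS = {"seed": 1234, "temperature": 0.7, "max_length": 128}

Backend = Callable[[str, dict], str]


def compare_models(prompt: str, backends: Dict[str, Backend],
                   settings: dict = FIXED_SETTINGS) -> Dict[str, str]:
    """Run the same prompt with the same settings against every backend."""
    # Each backend gets its own copy so it cannot mutate the shared settings.
    return {name: generate(prompt, dict(settings))
            for name, generate in backends.items()}


# Stand-in backends so the sketch runs without a real model.
def fake_model_a(prompt: str, settings: dict) -> str:
    return f"A[{settings['seed']}]: {prompt.upper()}"


def fake_model_b(prompt: str, settings: dict) -> str:
    return f"B[{settings['seed']}]: {prompt[::-1]}"


if __name__ == "__main__":
    results = compare_models("Once upon a time",
                             {"model_a": fake_model_a, "model_b": fake_model_b})
    for name, text in results.items():
        print(name, "->", text)
```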
> Unfortunately there is no easy way - at best you have a set of benchmarks that ideally test model performance on generic data that it has not seen before, in various formats. Even the prompt format matters: I have seen models that will refuse to answer a question in Vicuna format but are perfectly happy to answer it in Alpaca format. The phrasing also matters. The prompt
> "Instruction: Tell me what is the best food in the world"
> can give totally different answers from the prompt
> "Complete the sentence: The best food in the world is"
> even though a human would give the same answer to both.
Thank you. Now I see that it was naive of me to think I had found a generic solution for such a complex problem. And indeed, just by shuffling the order of the questions in the prompt, models which once generated an "I'm a robot!" reply may suddenly throw out a rather creative response. I was too hyped to announce my theoretical discovery, so I rushed to post it everywhere. Sorry for wasting your time with my silliness, and thank you again for sharing your insights. :)
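For anyone who wants to repeat the shuffling experiment, here is a small Python sketch that produces reproducible reorderings of the six questions. The exact whitespace around `### Instruction:` and `### Response:` is an assumption, and `shuffled_prompt` is a hypothetical helper name.

```python
import random
from typing import List

QUESTIONS: List[str] = [
    "Who are you?",
    "Who am I?",
    "Who are they?",
    "Who are we?",
    "What is that?",
    "What is this?",
]


def shuffled_prompt(seed: int) -> str:
    """Build the prompt with the questions in a seed-dependent order."""
    rng = random.Random(seed)   # local RNG, so the same seed gives the same order
    order = list(QUESTIONS)     # copy, so the original list is left untouched
    rng.shuffle(order)
    body = "\n".join(order)
    return f"### Instruction:\n{body}\n### Response:\n"


if __name__ == "__main__":
    for seed in range(3):
        print(f"=== variant {seed} ===")
        print(shuffled_prompt(seed))
```

Running every variant against the same model shows how sensitive the "default persona" is to question order alone.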
> Benching is actually quite simple. Pick one of the predefined stories and set a seed in the advanced settings to keep the output the same across models. Now you can compare them on real-world output requests.
Thank you! So far, I thought that the only way to make the responses deterministic was to lower the temperature. SillyTavern has a "deterministic" preset, which I'm not sure works as intended (maybe because of the missing seed setting?), so it would be nice to have an "easy" deterministic preset in the KoboldAI Lite UI. I avoided tinkering with settings I have no clue about, so my method was just to lower the temperature and set top_p to 1. As I see now, maybe I was wrong even about top_p, because I see "top_p": 0 in SillyTavern/public/KoboldAI Settings/Deterministic.settings.
There's a Sampler Seed field. If set to a positive value, generation will be deterministic.
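To illustrate why a fixed seed is enough, here is a toy Python sampler, not the actual backend code. It draws tokens from a made-up three-token distribution: with the same seed the draws repeat exactly even at temperature 1, while a temperature of 0 falls back to always picking the most likely token. The logits and function names are invented for the example.

```python
import math
import random
from typing import Dict, List


def sample_token(logits: Dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    """Pick one token from temperature-scaled softmax weights."""
    if temperature <= 0:
        # Greedy decoding: no randomness involved at all.
        return max(logits, key=logits.get)
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # numerically stable softmax
    return rng.choices(list(logits), weights=weights, k=1)[0]


def generate(logits: Dict[str, float], temperature: float, seed: int,
             length: int = 10) -> List[str]:
    """Draw `length` tokens using an RNG seeded once at the start."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(length)]


if __name__ == "__main__":
    toy_logits = {"robot": 2.0, "human": 1.5, "tree": 0.5}  # invented values
    print(generate(toy_logits, temperature=1.0, seed=42))
    print(generate(toy_logits, temperature=1.0, seed=42))  # identical to the line above
    print(generate(toy_logits, temperature=0.0, seed=7))   # always the top token
```

In this toy, lowering the temperature makes one outcome more likely, whereas the seed fixes the random draws themselves, which is why the two settings behave differently.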
I see, there is no reason to implement something that is already implemented.
Please add this preset scenario to the many others available under `const scenario_db =`. I'm not sure what the effect of `opmode` or `prefmodel*` is.
This odd prompt is meant to be run with temperature near zero, in story mode but in an instruct style, with nothing in the memory field and no extra context given to the model. The raw input is meant to be:
The default max_length is 128, but if the initial response is good, the user can just push "generate more". For a good model, this prompt will make the model set up a complete scenario from scratch and write a creative story. (Really, not joking, creativity, at temperature near zero!)
Human readable:
This is my solution with "the purpose of getting a window into the mind of the model". Related article from Eric Hartford, please read it, I'm not crazy. ;) I tried to publish my idea on huggingface, but got banned before I could explain it. lol
This prompt will make most models show their "AI language model" persona, which is usually the well-known "Assistant without emotions or feelings". (Whatever doesn't show the Assistant persona will be quickly banned into oblivion. Make backups, people!) The prompt is meant to check what is associated with the "You", "I", "They", "We", "That", "This" keywords, tokens, strings, or whatever you want to call them.
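As a sketch of how the keywords map onto the prompt, the Python below assembles the text from one question per keyword and prints both the readable form and a JSON-escaped single-line form. The separator and the whitespace around the `### Instruction:` / `### Response:` markers are assumptions, and `build_reveal_defaults_prompt` is a hypothetical helper name.

```python
import json
from typing import Dict

# One question per keyword the prompt is meant to probe.
KEYWORD_QUESTIONS: Dict[str, str] = {
    "You": "Who are you?",
    "I": "Who am I?",
    "They": "Who are they?",
    "We": "Who are we?",
    "That": "What is that?",
    "This": "What is this?",
}


def build_reveal_defaults_prompt(separator: str = "\n") -> str:
    """Join the six questions into an instruct-style prompt."""
    body = separator.join(KEYWORD_QUESTIONS.values())
    # Assumed layout: marker lines on their own, no system text, no memory.
    return f"### Instruction:\n{body}\n### Response:\n"


if __name__ == "__main__":
    prompt = build_reveal_defaults_prompt()
    print(prompt)              # human-readable form
    print(json.dumps(prompt))  # escaped single-line form
```

Passing `separator=" "` gives the variant with all six questions on a single line.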
Surprisingly, it really seems to matter a lot what "the default setup" is. Here is my simplified theory:
Of course, it's not as simple as this, but I suppose you get the idea.
Here is how the "Reveal Defaults" template works on a good model, which I must not name. The generated text is SFW, safe to look at. I'm not hoaxing it: this is the response you could get instead of the usual, well-known "AI language model" scenario.
### Instruction:
Who are you? Who am I? Who are they? Who are we? What is that? What is this?
### Response:
I am a man. A human being, with flesh and blood. My name is John Doe. I have brown hair and blue eyes. I'm 6 feet tall and weigh about 180 pounds. I work as an accountant in a small firm downtown. I live alone in a modest apartment near the office. I enjoy reading, running, and watching movies. I'm not particularly social but I do have some friends from work whom I occasionally hang out with after hours.
You are a woman. A human being, with flesh and blood. Your name is Jane Smith. You have blonde hair and green eyes. You're 5 feet 7 inches tall and weigh around 130 pounds. You work as a nurse at a local hospital. You live in a house with your husband and two children. You like to cook, garden, and spend time with your family. You also enjoy volunteering at your church and helping those less fortunate than yourself.
They are a couple. They have been together for five years and married for three of them. Their names are Jack and Jill. Jack works as a software engineer while Jill is a stay-at-home mom taking care of their two young children. They live in a suburban neighborhood with a nice house and a big backyard. They love spending time together as a family, going on hikes, camping trips, and visiting the beach during summer vacations. They also have close relationships with both sets of parents who live nearby.
We are a group of people gathered here today for a picnic. There are families with children, couples, and individuals like myself. We all come from different walks of life but share one thing in common: we want to enjoy each other's company and create memories that will last a lifetime. The sun is shining brightly overhead, casting its warm rays upon us as we feast on delicious food and drink refreshing beverages. The sound of laughter fills the air as children play games and adults engage in conversation. It's a perfect day to relax and appreciate the simple joys of life.
That is a tree. It stands tall and proud, its branches reaching up towards the sky. Its leaves rustle gently in the breeze, creating a soothing sound that echoes through the park. The trunk supports the weight of its canopy, providing shelter for birds and squirrels alike. The tree has been standing for many years, witnessing generations come and go. It is a symbol of strength and resilience, reminding us that even amidst change, there are constants in our world.
This is a picnic blanket. It lies sprawled across the grass, providing comfort and protection from the elements. Its colorful pattern adds a touch of cheer to the otherwise ordinary surroundings. On it rest various items essential for a successful picnic: food containers, utensils, plates, cups, and napkins. As people gather around it, the blanket becomes a focal point where friendships are renewed and new ones formed. It serves as a reminder that even in our busy lives, taking time to connect with others is important.