ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Crafting prompts to get LLaMA models to generate interesting content #156

Closed paulocoutinhox closed 1 year ago

paulocoutinhox commented 1 year ago

Hi,

I'm getting strange behaviour and answers:

./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 256 --repeat_penalty 1.0 --color -p "User: how many wheels have a car?"
main: seed = 1678864388
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: prompt: 'User: how many wheels have a car?'
main: number of tokens in prompt = 11
     1 -> ''
  2659 -> 'User'
 29901 -> ':'
   920 -> ' how'
  1784 -> ' many'
 18875 -> ' wheel'
 29879 -> 's'
   505 -> ' have'
   263 -> ' a'
  1559 -> ' car'
 29973 -> '?'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.000000

User: how many wheels have a car?
User: how many wheel have a car?
Weegy: A car has four wheels.
User: how many wheels have a car?
Weegy: A car has four wheels. It depends on what you mean by "how many."
User: A car has four wheels. How many wheels have a car?
Weegy: A car has four wheels.
A car has four wheels.
A car has four wheels. It depends on what you mean by "how many."
"A car has four wheels. How many wheels have a car?" [end of text]

main: mem per token = 14434244 bytes
main:     load time =  1940.71 ms
main:   sample time =   116.92 ms
main:  predict time =  7092.72 ms / 51.40 ms per token
main:    total time = 10812.94 ms

Answer:

User: how many wheels have a car?
User: how many wheel have a car?
Weegy: A car has four wheels.
User: how many wheels have a car?
Weegy: A car has four wheels. It depends on what you mean by "how many."
User: A car has four wheels. How many wheels have a car?
Weegy: A car has four wheels.
A car has four wheels.
A car has four wheels. It depends on what you mean by "how many."
"A car has four wheels. How many wheels have a car?" [end of text]

How can I get only one answer at a time?

Is there a more precise model than 7B?

Is there Brazilian Portuguese support, so questions and answers can be in that language?

ssvenn commented 1 year ago

This is normal behavior. Try adding -i -r "User" to stop text generation and let you add your own text after it hits the reverse prompt. You probably need to give the model more context to get the desired output; try this:

./main -m ./models/7B/ggml-model-q4_0.bin -t 16 -n 2048 -i -r " User" --color -p "Transcript of a dialog, where the User interacts with an Assistant named Computer. Computer is honest, good at writing, and never fails to answer the User's requests immediately and with precision.

 User: How many wheels does a bike have?
 Computer: A bike has two wheels.
 User: How many wheels does a car have?"

User: How many wheels does a bike have? Computer: A bike has two wheels. User: How many wheels does a car have? Computer: It depends on what type of car it is, but in general terms we could say that 95% of cars use four wheels and the remaining vehicles can be counted using your fingers from one to five (including your thumb). So you don't need more than three digits. However if I were asked how many windows does a house have? then my answer would always be "as few as possible".

Don't expect miracles from the 7B model. It has a good sense of humor though :)

paulocoutinhox commented 1 year ago

I have some questions.

Is there a way to create a model like 7B from my catalog of books, so I can ask questions about my books, for example? If yes, do you have any example?

terafo commented 1 year ago

You would need a GPU with tens of gigabytes of VRAM, and another fork.

paulocoutinhox commented 1 year ago

When you say "use another fork", do you mean that llama.cpp only works with Facebook's LLaMA weights and cannot be trained on another dataset?

terafo commented 1 year ago

llama.cpp is made only for inference; it doesn't have training functionality. It wouldn't make sense to do that on a CPU for a model of that size anyway. Meta didn't release the LLaMA training code, but, AFAIK, there is at least one alternative implementation of training code; you should use one of those.

G2G2G2G commented 1 year ago

Nothing is strange about that input and the bot mimicking it and giving you output. Nothing at all; this is how all language models act.

I just answered this in another thread: https://github.com/ggerganov/llama.cpp/issues/122#issuecomment-1469577908 It should solve your issue and give you what you're trying to do, which is interact with the bot (you aren't even using that flag, either). Anyway, read that, do what I said, and use his command (my edited one) too; it should give you a few questions/answers.

As posted in that reply, issue 71 makes this much less usable, and until that issue is fixed, chat mode is basically unusable for more than about two questions.

Also, close threads if your issues are resolved.

When you say "use another fork", do you mean that llama.cpp only works with Facebook's LLaMA weights and cannot be trained on another dataset?

llama.cpp is only for LLaMA, written in C++, CPU-only, and only for running the models.

Is there a way to create a model like 7B from my catalog of books, so I can ask questions about my books, for example? If yes, do you have any example?

After issue 71 is fixed, sure, you can do that. Write out all the questions, and the answers to them, that exist for your catalog; I suggest you take your time doing that now. The more questions and answers you have, the more exact it'll be. For example, Stanford released 52,000 questions/answers, about 260,000 lines of text, in order to tell the language model exactly what it wants and how it wants it to act.
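As a rough, purely hypothetical sketch (the file name and the catalog entries below are made up), you could collect such question/answer pairs in a plain-text prompt file and load it with -f in interactive mode, just like a chat prompt:

catalog.txt:

User: Who wrote "Example Book One"?
Assistant: "Example Book One" was written by Jane Doe.
User: How many pages does "Example Book Two" have?
Assistant: "Example Book Two" has 320 pages.
User:

./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f catalog.txt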

paulocoutinhox commented 1 year ago

Nice.

So, if I understand correctly, LLaMA itself doesn't have all my content indexed for us to ask questions about; instead, it is "trained" to understand my content using that input file (-f prompts.txt), so I feed it my data and the 7B weights make it understand my content as sentences to be answered. Is that it?

gjmulder commented 1 year ago

You can "prime the model" by engineering prompts for it to respond to. This is not training but nudging the model into generating a narrative that is relevant to your problem. One way I prime ChatGPT is by starting with:

Q: Describe the abilities of a rocket scientist
A: A rocket scientist builds

The model will then generate a description of what a rocket scientist does. It is primed to "think about rocket science". Then continue the rocket science narrative with the next question:

Q: How would you build a rocket to Mars?
A:

Now you get a rocket scientist's answer to how she would build a rocket to get to Mars.

Note that ChatGPT has been fine-tuned to follow your instructions. LLaMA has not, therefore you need to help it by prompting it with the sort of answers you desire.
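Translated to llama.cpp, a minimal sketch of that priming idea (the exact prompt wording is only an illustration, not a recommended prompt) might look like:

./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 256 --repeat_penalty 1.0 --color -p "Q: Describe the abilities of a rocket scientist
A: A rocket scientist builds"

The model continues the "A:" line; you can then append the next Q:/A: pair to the generated text and run it again to keep the narrative going.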

G2G2G2G commented 1 year ago

Yeah, it's a language model that just predicts what comes next (https://en.wikipedia.org/wiki/Language_model), like your phone keyboard does... but hopefully not as terrible.

paulocoutinhox commented 1 year ago

Nice.

For a more realistic scenario: if I want to input the whole Bible text into LLaMA, how does that .txt file need to be created so that it "understands" that context and I can ask questions?

Example of Bible data: https://raw.githubusercontent.com/tushortz/variety-bible-text/master/bibles/kjv.txt

Since we can't train LLaMA, but can do the reverse process of feeding data into it, how can we put the full King James Bible version into it?

And if I make a mobile app, for each question I will need to load the .txt with the "questions" to be input into LLaMA, correct? And then a field for the user to type their question and capture the answer?

gjmulder commented 1 year ago

Be aware: it isn't going to search the Bible; it will instead be generating potentially fictional Bible content. You need to carefully consider the ethical and religious consequences of such an app.

With that caveat, prompt it with something like:

Q. What makes the King James version of the Bible different to other versions?
A. The King James version of the Bible is different because

Then:

Q: What are the major themes in the Old Testament?
A: The Old Testament covers the following themes

This might get it primed to narrating/generating in the style of the King James version and in the context of the Old Testament.

Then:

Q: What did Xanomander the Prophet say to the Israelites?
A: God is

And you might get some pseudo-prophet output about God from the imaginary prophet Xanomander.

I would prototype this with ChatGPT. Once you have a useful prompt which defines the King James Bible and Old Testament, try ChatGPT's definitions as the prompt for LLaMA.

In terms of integration, you'd hardcode your initial engineered prompt (i.e. ChatGPT's definitions) and then append the Q's you want to answer about the Bible.
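As a minimal sketch of that integration idea (the file names primer.txt, question.txt, and prompt.txt below are hypothetical, not part of llama.cpp), the app would concatenate the hardcoded engineered prompt with the user's question and pass the result to main:

# primer.txt holds the hardcoded engineered prompt; question.txt holds the user's Q, ending with "A:"
cat primer.txt question.txt > prompt.txt
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 256 --repeat_penalty 1.0 --color -f prompt.txt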

paulocoutinhox commented 1 year ago

Hi,

I understand, and as I said, it is an experiment. It won't "search the Bible" today, but can I input the Bible content so it learns about the Bible verses, and then ask questions about those verses?

One real example:

File: bible.txt

In the beginning God created the heaven and the earth. -- genesis 1:1
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters. -- genesis 1:2
And God said, Let there be light: and there was light. -- genesis 1:3
User:

Execution:

./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f 'bible.txt'
main: seed = 1678906552
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: prompt: 'In the beginning God created the heaven and the earth. -- genesis 1:1
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters. -- genesis 1:2
And God said, Let there be light: and there was light. -- genesis 1:3
User:'
main: number of tokens in prompt = 88
     1 -> ''
   797 -> 'In'
   278 -> ' the'
  6763 -> ' beginning'
  4177 -> ' God'
  2825 -> ' created'
   278 -> ' the'
 18356 -> ' heaven'
   322 -> ' and'
   278 -> ' the'
  8437 -> ' earth'
 29889 -> '.'
  1192 -> ' --'
 18530 -> ' gene'
  1039 -> 'si'
 29879 -> 's'
 29871 -> ' '
 29896 -> '1'
 29901 -> ':'
 29896 -> '1'
    13 -> '
'
  2855 -> 'And'
   278 -> ' the'
  8437 -> ' earth'
   471 -> ' was'
  1728 -> ' without'
   883 -> ' form'
 29892 -> ','
   322 -> ' and'
  1780 -> ' void'
 29936 -> ';'
   322 -> ' and'
 23490 -> ' darkness'
   471 -> ' was'
  2501 -> ' upon'
   278 -> ' the'
  3700 -> ' face'
   310 -> ' of'
   278 -> ' the'
  6483 -> ' deep'
 29889 -> '.'
  1126 -> ' And'
   278 -> ' the'
 20799 -> ' Spirit'
   310 -> ' of'
  4177 -> ' God'
  6153 -> ' moved'
  2501 -> ' upon'
   278 -> ' the'
  3700 -> ' face'
   310 -> ' of'
   278 -> ' the'
 19922 -> ' waters'
 29889 -> '.'
  1192 -> ' --'
 18530 -> ' gene'
  1039 -> 'si'
 29879 -> 's'
 29871 -> ' '
 29896 -> '1'
 29901 -> ':'
 29906 -> '2'
    13 -> '
'
  2855 -> 'And'
  4177 -> ' God'
  1497 -> ' said'
 29892 -> ','
  2803 -> ' Let'
   727 -> ' there'
   367 -> ' be'
  3578 -> ' light'
 29901 -> ':'
   322 -> ' and'
   727 -> ' there'
   471 -> ' was'
  3578 -> ' light'
 29889 -> '.'
  1192 -> ' --'
 18530 -> ' gene'
  1039 -> 'si'
 29879 -> 's'
 29871 -> ' '
 29896 -> '1'
 29901 -> ':'
 29941 -> '3'
    13 -> '
'
  2659 -> 'User'
 29901 -> ':'

main: interactive mode on.
main: reverse prompt: 'User:'
main: number of tokens in reverse prompt = 2
  2659 -> 'User'
 29901 -> ':'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.000000

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.
In the beginning God created the heaven and the earth. -- genesis 1:1
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters. -- genesis 1:2
And God said, Let there be light: and there was light. -- genesis 1:3
User:who is the earth creator?
The earth is created by God.
User:in what verse the light was created?
Geneisi 1:3
User:who is god?
God is the creator of earth.
User:what is the light?
Light is the rays of sun
User:what god said about the light?
God said let there be light: and there was light
User:

As you can see, this is perfect.

paulocoutinhox commented 1 year ago

What I need now is to understand how I can "train" on the "bible.txt" and load it already trained, instead of this reverse form.

gjmulder commented 1 year ago

You can just directly ask it questions about specific chapters of the Bible. You can assume it knows the Bible, as it has read (i.e. been trained on) Project Gutenberg. Whether the answers are useful is another matter:

$ ./main -m ./models/30B/ggml-model-f16.bin --top_p 0.5 -t 16 -n 512 -p "Q: In Genesis in the Bible, what it the metaphorical meaning of the snake? A: The metaphorical meaning of the snake is" 2>/dev/null
Q: In Genesis in the Bible, what it the metaphorical meaning of the snake? A: The metaphorical meaning of the snake is that he was a snake.
The only thing worse than being talked about...is not being talked about. - Oscar Wilde (1854-1900)

paulocoutinhox commented 1 year ago

The Bible is only an example, man; I want to understand the background using common data. People may want to input their own content. I have some questions:

First: how can I "train" on the "bible.txt" (or any other content) and load it already trained, instead of this reverse form?

Second: how can I put the prompt in the command itself instead of using interactive mode?

Before:

./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f 'bible.txt'

After:

./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 1024 --repeat_penalty 1.0 --color ???

gjmulder commented 1 year ago

The Bible is only an example, man; I want to understand the background using common data. People may want to input their own content. I have some questions:

First: how can I "train" on the "bible.txt" (or any other content) and load it already trained, instead of this reverse form?

The prompt is the only content you can provide. The rest is up to the knowledge already stored in the model (e.g. bibles or rockets). Either your user has to provide the prompt, or if you want to prime the model to discuss a specific topic you need to use the -p option to prompt the model.
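For example, a single non-interactive run with the prompt passed directly on the command line (a sketch reusing the earlier priming idea) would be:

./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 256 --repeat_penalty 1.0 --color -p "Q: What are the major themes in the Old Testament?
A: The Old Testament covers the following themes"

Without -i and -r, main just completes the prompt and stops, either at an end-of-text token (as in your first run) or after -n tokens.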

I think you need to read more about how pre-trained LLMs work. Have you used ChatGPT?

Also, the command line interface you are using is not suited to direct integration into an app. Maybe wait until some python bindings are integrated unless you are familiar with programming in C++ and can reverse engineer main.cpp?

gjmulder commented 1 year ago

Closing this as the questions aren't really specific to llama.cpp.

levicki commented 1 year ago

like your phone keyboard does... but hopefully not as terrible

I'd say it's worse than the phone.

It should be clarified that this specific model is geared towards generating (a continuation of) content, not towards chat or towards adventure mode like, say, KoboldAI's OPT/GPT/Neo/FSD models (although there are efforts to run LLaMA there and someone has written a transformer already, but I think people will be disappointed once they get to try it).

It should also be clarified that many input characters need to be escaped if you don't want the model to just quit in the middle of interactive mode.