ggerganov / llama.cpp

LLM inference in C/C++
MIT License

start to present code automatically in interactive mode. #942

Closed 4t8dd closed 3 months ago

4t8dd commented 1 year ago

I use the llama 7B model. I start it with

./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -i

I don't get a chance to enter any input. [Screenshot of terminal output, 2023-04-13 4:40 PM]

In non-interactive mode, if I do not provide a prompt:

./main -m ./models/7B/ggml-model-q4_0.bin -n 128

it displays content on its own too.

Is this a bug, or did I miss something?

CRD716 commented 1 year ago

Try adding --interactive-first to the command. Check ./main -h for all of the flags.
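
For example, based on the command you posted (flag names can vary a bit between builds, so double-check ./main -h):

./main -m ./models/7B/ggml-model-q4_0.bin -n 128 --interactive-first

With --interactive-first the program waits for your input before generating anything, whereas plain -i only drops into interactive mode after the initial generation, when a reverse prompt is hit, or when you interrupt with Ctrl+C.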

4t8dd commented 1 year ago

I tried this option and got the same result: it still presents content automatically when started. Also, why are there two options? What is the difference between them?

Anyway, neither works for me right now.

ronfravi commented 10 months ago

I have the same problem on a Mac M1 Pro using the Metal build:

./build-metal/bin/main -t 8 -ngl 1 -m ./models/7B/ggml-model-q4_0.gguf --interactive-first --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1

Output:

[screenshot of terminal output]

KerfuffleV2 commented 10 months ago

I can't reproduce this. What commit are you using?

I'm also pretty sure the command line you showed can't correspond to that output. Did you use other parameters like -ins?

tanmng commented 10 months ago

I get the same issue with the latest clone of the repo.

Adding --interactive-first did give me a chance to input something; however, after I submit a prompt, the program prints out the response and then just random code again.

KerfuffleV2 commented 10 months ago

Since you didn't say what prompt you used or anything, it's really hard to help you. There may be an issue with the prompt you used, or your expectations of the output might just be too high. It seems like you're using a 7B model, and small models aren't really all that smart.

tanmng commented 10 months ago

Thanks @KerfuffleV2 for responding, and sorry I didn't include more details earlier.

I'm using the model downloaded from Hugging Face, quantized by TheBloke.

The issue happens on commit 9912b9efc8922321fe7202ab42ba913833cbe9cd and also with the latest master

I tried multiple things:

./main -t 32 -m ../models/codellama-7b.Q4_K_M.gguf --color -i

The machine I'm using doesn't have a GPU but has plenty of CPU cores, so I gave the program 32 threads.

The output was:

Log start                                                                                                                                                                                                      
main: build = 1185 (9912b9e)                                                                                                                                                                                   
main: seed  = 1694363228                                                                                                                                                                                       
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from ../models/codellama-7b.Q4_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 32016,     1,     1 ]                                                                                                    
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
...

..................................................................................................                                                                                                             
llama_new_context_with_model: kv self size  =  256.00 MB                                                                                                                                                       
llama_new_context_with_model: compute buffer total size =   72.00 MB                                                                                                                                           

system_info: n_threads = 32 / 48 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
main: interactive mode on.                                                                                                                                                                                     
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0                                                                                                                                               

== Running in interactive mode. ==                                                                                                                                                                             
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 <PRE> <SUF> <MID> <?php
/**
 * Created by PhpSt

The program essentially just prints out code nonstop until I press ^C to stop it, and I don't get a chance to put in any prompt or input. Pressing ^C a second time quits the program.

I tried a few variations of that command and got the same result (i.e. the program just keeps printing out code nonstop).

When I added --interactive-first, I did get a chance to submit a prompt; however, after printing out a response, the program just keeps printing out more code again.

I hope this is sufficient information for debugging, but please let me know if I should try anything else.

Cheers

KerfuffleV2 commented 10 months ago

When I added --interactive-first, I did get a chance to submit a prompt; however, after printing out a response, the program just keeps printing out more code again.

The model not doing what you want usually isn't a problem with llama.cpp. llama.cpp basically just "plays" the model; most of the model's behavior is determined by its training and the prompt you specify. The model may or may not follow instructions and produce coherent output.

Since you didn't say what prompt you used or anything, it's very difficult to say whether it's a problem with your prompt or not. You might be able to set a reverse prompt to return control after the output you want has been produced. It will depend somewhat on the model.

Just for example, suppose a model takes a prompt format like:

USER: Do the thing.
ASSISTANT: Blah blah, I'm doing the thing.

If the model then generates USER: and starts to interrogate itself (pretty common behavior, since LLMs are just glorified text completion engines), you could set a reverse prompt of USER:. Then, whenever the model generates USER:, you get control back and can do whatever you want.
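
As a rough sketch using the command from earlier in this thread (the model path and the exact prompt strings are just placeholders, adjust them to whatever you're actually running):

./main -m ../models/codellama-7b.Q4_K_M.gguf -i --interactive-first --in-prefix 'USER: ' -r 'USER:'

The -r/--reverse-prompt flag pauses generation and hands control back whenever the model emits that string, and --in-prefix is simply prepended to whatever you type.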

Also, you should keep in mind that 7B models are pretty small. They're generally very sensitive to prompting, and they also aren't that smart. Even 70B models often have trouble following instructions, and it can take experimenting with the prompt to get relevant output.

tanmng commented 10 months ago

Since you didn't say what prompt you used or anything, it's very difficult to say whether it's a problem with your prompt or not. You might be able to set a reverse prompt to return control after the output you want has been produced. It will depend somewhat on the model.

Maybe I got the nomenclature wrong or something, because you kept asking for the "prompt" that I used while I had already said I couldn't submit a prompt to the program.

To clarify, I used the word "prompt" to mean a message that I send to the model (e.g. in your example, "Do the thing." is what I would call a "prompt"), which I hope is the correct way to refer to it.

I didn't specify a -p flag in my command (i.e. no prompt), and yet the moment the command starts up, it just starts printing output from the model non-stop and doesn't give me control to do anything.

That's why in all my messages I kept saying I didn't have a chance to submit a prompt; the program just keeps printing and I can't do anything (except press ^C twice to exit it).

The program was started in interactive mode (with the -i flag), and it just prints output from the model nonstop:

== Running in interactive mode. ==                                                                                                                                                                             
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 <PRE> <SUF> <MID> <?php
/**
 * Created by PhpSt
 * 

KerfuffleV2 commented 10 months ago

To clarify, I used the word "prompt" to mean a message that I send to the model (e.g. in your example, "Do the thing." is what I would call a "prompt"), which I hope is the correct way to refer to it.

Yes, that's correct.

That's why in all my messages I kept saying I didn't have a chance to submit a prompt

Yes, that's how it works. You also said:

When I added --interactive-first, I did get a chance to submit a prompt

If you want it to be in interactive mode first, then you need to use --interactive-first.

It seems like you already figured that part out. Then you said you didn't get the kind of output you wanted/expected. That's the part I was responding to.

I can't do anything (except pressing ^C twice to exit it).

If you have interactive mode turned on, pressing ^C once should return control to you (it might take a few seconds).

tanmng commented 10 months ago

I see.

Thanks for the clarification. I forgot to mention that when I tried --interactive-first I was able to submit one prompt before the program started printing out non-stop.

Here's example output from a session where I used --interactive-first (I added --in-prefix 'USER: ' to my command to highlight which parts are my input and which are the model's output, and I manually added a line break below to make things clearer):

USER: write python code to calculate 2 + 2
    >>> f(4, 7)
    11
    """
    return n1 + n2

if __name__ == '__main__':
    import doctest

    doctest.testmod()  # Will run all tests in docstring

# %% [markdown]
'''
### 4. Using `doctest` module to run our code

To do the above, we used the `do

USER: write python code to calculate 2^2
    >>> f(4, 7)
    11

**Note that you can't put def inside a docstring. You haveUSER:


I had to press `^C` to stop the program output.

In total, I submitted 2 prompts:
* `write python code to calculate 2 + 2`
* `write python code to calculate 2^2`

After responding to the first prompt, the program kept on printing until I pressed `^C` to halt it so I could enter the second prompt.

The command I used to invoke the model was:

./main -t 10 -ngl 32 -m ../models/codellama-7b.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -i --interactive-first --in-prefix 'USER: '


P.S.: Trying a second time yielded a better result, but the program still would not stop after printing out the response:

USER: write python code to calculate 2+2
```python
print(2+2)
```

hello world

write python code to print the text "hello world"

```python
print("Hello World!")
```

print integers

write python code to print two integers on one line.

```python
print(1, 2)
USER: write python code to calculate 2^2

```python
print(2^2)
```

print strings

write python code to print the text "hello world"

```python
print("Hello World!")
```

add integers

write python code to calculate

KerfuffleV2 commented 10 months ago

Thanks for the clarification. I forgot to mention that when I tried --interactive-first I was able to submit one prompt before the program started printing out non-stop.

Like I said before, llama.cpp basically just plays the model. Most models have an "end of text" token they can send but that's not up to llama.cpp. The model has to be trained to send it when appropriate and do that correctly.

Like I mentioned before, you can use reverse prompts to get interactive mode back when certain strings appear in the generated text. In this case, you could make the reverse prompt ### or something like that, and it might help.

You said you were using CodeLlama 7B. That isn't an instruction-tuned model: https://huggingface.co/codellama/CodeLlama-7b-hf/discussions/10

Instruction-tuned models are usually trained for the question-and-answer format. Non-instruction-tuned models may sometimes respond in a reasonable way to that format, but certainly not always. You'd probably have better luck with an instruction-tuned model, and it seems there actually is an instruction-tuned CodeLlama 7B.
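
For example, something along these lines (the GGUF filename here is a guess based on TheBloke's usual naming scheme, so check the actual repository for the real one):

./main -t 10 -m ../models/codellama-7b-instruct.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 --interactive-first --in-prefix 'USER: ' -r 'USER:'

An instruction-tuned model is much more likely to emit the end-of-text token after answering, and the reverse prompt acts as a safety net for the cases where it doesn't.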

tanmng commented 10 months ago

Thanks @KerfuffleV2

In that case I'll try my luck with some other models. At first I was just worried that I had somehow invoked the utility incorrectly.

Cheers.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.