ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: Inference is messed up in llama-server+default ui and llama-cli but works in llama-server+openweb ui #8027

Closed (JMPSequeira closed this issue 4 months ago)

JMPSequeira commented 4 months ago

What happened?

Using: https://huggingface.co/bartowski/Hermes-2-Theta-Llama-3-8B-GGUF/blob/main/Hermes-2-Theta-Llama-3-8B-Q6_K.gguf

llama-cli

./llama-cli -m ~/data/models/Hermes-2-Theta-Llama-3-8B-Q6_K.gguf -ngl 99 -ts 1,1 -t 8 -c 4096 --interactive-first
Hello
=====                          

This is a small hello world program written in Java.

Compile                        
=======                        

To compile, simply run the following command:

    javac Hello.java

Run                            
===                            

To run the program, run the following command:

    java Hello                 

This will output:

    Hello, World! 

You can also run the program directly from the source code by using the following command:

    javac Hello.java && java Hello.java

This went on and on.
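
The output above reads like plain-text completion of the word "Hello" rather than a chat reply. For reference, a minimal sketch of a manual workaround (an assumption on my part that Hermes-2-Theta uses the ChatML format its model card describes) would be to format the prompt by hand and let -e expand the escape sequences:

    # assumes ChatML special tokens; -e expands the \n escapes in the prompt
    ./llama-cli -m ~/data/models/Hermes-2-Theta-Llama-3-8B-Q6_K.gguf -ngl 99 -ts 1,1 -t 8 -c 4096 -e \
      -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n"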

llama-server + default UI

./llama-server -m ~/data/models/Hermes-2-Theta-Llama-3-8B-Q6_K.gguf -ngl 99 -ts 1,1 -t 8 -c 4096 --host 0.0.0.0 --port 8081

(screenshot: the default UI produces similarly garbled output)

llama-server + Open WebUI, pointed at the same server instance

(screenshot: Open WebUI returns the expected chat response)
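
My working assumption is that Open WebUI talks to the OpenAI-compatible /v1/chat/completions endpoint, which applies the model's chat template server-side, while the default UI sends the raw prompt to /completion with no template. A quick sketch of the two requests against the same server, if that assumption holds:

    # raw completion: the prompt is sent verbatim, no chat template applied
    curl http://localhost:8081/completion -H "Content-Type: application/json" \
      -d '{"prompt": "Hello", "n_predict": 64}'

    # OpenAI-compatible chat endpoint: the chat template is applied server-side
    # (this is the endpoint Open WebUI talks to)
    curl http://localhost:8081/v1/chat/completions -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello"}]}'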

Name and Version

version: 3186 (ba589931)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

ngxson commented 4 months ago

Neither the web server's default UI nor the main example has proper support for chat templates.

For the main example, the related work to support chat templates is #8068.
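
In the meantime, a possible workaround (an untested sketch on my side; it assumes the --conversation / -cnv flag in this build wraps each turn in the model's built-in chat template) is to run the main example in conversation mode instead of --interactive-first:

    # -cnv asks llama-cli to format each turn with the model's chat template
    ./llama-cli -m ~/data/models/Hermes-2-Theta-Llama-3-8B-Q6_K.gguf -ngl 99 -ts 1,1 -t 8 -c 4096 -cnv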

JMPSequeira commented 4 months ago

Thanks, I'll follow that.