Closed JMPSequeira closed 6 months ago
This is happening with the server, even with older models that worked perfectly well before. If we revert to older releases, the issue does not occur.
The default server UI does not work with instruct models because it uses the /completion endpoint and its own chat template, not the model's. Either use a base model, or a client that supports the /chat/completions endpoint.
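For illustration, a request against the server's OpenAI-compatible endpoint might look like the sketch below (not from the original thread; the host and port match the server invocation later in the report, and the message content and sampling parameters are made up):

```bash
# Sketch only. Assumes the server from this report is listening on
# localhost:8080 and that the build exposes the OpenAI-compatible
# /v1/chat/completions endpoint, which formats the prompt with a chat
# template on the server side instead of leaving it to the web UI.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "temperature": 0.7
      }'
```

If the template stored in the GGUF metadata is missing or wrong, the server's --chat-template flag can override it.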
Noted, thanks.
OS: Debian 12
llama.cpp version: b2715
Model: Llama 3 8B Instruct
The model was converted from the HF Meta repo using:

./convert.py ~/ai/hf-models/Llama-3-8b-Instruct/ --outfile ~/ai/unquantized/Llama-3-8b_fp16.gguf --vocab-type bpe --outtype f16
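As a quick sanity check on the conversion (an editor sketch, not part of the original report), the tokenizer and template metadata written into the GGUF can be inspected with the dump script from llama.cpp's gguf-py package; the exact script name and path have varied between releases:

```bash
# Assumption: run from a llama.cpp checkout; in other releases the script
# is named gguf_dump.py or ships with the gguf Python package.
# --no-tensors prints only the key/value metadata, skipping tensor info.
python3 gguf-py/scripts/gguf-dump.py --no-tensors ~/ai/unquantized/Llama-3-8b_fp16.gguf
```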
Running:

./server -m ~/ai/unquantized/Llama-3-8b_fp16.gguf -ngl 33 -ts 1,1 --host 0.0.0.0 --port 8080

I start getting unrelated tokens at the 2nd or 3rd generation. Here's an example:

Sometimes it generates ad aeternum:
...and it continued until stop.

Here are my options: