morgen52 opened this issue 1 week ago
Well, the prompt doesn't have much context, so the output is kind of expected.
You can be more specific to help the LLM understand what you want, for example:
curl \
--request POST --url http://localhost:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "A pokemon character: name, skill, strength.\n\n","n_predict": 128}' | jq -r .content
{
"name": "Pikachu",
"skill": "Electric Attack",
"strength": 10
}
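For reference, the same request can be sent without curl/jq. A minimal Python sketch (assuming a llama-server instance listening on localhost:8080, as in the command above; only the standard library is used):

```python
import json
import urllib.request

def extract_content(response_body: str) -> str:
    """Pull the generated text out of a /completion response body."""
    return json.loads(response_body)["content"]

def complete(prompt: str, n_predict: int = 128,
             url: str = "http://localhost:8080/completion") -> str:
    """POST a completion request to a running llama-server instance."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return extract_content(resp.read().decode())

if __name__ == "__main__":
    print(complete("A pokemon character: name, skill, strength.\n\n"))
```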
Also, for this kind of application you should not use instruction-tuned models, as they are trained for a specific chat template. Use a base model instead.
Thanks for your great help. After trying your suggestion, I got the same output as you showed. However, when I try to generate multiple Pokémon characters (e.g., 10), it only shows the first one. How can I receive a JSON response with all 10 characters at once? Thanks again for your assistance!
curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Please generate 10 pokemon characters: name, skill, strength, height, weight.\n\n"}' | jq -r .content
{ "name": "Pikachu", "skill": "Electric", "strength": 60, "height": 0.5, "weight": 60 }
I'm confident this isn't a model issue because when I remove the -j option, the model consistently provides 10 outputs (as shown below), though they aren't in JSON format.
curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Please generate 10 pokemon characters: name, skill, strength, height, weight.\n\n"}' | jq -r .content
Here are the 10 Pokémon characters:
1. **Name:** Embermoth
**Skill:** Fire-type
**Strength:** 85
**Height:** 3.5 feet
**Weight:** 22 pounds
2. **Name:** Aquaflame
**Skill:** Water-type
**Strength:** 90
**Height:** 4.2 feet
**Weight:** 30 pounds
3. **Name:** Thunderbolt
**Skill:** Electric-type
**Strength:** 95
**Height:** 4.8 feet
**Weight:** 40 pounds
(...)
You can change your JSON schema to allow an array and lower the sampling temperature. For example:
-j "{\"type\":\"array\",\"items\":{}}"
curl \
--request POST --url http://localhost:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Please generate 3 pokemon characters: name, skill, strength, height, weight.\n\n", "temperature": 0.1}' | jq -r .content
[
{"name": "Pikachu", "skill": "Thunderbolt", "strength": 80, "height": 0.4, "weight": 6.0},
{"name": "Charizard", "skill": "Flamethrower", "strength": 130, "height": 1.7, "weight": 90.5},
{"name": "Squirtle", "skill": "Tackle", "strength": 48, "height": 0.5, "weight": 9.0}
]
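If you also want to pin down the fields of each array element, you can tighten the schema before passing it to `-j`. A sketch (the property names and types below are assumptions based on the prompt, not anything llama.cpp requires):

```python
import json

# Each array item must be an object with these fields (names and types
# assumed from the prompt; adjust to taste).
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "skill": {"type": "string"},
            "strength": {"type": "number"},
            "height": {"type": "number"},
            "weight": {"type": "number"},
        },
        "required": ["name", "skill", "strength", "height", "weight"],
    },
}

# Serialize to the single-line string expected by the -j flag:
print(json.dumps(schema))
```

Pass the printed string as the `-j` argument when starting llama-server.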
Thank you again! I encountered another issue while following your steps this time. I used the following command to start my server
./llama.cpp-b3938/build_gpu/bin/llama-server -m ../models/Meta-Llama-3-8B-Instruct-Q4_0.gguf -ngl 30 -j "{\"type\":\"array\",\"items\":{}}"
And I sent a request using the command below, but got an empty output: [].
curl \
--request POST --url http://localhost:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Please generate 3 pokemon characters: name, skill, strength, height, weight.", "temperature": 0.1}' | jq -r .content
The only difference is that I removed the \n\n at the end of the prompt. I'm confused about why these two newlines are so important. What is the reason behind it?
To understand better what is going on, you have to think as if you are the LLM. You are asking it to complete the text "Please generate 3 pokemon characters: name, skill, strength, height, weight.", so it is very likely that the next text to be generated will start with some whitespace (either a space " " or a newline \n), because that is normally what text looks like after the end of a sentence. On the other hand, you are asking it to obey a grammar that requires the next character to be an opening bracket [. These are two conflicting requirements that cannot be satisfied at the same time, so the result will not be good.
By adding the newlines to the prompt, you satisfy the first requirement yourself, so the model can continue generating according to the JSON schema without conflict. You can alternatively add a space instead of a newline:
"Please generate 3 pokemon characters: name, skill, strength, height, weight: "
Notice the space at the end. The logic is the same.
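The conflict can be sketched numerically. With made-up probabilities, suppose the model's next-token distribution after the bare prompt puts most of its mass on whitespace; grammar-constrained sampling zeroes out every token the grammar forbids and renormalizes, forcing the sampler onto a token the model considered unlikely:

```python
# Toy next-token probabilities after "...height, weight." (made-up numbers):
probs = {"\n": 0.55, " ": 0.30, "the": 0.10, "[": 0.05}

# A grammar requiring a JSON array allows only "[" as the first token,
# so constrained sampling masks everything else and renormalizes:
allowed = {"["}
masked = {t: p for t, p in probs.items() if t in allowed}
total = sum(masked.values())
constrained = {t: p / total for t, p in masked.items()}

print(constrained)  # → {'[': 1.0}, a token the model rated at only 5%
```

Ending the prompt with "\n\n" moves the whitespace into the prompt itself, so "[" becomes a natural continuation rather than a forced, low-probability one.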
Interesting explanation. I understand now. Unfortunately, adding just one space doesn't work for me; I have to add at least two spaces to get a proper response, lol.
But I'm curious, can't this issue be fixed on the software side? From a user's perspective, we don't typically add multiple spaces or line breaks at the end of each prompt. And the fact that not doing so results in ineffective outputs is quite confusing.
> But I'm curious, can't this issue be fixed on the software side? From a user's perspective, we don't typically add multiple spaces or line breaks at the end of each prompt.
It's not an issue - it works exactly as it is supposed to. The user's perspective should be fixed 😄
Haha, thank you for your reply.
What happened?
Hi! Thanks for your efforts in contributing such a great framework! I am deploying a custom service on my PC and learning to make llama.cpp produce structured output via the -j option. However, when I use -j {} as a starting point, I get meaningless output. I am not sure whether I am doing something wrong or there is a bug in the code. I would appreciate your help with this issue.
The command I used for starting the server:
The command I used for sending request:
The response I got:
Name and Version
./llama.cpp-b3938/build_gpu/bin/llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 7 (d9a33c5)
built with cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output