Closed ibehnam closed 5 months ago
@ibehnam Yes, I thought the problem was gone, but I gave the grammar that much flexibility because otherwise other weird problems arise, like the LLM appending endless zeroes to a number, or always writing a float even when it should emit an integer, etc. I will try to write a grammar generator that defines only the necessary whitespace and gives the LLM the option to use a single space or line break. Right now it is free to generate as much whitespace as it wants.
But that takes some time. I will try to do it, hopefully this month.
Thanks, I know it's a lot of work, and I'd appreciate it. I didn't know about the other weird problems that could arise without giving the LLM more flexibility. The `json.gbnf` example in llama.cpp also uses `ws`, but in that example two `ws` rules can't appear next to each other. It looks like the source of the problem is that llama-cpp-agent allows that to happen.
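To make the difference concrete, here is a minimal, illustrative Python sketch using regex analogues of two possible `ws` definitions (these are stand-ins, not the actual `json.gbnf` rules): an unbounded recursive rule matches arbitrarily long whitespace runs, while a bounded rule caps them at one character.

```python
import re

# Regex analogues of two GBNF whitespace rules (illustrative only):
#   unbounded:  ws ::= ([ \t\n] ws)?   -> any amount of whitespace
#   bounded:    ws ::= [ \t\n]?        -> at most one whitespace char
unbounded_ws = re.compile(r"[ \t\n]*")
bounded_ws = re.compile(r"[ \t\n]?")

runaway = "\n" * 1000  # the kind of runaway output reported in this issue

# The unbounded rule happily matches the whole runaway sequence...
assert unbounded_ws.fullmatch(runaway) is not None
# ...while the bounded rule rejects it, allowing a single character at most.
assert bounded_ws.fullmatch(runaway) is None
assert bounded_ws.fullmatch("\n") is not None
```

If the sampler is constrained by the unbounded rule, nothing in the grammar ever forces it to stop emitting whitespace, which matches the endless `\n`/space behavior described above.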
@ibehnam Hi, can you take a look at your issue with the latest commit? I think I found the problem and fixed it.
@Maximilian-Winter Thanks a lot! Sure, I'll check it out as soon as I can and will update here.
@ibehnam Did you manage to check the new version?
@Maximilian-Winter Hi, yes, I actually just tried it again. It's definitely better than before, but sometimes even simple Pydantic classes like the following lead to errors with the latest llama.cpp server:

```python
class Bio(BaseModel):
    first_name: str = Field(default=..., description="The person's first name")
    last_name: str = Field(default=..., description="The person's last name")
    age: int = Field(default=..., description="The person's age")
```

I get bad request errors about 10% of the time. I haven't been streaming the responses, so I'm not sure whether it's due to infinite generation of `\n`/`<space>`.
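As a stopgap while debugging, streamed output can be guarded against runaway whitespace on the client side. This is a hypothetical helper, not part of llama-cpp-agent or llama.cpp; `stream` stands in for chunks from any streaming API, simulated here with a plain list.

```python
# Hypothetical client-side guard: stop consuming a stream once the model
# emits a long run of pure-whitespace chunks.
def cap_whitespace(stream, max_run=20):
    run = 0
    for chunk in stream:
        if chunk.strip() == "":
            run += 1
            if run > max_run:
                return  # likely runaway \n/<space> generation; stop early
        else:
            run = 0  # real content resets the counter
        yield chunk

# Simulated stream: valid JSON prefix followed by 100 runaway newlines.
simulated = ['{"first_name"', ': "Ada"'] + ["\n"] * 100
out = "".join(cap_whitespace(simulated))
```

With `max_run=20`, the 100 trailing newline chunks are cut off after 20, so the caller at least gets a bounded string to inspect instead of an endless generation.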
@ibehnam Thank you for the model. I could reproduce the issue and have fixed it in the repo. The problem was a line break followed by whitespace.
Thank you! I will also test it on some more advanced classes and update here if there are any issues.
I've been working with Mistral and Mixtral models, and what I've noticed is that the grammar gives the models too much flexibility, which results in numerous cases where the LLM generates infinitely many spaces or newlines.
(I'm using your grammar example available in the llama.cpp examples.)
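Until the planned grammar generator lands, one workaround is to post-process the generated GBNF text and tighten its whitespace rule. This is an illustrative sketch, not llama-cpp-agent API; the rule name `ws` and the sample grammar are assumptions for the example.

```python
# Illustrative workaround: replace an unbounded ws rule in a generated
# GBNF grammar with a bounded one (at most one space or line break).
def tighten_ws(gbnf: str, rule_name: str = "ws") -> str:
    tightened = []
    for line in gbnf.splitlines():
        if line.strip().startswith(f"{rule_name} ::="):
            # Whatever the generator produced, cap it at a single char.
            tightened.append(f"{rule_name} ::= [ \\n]?")
        else:
            tightened.append(line)
    return "\n".join(tightened)

grammar = 'root ::= "{" ws "}"\nws ::= ([ \\t\\n] ws)?'
print(tighten_ws(grammar))
```

This only rewrites the `ws` definition line and leaves the rest of the grammar untouched, so it can be applied to any generated grammar that isolates whitespace in a single named rule.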