abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Resync llama_grammar with llama.cpp implementation and use curly braces quantities instead of repetitions #1721

Open gbloisi-openaire opened 2 months ago

gbloisi-openaire commented 2 months ago

This PR resyncs recent changes from llama.cpp's json_schema_to_grammar.py.

I initially came to this patch because I noticed that generated JSON strings sometimes contained ASCII control characters, which caused json.loads to fail. I then noticed that unterminated JSON was sometimes generated because llama.cpp was producing very long, incorrect replies. I believe this is caused by a bug in grammar handling. Using curly-brace quantities instead of code-generated repetitions alleviates the problem considerably, and it is completely fixed by providing a maximum length for string fields in the JSON schema (maxLength in JSON Schema terms).

I also noticed that, without this change, the generated JSON could contain newlines and tabs/spaces as separators between JSON elements, whereas the grammar would impose a single whitespace character. That looks like another sign of a bug in grammar handling. Perhaps the grammar was silently ignored and JSON was generated anyway because the JSON schema was also part of the prompt.
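To make the difference concrete, here is a minimal sketch of the two ways a bounded repetition can be written as a GBNF rule body. The helper names `expanded_repetition` and `braced_repetition` are hypothetical and are not the functions used in json_schema_to_grammar.py; the sketch only assumes that the grammar syntax accepts `item{min,max}` quantities, as the PR title implies.

```python
def expanded_repetition(item: str, min_items: int, max_items: int) -> str:
    """Code-generated style: spell the repetition out as nested optional groups.

    Hypothetical helper for illustration only.
    """
    required = " ".join([item] * min_items)
    optional = ""
    # Wrap one more optional layer for every item allowed beyond the minimum.
    for _ in range(max_items - min_items):
        optional = f"({item} {optional})?" if optional else f"({item})?"
    return f"{required} {optional}".strip()


def braced_repetition(item: str, min_items: int, max_items: int) -> str:
    """Curly-brace style: let the grammar engine handle the quantity itself.

    Hypothetical helper for illustration only.
    """
    return f"{item}{{{min_items},{max_items}}}"


# A string field capped at 3 characters, written both ways.
old_rule = expanded_repetition("char", 0, 3)
new_rule = braced_repetition("char", 0, 3)
print(old_rule)
print(new_rule)
```

The expanded form grows with the maximum, so a long or unbounded string field produces a very large rule, while the braced form stays the same size regardless of the bound.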