This PR resyncs recent changes from llama.cpp's json_schema_to_grammar.py.
Initially I came to this patch because I noticed that generated JSON strings sometimes contained ASCII control characters, causing json.loads to fail.
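To illustrate the failure mode (the payload below is a hypothetical example, not an actual model reply): Python's json.loads runs in strict mode by default and rejects raw, unescaped control characters inside string values.

```python
import json

# Raw (unescaped) ASCII control characters are not valid inside JSON
# strings, so json.loads rejects them in its default strict mode.
try:
    json.loads('{"msg": "bad\x01value"}')
    print("parsed")
except json.JSONDecodeError as e:
    print("rejected:", e)
```

A grammar that excludes control characters from the string character class prevents the model from emitting such output in the first place.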
Then I noticed that unterminated JSON was sometimes generated because llama.cpp produced very long, incorrect replies. I believe this is caused by a bug in grammar management; however, using curly-brace quantities instead of code-generated repetitions alleviates the problem considerably, and it is fixed entirely by providing max_length for string fields in the JSON schema.
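A sketch of the workaround, with a hypothetical schema (the field name and the bound of 64 are illustrative): giving each string field a maxLength lets the generated grammar use a bounded curly-brace quantity rather than an unbounded repetition, so a runaway reply cannot grow without limit.

```python
import json

# Hypothetical schema: maxLength on string fields bounds the grammar's
# repetition for the string body, so replies cannot run on indefinitely.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string", "maxLength": 64}},
    "required": ["name"],
}

# A reply respecting the bound parses cleanly and stays within the limit.
reply = '{"name": "Alice"}'
obj = json.loads(reply)
assert len(obj["name"]) <= schema["properties"]["name"]["maxLength"]
```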
Something else I noticed: without this change, the generated JSON could contain newlines and tabs/spaces as separators between JSON elements, whereas the grammar would impose a single whitespace character. That is another sign of a bug in grammar management. Perhaps the grammar was silently ignored and valid JSON was generated anyway because the JSON schema was also part of the prompt.
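The point about separators can be demonstrated directly: JSON itself permits any mix of spaces, tabs, and newlines between elements, so json.loads accepts the output below, but a grammar whose whitespace rule allowed only a single space could never have produced it.

```python
import json

# Newlines and tabs between elements are legal JSON whitespace, so this
# parses fine; observing such output under a single-space grammar
# suggests the grammar was not actually being enforced.
text = '{\n\t"a": 1,\n\t"b": 2\n}'
obj = json.loads(text)
```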