Describe the issue as clearly as possible:
On certain prompts, the LLM can spiral into an infinite loop, emitting the same item over and over until it is stopped by the max_tokens parameter.
When that happens, the output fails JSON parsing with an invalid-JSON exception, and no result is returned.
Both Llama.cpp and MLX-LM expose parameters to penalize repetition and thus prevent this. While Outlines accepts additional parameters to pass through to Llama.cpp, it does not do so for MLX-LM, so such prompts fail.
long_42k_llm_prompt.md
Steps/code to reproduce the bug:
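Not the original reproduction (that uses the attached long_42k_llm_prompt.md); a minimal sketch of the asymmetry, assuming placeholder model names and the outlines.models.llamacpp / outlines.models.mlxlm loaders:

```python
import outlines
from pydantic import BaseModel

class Item(BaseModel):
    name: str

prompt = "..."  # a prompt that makes the model repeat the same item

# llama.cpp backend: extra keyword arguments are forwarded to
# llama-cpp-python, so its repeat_penalty can suppress the loop.
model = outlines.models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # placeholder repo
    "mistral-7b-instruct-v0.2.Q4_K_M.gguf",    # placeholder file
)
generator = outlines.generate.json(model, Item)
result = generator(prompt, max_tokens=4096, repeat_penalty=1.1)  # accepted

# MLX-LM backend: no equivalent pass-through, so mlx-lm's repetition
# penalty cannot be reached; on looping prompts generation runs until
# max_tokens and the truncated output fails JSON parsing.
model = outlines.models.mlxlm("mlx-community/Mistral-7B-Instruct-v0.2-4bit")  # placeholder
generator = outlines.generate.json(model, Item)
result = generator(prompt, max_tokens=4096)  # raises: invalid JSON
```

Since mlx-lm itself exposes a repetition penalty, forwarding extra keyword arguments to it, as is already done for Llama.cpp, would be enough to avoid the failure.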
Expected result:
Error message:
No response
Outlines/Python version information:
Context for the issue:
No response