dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0
9.44k stars 479 forks

Infinite repetitions and invalid JSON - Outlines with MLX #1131

Open ea167 opened 2 months ago

ea167 commented 2 months ago

Describe the issue as clearly as possible:

On certain prompts, the LLM can spiral into an infinite loop, emitting the same item repeatedly until stopped by the max_tokens parameter.

In that case, JSON parsing fails with an exception because the output is invalid, and no result is returned.

Llama.cpp and MLX-LM expose parameters that penalize repetition and thereby prevent this. While Outlines accepts additional parameters to pass through to Llama.cpp, it does not for MLX-LM, so such prompts fail.
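For context, a repetition penalty down-weights the logits of tokens that already appear in the generated context before sampling, which is what breaks loops like the one below. A minimal sketch of the idea in plain Python (no MLX dependency; the function name and the penalty value are illustrative, not the actual llama.cpp or MLX-LM implementation):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.3):
    """Down-weight tokens that were already generated (CTRL-style penalty).

    Positive logits are divided by `penalty`, negative ones multiplied,
    so a repeated token becomes less likely to be sampled either way.
    """
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

# Toy vocabulary of 3 tokens; tokens 0 and 1 were already generated.
logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1])
# token 0 shrinks (2.0 -> ~1.54), token 1 sinks (-1.0 -> -1.3), token 2 unchanged
```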

long_42k_llm_prompt.md

Steps/code to reproduce the bug:

RESULTS_JSON_SCHEMA = """{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
 "results": {
  "type": "array",
  "items": {
   "type": "string"
  }
 }
},
"required": ["results"],
"additionalProperties": false
}"""

from outlines import models, generate, samplers
import json

model = models.mlxlm("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
sampler = samplers.multinomial( top_p=0.1 )
generator = generate.json( model, RESULTS_JSON_SCHEMA, sampler )

json_answer = generator( long_42k_llm_prompt, max_tokens=1000 )
print( json.dumps( json_answer, indent=4 ) )
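With the schema above, a successful run should return a dict with a single "results" key holding a list of strings. A quick sanity check of that shape in plain Python (the `looks_valid` helper and the sample answers are illustrative, hand-rolled for this particular schema rather than a general JSON Schema validator):

```python
import json

# The schema from the repro, parsed so the check can read its constraints.
schema = json.loads("""{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {"results": {"type": "array", "items": {"type": "string"}}},
  "required": ["results"],
  "additionalProperties": false
}""")

def looks_valid(answer):
    """Check an answer against this schema's shape: required keys present,
    no extra keys (additionalProperties: false), all items are strings."""
    return (
        isinstance(answer, dict)
        and all(key in answer for key in schema["required"])
        and set(answer) <= set(schema["properties"])
        and all(isinstance(item, str) for item in answer.get("results", []))
    )

print(looks_valid({"results": ["Methodist Hospital"]}))  # True
print(looks_valid({"results": ["a", 2]}))                # False: non-string item
```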

Expected result:

A list of results without endless repetition at the end.

When running MLX-LM directly, we get the same infinite loop, stopped only by max_tokens:

python -m mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --prompt "$(< ~/Downloads/long_42k_llm_prompt.md)" --max-tokens 5000

...
687. **Methodist Hospital**
688. **Methodist Hospital**
689. **Methodist Hospital**
690. **Methodist Hospital**
691. **Methodist Hospital**
692. **Methodist Hospital**
693. **Methodist Hospital**
694. **Methodist Hospital**
695. **Methodist Hospital**
696. **Methodist Hospital**
697. **Methodist Hospital**

==========
Prompt: 11380 tokens, 432.382 tokens-per-sec
Generation: 5000 tokens, 26.872 tokens-per-sec
Peak memory: 6.891 GB

Error message:

No response

Outlines/Python version information:

Version information

0.0.47.dev69+g72377db
Python 3.12.4
mlx==0.17.2
mlx-lm==0.18.1

Context for the issue:

No response

ea167 commented 2 months ago

I created PR https://github.com/outlines-dev/outlines/pull/1134 to fix this problem.

Please review and merge it.