Closed: cpfiffer closed this issue 1 month ago.
Thanks for reporting! I did a quick investigation.
The single quote isn't actually generated; it's part of the error message's representation.
Looking at the response metadata, the stop reason is "length" because SamplingParams.max_tokens == 16. In vLLM, the default max_tokens is 16: https://github.com/vllm-project/vllm/blob/main/vllm/sampling_params.py#L145
I'm wondering if we should default to the tokenizer's max length specification when max_tokens is unspecified?
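For anyone hitting this in the meantime, here's a minimal sketch of the workaround, assuming an outlines version where the generator forwards a max_tokens keyword to vLLM's SamplingParams (the model name and value below are placeholders):

```python
# Sketch of the workaround: raise max_tokens explicitly so vLLM's default
# cap of 16 tokens does not truncate the generation.
import outlines

# Any vLLM-compatible model works here; this one is just an example.
model = outlines.models.vllm("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.text(model)

# Passing max_tokens overrides SamplingParams.max_tokens (default 16).
answer = generator("Summarize the plot of Moby-Dick in one paragraph.", max_tokens=512)
print(answer)
```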
This fixed it! Thank you.
That's odd default behavior; is there an easy way to use the tokenizer's max length?
vLLM prevents generation past model_max_length, so we can simply set max_tokens=None by default.
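A minimal sketch of that behavior, assuming a vLLM version where SamplingParams accepts max_tokens=None (the model below is a placeholder):

```python
# With max_tokens=None, vLLM generates until EOS or model_max_length,
# rather than stopping after its default of 16 tokens.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small placeholder model
params = SamplingParams(max_tokens=None)
outputs = llm.generate(["Write a short poem about the sea."], params)
print(outputs[0].outputs[0].text)
```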
It seems like it already defaults to max_tokens=None, which strikes me as odd.
Right, but for the vLLM integration, sampling_params.max_tokens isn't changed if max_tokens is None: https://github.com/dottxt-ai/outlines/blob/main/outlines/models/vllm.py#L96-L97
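Roughly what that means, as a hypothetical sketch of the adapter logic (not the actual outlines source): the sampling params are only overridden when the caller passes a value, so vLLM's default of 16 survives otherwise.

```python
from vllm import SamplingParams

def build_sampling_params(max_tokens=None):
    # vLLM's SamplingParams defaults max_tokens to 16.
    sampling_params = SamplingParams()
    if max_tokens is not None:
        sampling_params.max_tokens = max_tokens
    # Proposed fix: when the caller leaves max_tokens unset, pass None through
    # so vLLM caps generation at model_max_length instead of 16 tokens.
    # else:
    #     sampling_params.max_tokens = None
    return sampling_params
```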
Describe the issue as clearly as possible:
Using vLLM, I ran into an issue where outlines seems to terminate the output early:
Suggestions/tips welcome!
Steps/code to reproduce the bug:
Expected result:
Error message:
Outlines/Python version information:
Context for the issue:
No response