Open tpfau opened 9 months ago
+1 on this one
@tpfau I'm happy to work on a PR for this one - what do we think the ideal behavior is? If no `max_tokens` is specified, should we just not use a cap, or use the `n_ctx` context size as the max?
Possibly a duplicate of / related to #111?
Honestly, I don't think it matters much what is used if nothing is specified. I'd probably say no limit, since that would be the intuitive behavior, or something like the current max token limit on OpenAI models.
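For illustration, a hedged sketch of how a handler might resolve a missing `max_tokens` under the two options discussed above; `resolve_max_tokens` and its parameters are hypothetical names, not the project's actual API:

```python
from typing import Optional

def resolve_max_tokens(requested: Optional[int], n_ctx: int, prompt_tokens: int) -> int:
    """Pick an effective max_tokens when the client sends none.

    Option A ("no limit"): generate until the remaining context is used up.
    Option B would instead hard-code a fixed default, e.g. the 16-token
    default of OpenAI's legacy completions endpoint.
    """
    remaining = max(n_ctx - prompt_tokens, 0)
    if requested is None:
        return remaining  # Option A: cap only by the context window
    return min(requested, remaining)
```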
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
When a large query comes in, `max_tokens` is not set, and the model still fails with a token-limit-exceeded error, this should be reflected in the error response.
Current Behavior
When the input tokens are too big, another error occurs during error handling, caused by line 209 in app.py, which adds the `None` value of `completion_tokens` to `prompt_tokens`: `completion_tokens + prompt_tokens`.

Essentially, `completion_tokens` should be checked for `None` in that function, or `max_tokens` should be given a non-None value.
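A minimal sketch of the kind of guard that would avoid the secondary `TypeError`; the function name `usage_total` and its signature are assumptions for illustration, not the actual code around line 209 of app.py:

```python
from typing import Optional

def usage_total(prompt_tokens: int, completion_tokens: Optional[int]) -> int:
    # completion_tokens can be None when generation never ran, e.g. when
    # the prompt alone already exceeds the context window; treat it as 0
    # so the error-handling path does not raise a second TypeError.
    return prompt_tokens + (completion_tokens or 0)
```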
Environment and Context
Irrelevant
Irrelevant
Irrelevant
Irrelevant
Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
Steps to Reproduce
1. Spin up a llamacpp server
2. Send a request with more than ~2048 tokens without specifying a `max_tokens` parameter (a reproduction sketch follows)
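A minimal reproduction sketch, assuming the server exposes the OpenAI-compatible `/v1/completions` route on `localhost:8000` (host, port, and model name are placeholders for your setup):

```python
import requests

# Build a prompt comfortably larger than the default ~2048-token context.
long_prompt = "word " * 4000

# Deliberately omit max_tokens from the payload; that is the condition
# under test.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "local-model", "prompt": long_prompt},
    timeout=120,
)

print(resp.status_code)
print(resp.text)  # expect a token-limit error, not a TypeError traceback
```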
Failure Logs