It seems to be 1024; I can't get it to go over that.
Hi @tmikaeld 👋 Our model currently supports a maximum context length of 128k tokens. Larger context windows (higher max_token values) require significantly more GPU memory. In the provided examples, we've used a context length of 1024 tokens to make it easier for more people to quickly experience the capabilities of the model.
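If you just want completions longer than the 1024-token setting used in the examples, here is a minimal sketch of raising the output cap through an OpenAI-compatible chat endpoint. The endpoint URL and model name below are placeholders (assumptions), not values from this repo:

```python
# Sketch: request a larger output budget than the 1024 tokens used in the examples.
# The endpoint URL and model id are placeholders; substitute your own deployment.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # placeholder OpenAI-compatible endpoint
    json={
        "model": "your-model-name",                # placeholder model id
        "messages": [{"role": "user", "content": "Write a long story."}],
        "max_tokens": 4096,                        # raise the per-response output cap
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```

Note that a larger `max_tokens` value needs correspondingly more GPU memory, as mentioned above.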
When I try it with Ollama, the model doesn't output more than 1024 tokens. Do you know what could cause this?
This isn't documented clearly anywhere, but in the Continue extension for VS Code, add the following to the model's config parameters for the Ollama provider:

```json
{
  "completionOptions": {
    "maxTokens": 4096
  }
}
```
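For anyone hitting the same 1024-token cap when calling Ollama directly (outside Continue), the equivalent knobs on Ollama's own API are `num_predict` (output token cap) and `num_ctx` (context window). A minimal sketch, with a placeholder model name:

```python
# Sketch: set Ollama's output and context limits per request via its REST API.
# Assumes a local Ollama server on the default port; the model name is a placeholder.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "your-model-name",   # placeholder model tag
        "prompt": "Write a long story.",
        "stream": False,
        "options": {
            "num_predict": 4096,      # maximum number of tokens to generate
            "num_ctx": 8192,          # context window size (costs more memory)
        },
    },
    timeout=600,
)
print(resp.json()["response"])
```

The same options can also be baked into a Modelfile with `PARAMETER num_predict 4096` and `PARAMETER num_ctx 8192` if you prefer not to set them per request.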
What's the maximum number of output tokens the model can produce?