Open Snowman-25 opened 2 months ago
Hi @Snowman-25, thanks for the detailed writeup here. Interesting that the line cutoff is so consistent.
Nothing obvious comes to mind for me here. @sestinj, any thoughts?
Same issue here. I used the Gemini Flash and Pro models, and both got cut off at a similar point.
You can try setting completionOptions.maxTokens in your config.json. We recently increased the default value for this, and I believe that the change is still in pre-release.
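For example, a model entry in config.json could look something like this (just a minimal sketch; the title and model name are placeholders based on the models mentioned in this thread):

```json
{
  "models": [
    {
      "title": "StarCoder2 (Ollama)",
      "provider": "ollama",
      "model": "starcoder2",
      "completionOptions": {
        "maxTokens": 4096
      }
    }
  ]
}
```

If I remember correctly, completionOptions can also be set at the top level of config.json to apply to all models.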
Yep, this fixed it for me. I was previously adding "maxTokens": 4096, which wasn't working, but now adding
"completionOptions": { "maxTokens": 4096 }
fixes it for me.
That fixed it for me too. Note the plural in "tokens": it's maxTokens, not maxToken.
Any specific reason why maxTokens isn't simply set equal to the context length of the selected model? I may be confusing terms here, though; I'm not an AI model expert.
Relevant environment info
Description
While trying to convert a big Python 2 file to Python 3, the output always stops after ~103 ± 3 lines. It sometimes cuts off in the middle of a variable or function name. The source script is almost 600 lines. The Ollama log doesn't show any errors; it behaves as if the answer is complete after 2-3 minutes.
I'm not sure whether this is a timeout issue, token exhaustion, or something else entirely. The model has a context length of 163840 and an embedding length of 2048.
When using starcoder2 (16384 context length and 3072 embedding length), I get ~150 lines.
To reproduce
Log output
No response