Open lucumt opened 1 week ago
When you say Ollama, what exactly were you using? It was obviously not ollama-python. Please direct your issue to the relevant project and check that you are using the interface correctly.
FYI, Qwen2-72B-Instruct supports 32K tokens out of the box; only with the proper setup and vLLM can you run it with a 128K sequence.
I wrote an agent that puts each turn's output into short-term memory and sends it to the LLM for its next round of thinking and reasoning. However, on the third interaction I hit an error about exceeding a 1024-token limit.
In theory the Qwen2-72B-Instruct model supports a context length of up to 128K tokens, so I am confused about whether I misconfigured something. I deployed the service with Ollama, and when calling the interface I had already set the token-length limit to 8912 tokens, as shown in the figure.
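As a side note on the pattern described above: if every turn's output is appended to short-term memory without bound, the accumulated history will eventually exceed whatever context window the server enforces, regardless of the model's theoretical limit. A minimal sketch of one common workaround is shown below: trim the oldest non-system messages to a token budget before each call. The `estimate_tokens` helper and its ~4-characters-per-token heuristic are hypothetical simplifications (a real tokenizer would be more accurate), and the commented-out `ollama.chat` call with the `num_ctx` option is only illustrative of how the context window might be raised.

```python
# Hypothetical sketch: keep the rolling chat history under a token budget
# so older turns are dropped instead of overflowing the context window.
# Token counts are a rough estimate (~4 characters per token).

def estimate_tokens(text: str) -> int:
    # Crude heuristic; replace with a real tokenizer for accuracy.
    return max(1, len(text) // 4)

def trim_history(messages, budget: int):
    """Drop the oldest non-system messages until the estimated total fits."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    while rest and total(system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest

history = [{"role": "system", "content": "You are a helpful agent."}]
for turn in ["first question " * 50, "second question " * 50]:
    history.append({"role": "user", "content": turn})
    history = trim_history(history, budget=300)
    # reply = ollama.chat(model="qwen2:72b", messages=history,
    #                     options={"num_ctx": 8192})  # raise the context window
```

With the small budget used here, the first turn is evicted once the second turn arrives, so only the system message and the latest user message remain.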
The error message is as follows:
How can I resolve this over-limit error? Thanks.