QwenLM / Qwen2

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

Deploying Qwen2-72B-Instruct with ollama: requests fail with a 1024-token limit error #711

Open lucumt opened 1 week ago

lucumt commented 1 week ago

I wrote an agent that stores each round's output in short-term memory and sends it back to the large model for its next round of thinking and reasoning. However, on the third round of interaction, I hit an error saying the 1024-token limit had been exceeded.
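Independent of the 1024-token error itself, an agent that accumulates every round's output will eventually overflow any context window. A minimal sketch of trimming short-term memory to a token budget (not code from the issue; `estimate_tokens` uses the rough "4 characters per token" heuristic as an assumption, and a real deployment should use the model's actual tokenizer):

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: ~4 characters per token. Replace with the
    # model's tokenizer for accurate counts.
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the estimated total fits."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(
        estimate_tokens(m["content"]) for m in system + rest
    ) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```

This keeps the system prompt and the most recent turns, which is usually what a multi-round agent needs.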

In theory, Qwen2-72B-Instruct supports a context length of up to 128k tokens, so I am confused about whether I misconfigured something. I am serving the model with ollama, and when calling the API I already set the token limit to 8912 tokens, as shown here: clipbord_1719456762856 The error message is as follows: clipbord_1719456751990 How can I resolve this over-limit error? Thanks.
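For reference, ollama's default context window is small and is not raised by setting a client-side token limit; the context length is controlled by the `num_ctx` parameter. A hedged sketch of setting it in a Modelfile (the `FROM` tag is an assumption about which local model name is in use):

```
# Modelfile sketch: raise the context window for a locally pulled model.
FROM qwen2:72b-instruct
PARAMETER num_ctx 32768
```

The same option can also be passed per request via the REST API's `options` field, e.g. `"options": {"num_ctx": 32768}`, if rebuilding the model is not desired.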

jklj077 commented 1 week ago

By "ollama", what exactly were you using? It was obviously not ollama-python. Please direct your issue to the relevant project and check that you are using the interface correctly.

FYI, Qwen2-72B-Instruct supports 32K tokens out of the box; only with the proper setup, and with vLLM, can you run it with a 128K sequence.
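For the 128K setup mentioned above, the Qwen2 documentation describes enabling YaRN rope scaling in the model's `config.json` before serving with vLLM. A sketch of the relevant fragment (the surrounding fields are elided; verify the exact values against the official Qwen2 README for your model size):

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

Note that static YaRN scaling can degrade quality on short inputs, so it is typically recommended only when long contexts are actually needed.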