ollama 加载 glm-4-9b-chat 胡言乱语 - Githubissues

THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型

Apache License 2.0

5.18k stars 429 forks source link

ollama 加载 glm-4-9b-chat 胡言乱语 #521

Closed siegrainwong closed 2 weeks ago

siegrainwong commented 2 months ago

System Info / 系統信息

cuda: 12.6 transformer: 4.44.0 OS: win10 python: 3.11.4 ollama: 0.3.8 & 0.2.3 配置: RTX3090 12700kf

Who can help? / 谁可以帮助到您？

No response

Information / 问题信息

[ ] The official example scripts / 官方的示例脚本
[ ] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

download gguf model from https://www.modelscope.cn/models/llm-research/glm-4-9b-chat-gguf/files
ollama create xxx
ollama serve & open open-webui

只要我不点停就会一直写下去，没在别的model上发现过这种情况（gemma2-7b\ yi-9b），根据以往记录下了0.2.3的ollama但响应差不多

Expected behavior / 期待表现

跑原模型时挺正常

zhipuch commented 2 months ago

https://github.com/THUDM/GLM-4/issues/323 https://github.com/THUDM/GLM-4/issues/333

siegrainwong commented 2 months ago

开过flash attention，不起作用