LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with a KoboldAI UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Missing First Character in Chinese Responses - Qwen1.5-32B Model #772

Open liuyunrui123 opened 3 months ago

liuyunrui123 commented 3 months ago
  1. When using the Qwen1.5-32B model in Chat Mode, the first character of Chinese responses is missing.
  2. Also, there's no proper termination at the end of replies, resulting in continuous output of whitespace until the maximum token limit is reached.
liuyunrui123 commented 3 months ago

Here's the link to the model: Qwen1.5-32B-Chat-GGUF

LostRuins commented 3 months ago

Hi, can you try the latest version?

liuyunrui123 commented 3 months ago

> Hi, can you try the latest version?

I tried the release 1.62.1 version and the latest commit (branch: concedo, commit id: df596aeef); the problem occurs every time.

liuyunrui123 commented 3 months ago

Add some testing information:

  1. The backend terminal shows that the output content is normal, but the web interface displays it with the first character missing.
  2. In Advanced Settings, Token Streaming is set to SSE. The problem occurs in SSE mode; choosing the other options does not trigger it. (A sketch for dumping the raw SSE stream is included below.)

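To narrow down whether the first character is already missing on the wire or only disappears in the web UI, one option is to dump the raw SSE stream directly and compare it with the backend log. The sketch below is a rough one under stated assumptions: the endpoint path /api/extra/generate/stream, the default port 5001, the payload keys, and the "token" field in each event's JSON are taken from the KoboldCpp API as I understand it, so verify them against your build.

```python
# Hedged sketch: print the raw SSE events from a local koboldcpp instance so
# the streamed text can be compared byte-for-byte with the backend output.
# Endpoint path, port, payload keys, and the "token" field are assumptions.
import json
import requests

payload = {"prompt": "你好，请用中文介绍一下你自己。", "max_length": 64}
url = "http://localhost:5001/api/extra/generate/stream"

with requests.post(url, json=payload, stream=True) as resp:
    for raw in resp.iter_lines():
        if not raw or not raw.startswith(b"data:"):
            continue  # skip keep-alive blanks and non-data lines
        body = raw[len(b"data:"):].strip()
        print("raw :", body.decode("utf-8", errors="replace"))
        token = json.loads(body).get("token", "")
        print("hex :", " ".join(f"{b:02X}" for b in token.encode("utf-8")))
```

If the leading character's bytes (for example E6 88 91 for "我") show up here but not in the rendered chat, the loss is happening on the UI side rather than in the stream itself.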

LostRuins commented 3 months ago


I just tried it with airoboros, works correctly for me with SSE streaming.

Then I tried chat mode with SSE; also OK.

Does this only happen with Qwen1.5? Have you tried a different one?

liuyunrui123 commented 3 months ago

> I just tried it with airoboros, works correctly for me with SSE streaming.
>
> Then I tried chat mode with SSE; also OK.
>
> Does this only happen with Qwen1.5? Have you tried a different one?

Yes, I tried Yi-34B and it works, but Qwen1.5-32B misbehaves, and the problem occurs 100% of the time.

LostRuins commented 3 months ago

I think maybe it is adding an illegal character before the "我" and that's why it is being trimmed off.

liuyunrui123 commented 3 months ago

> I think maybe it is adding an illegal character before the "我" and that's why it is being trimmed off.

I agree with your point of view; I will test it further.

liuyunrui123 commented 3 months ago

I added the following debugging and printing code after line 812 in koboldcpp.py:

utfprint("\nlyr**Output: " + recvtxt)
utfprint(f"\nlyr**{stream_flag = }")
utfprint(f"lyr**{api_format = }")
# 将字符串编码为字节对象
bytes_data = recvtxt.encode()
# 将字节对象转换为十六进制表示
hex_data = ' '.join(format(byte, '02X') for byte in bytes_data)
print("-------hex data------")
print(hex_data)

After multiple tests, I couldn't find any patterns, and the results are as follows:

FireofSilver commented 3 months ago

According to feedback from group members, this happens when the "Include Names" option under "Advanced Formatting" is enabled in SillyTavern, which causes the Qwen model to swallow a character. "Include Names" prepends "{{char}}: " to every reply, and there is a space after the colon, so the first character the model outputs each time is a space, and that is how a character ends up being swallowed. After I removed the space, the model's output was normal. So this is not a koboldcpp problem; it is an issue with the model, or rather a small bug in SillyTavern. Thanks to the author for the hard work. @LostRuins The problem with SillyTavern is that it adds "{{char}}: " before every previous response, but the final cue for the model's output is "{{char}}:", with the space missing, so Qwen treats the space as important enough to output it first.
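To make the described mismatch concrete, here is a minimal hypothetical sketch (the cue strings and the trimming step are assumptions for illustration, not actual SillyTavern or koboldcpp code): when the final cue lacks the trailing space, the model emits the space itself, and any frontend step that still trims the full "Name: " prefix width swallows the first real character along with it.

```python
# Hypothetical illustration only: the strings and the trimming step below are
# assumptions for demonstration, not actual SillyTavern or koboldcpp code.

cue_earlier_turns = "KoboldAI: "   # trailing space on every previous turn
cue_final_turn    = "KoboldAI:"    # no trailing space on the generation cue

# Because the space is missing from the final cue, the model tends to emit it
# itself, so the reply starts with a space:
model_reply = " 我可以帮你解答问题。"

# A frontend that trims the reply as if the full "Name: " prefix (colon plus
# space) still had to be removed drops one character too many:
extra = len(cue_earlier_turns) - len(cue_final_turn)   # == 1
buggy   = model_reply[extra + 1:]   # drops the space *and* "我"
correct = model_reply.lstrip()      # drops only the leading whitespace

print(repr(buggy))    # '可以帮你解答问题。'
print(repr(correct))  # '我可以帮你解答问题。'
```

Either trimming only whitespace or making the final cue match the earlier turns removes the mismatch.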

sssfhfhchasd commented 3 months ago

I'm not sure what the exact cause of the missing characters is, but it is not only related to SillyTavern; even without SillyTavern, characters still go missing. On my machine, koboldcpp sometimes drops characters in Chat Mode, though not every time. I'm using the latest version of koboldcpp with the model "causallm_14b.IQ4_XS.gguf". Perhaps it's an issue with the Qwen-series models. I have streaming turned on.

LostRuins commented 3 months ago

As an alternative, maybe you can set the streaming to "poll" instead.

FireofSilver commented 3 months ago

@sssfhfhchasd It looks like the cause of the problem is the same in both cases: the earlier turns are preceded by "koboldAI: " with a space, while the final "koboldAI:" at the end has no space, so the model outputs a space first, displacing the character that should have been output.

sssfhfhchasd commented 3 months ago

> As an alternative, maybe you can set the streaming to "poll" instead.

Yes, that works. Thank you very much!

sssfhfhchasd commented 3 months ago

> It looks like the cause of the problem is the same in both cases: the earlier turns are preceded by "koboldAI: " with a space, while the final "koboldAI:" at the end has no space, so the model outputs a space first, displacing the character that should have been output.

I hope this issue can be fixed.

liuyunrui123 commented 3 months ago

I have found a new issue. In Chinese conversations in Instruct Mode, the second half of the generated content occasionally disappears from the web interface, even though the complete output is visible in the terminal. My prompt was: 谁创造了“疯狂英语”教学法 ("Who created the 'Crazy English' teaching method?").

I tried setting Token Streaming to both POLL and SSE, and the problem occurs either way. I suspect it is most likely a display bug in the web front end.

LostRuins commented 3 months ago

For that, try disabling Trim Sentences

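For context, "Trim Sentences" cuts the displayed reply back to the last complete sentence, which is the class of behaviour that can make the tail of a reply vanish from the UI while the terminal still shows the full output. The sketch below is illustrative only; the terminator set and the trimming logic are assumptions, not Kobold Lite's actual code.

```python
# Illustrative sketch of trim-to-last-sentence behaviour with an assumed
# terminator set; NOT Kobold Lite's actual implementation.
def trim_to_last_sentence(text: str, terminators: str = ".!?") -> str:
    last = max(text.rfind(t) for t in terminators)
    return text[:last + 1] if last >= 0 else text

reply = "Crazy English（疯狂英语）was created by Li Yang. 他主张通过大声朗读来学习英语。"

# With an ASCII-only terminator set, everything after the last "." is cut,
# so the Chinese half of the reply disappears from the display:
print(trim_to_last_sentence(reply))
# Including the Chinese full stop 。 keeps the whole reply:
print(trim_to_last_sentence(reply, ".!?。"))
```

Disabling the option skips that trimming step, so whatever the backend produced is shown as-is.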