The ggml models converted from "YeungNLP/bloomz-396m-zh" or "WangZeJun/bloom-396m-chat" are missing some tokens, such as those for the characters "焙" and "擀". Without the corresponding tokens, the generated result cannot be displayed correctly. However, when running the model the official Python way, there is no such problem.
Sample output; note the "�" sections:
main: prompt: '面包的烘焙制作流程'
main: number of tokens in prompt = 3
24765 -> '面包'
373 -> '的'
28967 -> '烘'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
面包(24765)的(373)烘(28967)�(1165)�(237)技巧(16012):(1038)
(189)1(20).(17) (210)面(1157)条(1996)要(853)煮熟(43916),(355)否则(14458)容易(7305)粘(14494)。(420)
(2813)2(21).(17) 应(23830)使用(2527)烤(15337)箱(8226)而不是(12285)微波(30656)炉(16613)加热(25228)面团(44449)。
(672)3(22).(17) 用(16647)冷水(33637)淋(15735)湿(10556)面团(44449)以防止(31473)黏(19639)在一起(10919)。
(672)4(23).(17) 在(3612)预(3119)热(4291)至(1546)摄氏(39868)175(13634)度(1423)时(1018)开始(3590)烘(28967)�(1165)�(237),(355)直到(8326)底部(26609)变得
(13044)金(1539)黄色(21313)并(1437)散(4711)发出(13801)香味(32740)即可(10134)享用(42892)</s>(2) [end of text]
main: mem per token = 4944640 bytes
main: load time = 558.57 ms
main: sample time = 516.50 ms
main: predict time = 3674.82 ms / 52.50 ms per token
main: total time = 4945.50 ms
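A likely explanation (an assumption, not confirmed against the conversion code): BLOOM uses a byte-level BPE, so a character like "焙" that lacks a single-token entry gets split into byte-piece tokens, and printing each token's bytes on its own produces U+FFFD replacement characters. The minimal sketch below illustrates the mechanism; the exact two-token byte split is hypothetical, chosen to mirror the two "�" tokens (1165 and 237) in the log above.

```python
# Sketch of why per-token printing yields "�": if "焙" has no single token
# in the converted vocab, the byte-level BPE spreads its UTF-8 bytes across
# several tokens (the split below is an assumption for illustration).
text = "焙"
raw = text.encode("utf-8")            # b'\xe7\x84\x99' (3 bytes)

# Hypothetical split across two byte-piece tokens, mirroring the two "�"
# tokens in the log above:
pieces = [raw[:2], raw[2:]]

# Decoding each piece on its own, as token-by-token printing does,
# produces U+FFFD replacement characters:
print([p.decode("utf-8", errors="replace") for p in pieces])   # ['�', '�']

# Buffering the raw bytes and decoding them together recovers the character:
print(b"".join(pieces).decode("utf-8"))                        # 焙
```

This suggests the display problem can be worked around on the ggml side by accumulating token bytes in a buffer and only flushing complete UTF-8 sequences, rather than decoding and printing each token independently.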