It seems like some important information has been left out of the documentation.
Apparently some changes need to be made to the tokenizer_config.json file and to main.cpp.
Before converting the model, the tokenizer_config.json file needs to be modified. Specifically, "eos_token": "<|end_of_text|>" should be changed to "eos_token": "<|end|>". What about "bos_token": "<|begin_of_text|>"? Should it be changed to "bos_token": "<|start|>"? Are there any other parts of the tokenizer configuration that need to be modified?
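For anyone who wants to script the eos_token edit instead of doing it by hand, here is a minimal sketch. It assumes eos_token is stored as a plain string, as quoted above; the function name set_eos_token and the path in the usage line are hypothetical. It deliberately leaves bos_token untouched, since whether that needs changing is still an open question.

```python
import json
from pathlib import Path


def set_eos_token(config_path, new_eos="<|end|>"):
    """Rewrite the eos_token entry of a tokenizer_config.json in place.

    Returns the previous value so the change can be reverted by hand.
    Assumes eos_token is a plain string (as in the post), not a nested object.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    old_eos = config.get("eos_token")
    config["eos_token"] = new_eos  # only this key is changed; bos_token is left alone

    # ensure_ascii=False keeps any non-ASCII tokens readable in the file
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return old_eos
```

Usage would look like set_eos_token("path/to/model/tokenizer_config.json"), run once before the conversion step. Keeping a copy of the original file is a good idea, because the script overwrites it.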
The rkllm-runtime/examples/rkllm_api_demo/src/main.cpp file also needs to be modified.
#define PROMPT_TEXT_PREFIX "<|im_start|>system You are a helpful assistant. <|im_end|> <|im_start|>user"
#define PROMPT_TEXT_POSTFIX "<|im_end|><|im_start|>assistant"
should be changed to
#define PROMPT_TEXT_PREFIX "<|user|>"
#define PROMPT_TEXT_POSTFIX "<|end|><|assistant|>"
What about the system prompt? The original prefix includes one, but the replacement drops it.
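To make the difference between the two templates concrete, here is a small sketch that prints the text the model would receive for the same user input. It assumes the demo builds the final prompt as prefix + user input + postfix, which is my reading of main.cpp and not something the documentation confirms; build_prompt is a hypothetical helper, not part of the RKLLM API.

```python
# Templates quoted from the post: the ones shipped in main.cpp and the proposed replacements.
OLD_PREFIX = "<|im_start|>system You are a helpful assistant. <|im_end|> <|im_start|>user"
OLD_POSTFIX = "<|im_end|><|im_start|>assistant"
NEW_PREFIX = "<|user|>"
NEW_POSTFIX = "<|end|><|assistant|>"


def build_prompt(user_text, prefix, postfix):
    """Hypothetical helper: wrap the user text as prefix + text + postfix.

    This concatenation order is an assumption about what the demo does.
    """
    return prefix + user_text + postfix


if __name__ == "__main__":
    question = "Hello"
    # Shipped template: includes a system segment.
    print(build_prompt(question, OLD_PREFIX, OLD_POSTFIX))
    # Proposed template: user and assistant markers only, no system segment.
    print(build_prompt(question, NEW_PREFIX, NEW_POSTFIX))
```

Printing both side by side shows that the proposed template carries no system segment at all, which is why the system prompt question matters.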
If the tokenizer needs to be modified, shouldn't the README say so? And if main.cpp is not correct, shouldn't that be fixed in the repo?
There is an ongoing discussion of this topic on Reddit: https://www.reddit.com/r/RockchipNPU/comments/1cpngku/rknnllm_v101_lets_talk_about_converting_and/