Closed. AllenShow closed this issue 6 months ago.
In Qwen (1.0), the text representation of special tokens can be freely customized. To make the necessary adjustments, please review the "Special tokens" section of the tokenization documentation at https://github.com/QwenLM/Qwen/blob/main/tokenization_note.md#special-tokens. It is also important to examine the data preprocessing functions in finetune.py and qwen_generation_utils.py, since special tokens are handled differently from regular tokens.
Hello! Thanks to your team for the excellent model and documentation! Regarding `EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))`: the tokens <|extra_0|> through <|extra_204|> are reserved slots for additional special tokens, right? I want to finetune a Qwen model. Would it be feasible to repurpose a few of the extra_ tokens as my own custom tokens, and use those same tokens in my training corpus? Is tokenization_qwen.py the only file I need to modify, or are there other places that need changes?
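To illustrate the idea in the question, here is a minimal sketch, not the actual Qwen source: it reproduces the reserved-extras pattern quoted above and shows one way a few `<|extra_*|>` slots could be aliased to custom surface forms for use in a finetuning corpus. The alias names (`<|tool_call|>`, `<|tool_result|>`) and the `surface_form` helper are hypothetical, chosen only for illustration.

```python
# Layout of the reserved special-token slots, mirroring the pattern
# quoted from tokenization_qwen.py (this sketch is not the real file).
ENDOFTEXT = "<|endoftext|>"
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
# 205 reserved placeholders: <|extra_0|> .. <|extra_204|>
EXTRAS = tuple(f"<|extra_{i}|>" for i in range(205))

# Hypothetical remapping: give a few extras task-specific text forms.
# The alias names below are invented for this example.
CUSTOM_ALIASES = {
    "<|extra_0|>": "<|tool_call|>",
    "<|extra_1|>": "<|tool_result|>",
}

def surface_form(token: str) -> str:
    """Return the text form a token would take in training data:
    the custom alias if one is defined, otherwise the token itself."""
    return CUSTOM_ALIASES.get(token, token)
```

As the answer above notes, renaming the tokens in the tokenizer alone is not enough: the data preprocessing in finetune.py and the decoding logic in qwen_generation_utils.py treat special tokens differently from regular text, so any remapping has to be applied consistently in those places as well.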