[BUG] <title>Adding regular tokens is not supported

QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Apache License 2.0

13.59k stars, 1.11k forks

[BUG] <title>Adding regular tokens is not supported #1289

Closed: shilida closed this issue 3 months ago

shilida commented 3 months ago

Is there an existing issue / discussion for this?

[X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

[X] I have searched FAQ

Current Behavior

I want to add new tokens to the vocabulary, but doing so fails with the error `Adding regular tokens is not supported`.

Expected Behavior

I want to add new tokens to the vocabulary.

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

jklj077 commented 3 months ago

As the error says, this is not supported: the tokenizer is based on BPE, so you cannot simply add a regular token to its vocabulary. tokenization_note documents how to learn BPE merges for new tokens, and you can always add your tokens as special tokens instead. Be warned that Qwen1.0 will no longer be updated; please upgrade to Qwen2.0.
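To illustrate the point about BPE, here is a minimal toy sketch. It is not the Qwen tokenizer (which is byte-level and far larger); it is a character-level BPE written only for this example. The function names `bpe_encode` and `encode`, the merge list, and the special token `<|my_tag|>` are all hypothetical. It shows that a regular token only appears in the output if merge rules lead to it, while a special token is matched in the text before BPE runs and so needs no merges.

```python
import re
from typing import List, Sequence, Tuple

Merge = Tuple[str, str]


def bpe_encode(word: str, merges: Sequence[Merge]) -> List[str]:
    """Toy character-level BPE: repeatedly apply the highest-priority merge."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs without a merge rule rank as infinity.
        candidates = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break  # no applicable merge rule is left
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def encode(text: str, merges: Sequence[Merge],
           special_tokens: Sequence[str] = ()) -> List[str]:
    """Split out special tokens first, then run BPE on the remaining pieces."""
    if not special_tokens:
        return bpe_encode(text, merges)
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    tokens: List[str] = []
    for piece in re.split(pattern, text):
        if piece in special_tokens:
            tokens.append(piece)  # emitted whole, never passed to BPE
        elif piece:
            tokens.extend(bpe_encode(piece, merges))
    return tokens


merges = [("l", "o"), ("lo", "w")]
vocab = {"l", "o", "w", "e", "r", "lo", "low"}

# Adding a string to the vocabulary alone changes nothing: no merge produces it.
vocab.add("lower")
without_merges = bpe_encode("lower", merges)   # ['low', 'e', 'r']

# With merges that lead to it, the new token is produced.
learned = merges + [("e", "r"), ("low", "er")]
with_merges = bpe_encode("lower", learned)     # ['lower']

# A special token bypasses BPE entirely.
with_special = encode("low<|my_tag|>lower", merges, ["<|my_tag|>"])
# ['low', '<|my_tag|>', 'low', 'e', 'r']
```

In this toy setup, "adding a regular token" really means extending the merge list, which is why it takes a learning step rather than a single call.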