[BUG/Help] <Why does the same character map to multiple token_ids? (BPE tokenization ruled out)>

Is there an existing issue for this?

[ ] I have searched the existing issues

Current Behavior

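I noticed that the model's tokenizer maps the same character to at least two token_ids. For example, B → 30949 and 347, C → 30942 and 319, D → 30952 and 367. These are all single letters, so there should be no tokenization (subword splitting) issue involved. Doesn't this cause confusion when converting between text and token_ids with tokenizer.encode / tokenizer.decode?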
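A minimal toy sketch of one way a single letter can legitimately own two ids without making decoding ambiguous. It assumes (not confirmed in this issue) a SentencePiece-style vocabulary in which a word-initial piece such as `▁B` (with the `▁` space marker) and a bare piece `B` are separate entries. The vocabulary, the ids 101 to 104, and the `encode`/`decode` helpers are all hypothetical and are not ChatGLM2's real vocab or API. To see what the reported ids actually are, something like `tokenizer.convert_ids_to_tokens([30949, 347])` on the real tokenizer should print the underlying pieces.

```python
# Hypothetical toy vocabulary (NOT the real ChatGLM2 vocab).
# "▁" (U+2581) is a SentencePiece-style marker for a preceding space,
# so "▁B" and "B" are two different pieces that both render as the letter B.
VOCAB = {
    "▁B": 101,  # "B" at the start of a word (after a space)
    "B": 102,   # "B" attached to the previous character
    "A": 103,
    "▁A": 104,
}
ID_TO_PIECE = {i: p for p, i in VOCAB.items()}


def encode(text: str) -> list[int]:
    """Toy encoder: one piece per character (only letters A/B and spaces)."""
    ids = []
    pending_space = True  # assumption: start of text counts as a word boundary
    for ch in text:
        if ch == " ":
            pending_space = True
            continue
        piece = ("▁" + ch) if pending_space else ch
        ids.append(VOCAB[piece])
        pending_space = False
    return ids


def decode(ids: list[int]) -> str:
    """Toy decoder: join pieces, turn "▁" back into spaces, drop the leading one."""
    text = "".join(ID_TO_PIECE[i] for i in ids)
    return text.replace("▁", " ").lstrip(" ")


print(encode("B AB"))             # word-initial B and word-internal B get different ids
print(decode([101, 104, 102]))    # both B ids decode back to the same letter
```

In this sketch decoding is many-to-one (ids 101 and 102 both come back as "B"), while encoding is deterministic: which id a letter gets depends only on whether it starts a word, so for simple inputs like "B AB" `decode(encode(text))` returns the original text.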

Expected Behavior

No response

Steps To Reproduce

None

Environment

None

Anything else?

No response

THUDM / ChatGLM2-6B, issue #598