THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] tokenizer collapse? #1446

Open CoinCheung opened 5 months ago

CoinCheung commented 5 months ago

Is there an existing issue for this?

Current Behavior

Two distinct token IDs (1833 and 2893) decode to the same string, which suggests duplicate entries in the tokenizer vocabulary.

Expected Behavior

No response

Steps To Reproduce

```python
from transformers import AutoConfig, AutoTokenizer

model_name = 'THUDM/chatglm3-6b-base'
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
# use_fast is a tokenizer option, not a config option, so it belongs here:
tokenizer = AutoTokenizer.from_pretrained(
    model_name, config=config, trust_remote_code=True, use_fast=False
)
print([tokenizer.decode(el) for el in [1833, 2893]])
```
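If two IDs decode to the same string, the whole vocabulary can be scanned for such collisions. Below is a minimal, self-contained sketch of that scan using a stand-in `decode` mapping with a deliberately duplicated surface form (the IDs and strings are illustrative, not taken from the real model); against the actual model, substitute `tokenizer.decode` and iterate over `range(len(tokenizer))`:

```python
from collections import defaultdict

# Stand-in vocabulary with one deliberate duplicate ("mis" appears twice).
# With the real model, replace `decode` with tokenizer.decode and the ID
# list with range(len(tokenizer)).
vocab = {1833: "mis", 2893: "mis", 100: "hello", 101: "world"}

def decode(token_id):
    return vocab[token_id]

def find_collisions(ids):
    """Group token IDs by their decoded string; keep groups of size > 1."""
    groups = defaultdict(list)
    for i in ids:
        groups[decode(i)].append(i)
    return {s: g for s, g in groups.items() if len(g) > 1}

print(find_collisions(vocab.keys()))  # → {'mis': [1833, 2893]}
```

A handful of collisions is not necessarily a bug: SentencePiece-style vocabularies can legitimately contain byte-fallback tokens and regular tokens whose decoded text coincides.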

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

On my platform, the decoded outputs for both token IDs are identical:

(screenshot showing the same decoded string for both IDs 1833 and 2893)