QwenLM / Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Tokenizer vocab size does not match model vocab size #27

Open yangjiabupt opened 10 months ago

yangjiabupt commented 10 months ago

The vocab size in the config is `"vocab_size": 155947`.

However, the tokenizer vocab contains only 155514 tokens.

What are the 433 extra token ids used for?
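
For reference, the gap between the two numbers above can be worked out with a small sketch like the one below. The helper name `unused_id_range` is hypothetical, and it assumes tokenizer ids are contiguous starting at 0, which this thread does not confirm for Qwen-Audio.

```python
def unused_id_range(config_vocab_size: int, tokenizer_size: int) -> range:
    """Return the ids that the model config covers but the tokenizer does not.

    Assumption (not confirmed here): tokenizer ids run from 0 to
    tokenizer_size - 1 with no holes.
    """
    if tokenizer_size > config_vocab_size:
        raise ValueError("tokenizer has more tokens than the config's vocab_size")
    return range(tokenizer_size, config_vocab_size)


# Numbers taken from the report above.
extra = unused_id_range(config_vocab_size=155947, tokenizer_size=155514)
print(len(extra))               # 433
print(extra.start, extra[-1])   # 155514 155946
```

Under that assumption, ids 155514 through 155946 would be covered by the config's `vocab_size` but never produced by the tokenizer.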