OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
https://internvl.github.io/
MIT License

Why fast tokenizer is disabled? #301

Open dyang415 opened 1 week ago

dyang415 commented 1 week ago

Hi there, nice work on InternVL! We're really impressed by the new InternVL-V1.5.

One thing we noticed is that the backing language model, internlm/internlm2-chat-20b, ships with a fast tokenizer (https://huggingface.co/internlm/internlm2-chat-20b/blob/main/tokenizer_config.json#L89). However, in InternVL the fast tokenizer was removed (https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5/blob/main/tokenizer_config.json#L162). Is there a specific reason the fast tokenizer isn't enabled?

Weiyun1025 commented 5 days ago

We previously discovered that the tokenization results of the fast tokenizer sometimes differed from those of the slow tokenizer. Since the speed benefit of the fast tokenizer is not significant in our scenario, we decided not to use it, to ensure the correctness of the code.
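For anyone wanting to verify this kind of discrepancy themselves, here is a minimal, hedged sketch (not InternVL's actual code) of a consistency check between two tokenizers. The `fast` and `slow` stand-ins below are hypothetical toy functions for illustration; with `transformers` you would instead load the real pair via `AutoTokenizer.from_pretrained(model_id, use_fast=True)` and `use_fast=False` and compare their `.encode()` outputs on a corpus:

```python
# Hedged sketch: check whether a "fast" (Rust-backed) and a "slow"
# (pure-Python) tokenizer produce identical token IDs on sample inputs.
from typing import Callable, List

def find_disagreements(
    tokenize_fast: Callable[[str], List[int]],
    tokenize_slow: Callable[[str], List[int]],
    samples: List[str],
) -> List[str]:
    """Return the sample strings on which the two tokenizers disagree."""
    return [s for s in samples if tokenize_fast(s) != tokenize_slow(s)]

# Toy stand-in tokenizers (hypothetical, for illustration only):
# the "slow" one strips surrounding whitespace, the "fast" one does not,
# mimicking the kind of subtle mismatch described above.
fast = lambda s: [ord(c) for c in s]
slow = lambda s: [ord(c) for c in s.strip()]

mismatches = find_disagreements(fast, slow, ["hello", " hello "])
# " hello " is flagged because the whitespace handling differs.
```

Running such a check over representative prompts (especially ones with unusual whitespace or special tokens) is a quick way to decide whether enabling the fast tokenizer is safe for a given model.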