Open dyang415 opened 1 week ago
We previously discovered that the tokenization results of the fast tokenizer sometimes differed from those of the slow tokenizer. Since the fast tokenizer offers no significant benefit in our scenario, we decided not to use it, to ensure correctness.
Hi there, nice work on InternVL! We're really impressed by the new InternVL-V1.5.
One thing we noticed is that the backing language model internlm/internlm2-chat-20b ships with a fast tokenizer (https://huggingface.co/internlm/internlm2-chat-20b/blob/main/tokenizer_config.json#L89), whereas in InternVL the fast tokenizer was removed (https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5/blob/main/tokenizer_config.json#L162). Is there a specific reason the fast tokenizer isn't enabled?
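For anyone who wants to check for such discrepancies themselves, here is a minimal sketch (not the InternVL team's actual test) that loads the base model's tokenizer in both slow and fast variants and diffs their outputs. The sample strings are placeholders, and `trust_remote_code=True` is assumed to be needed for internlm2's custom tokenizer classes.

```python
from transformers import AutoTokenizer

MODEL_ID = "internlm/internlm2-chat-20b"  # base LLM referenced above

# Load the same checkpoint with and without the Rust-backed "fast" tokenizer.
slow_tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False, trust_remote_code=True)
fast_tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True, trust_remote_code=True)

# Placeholder inputs; in practice you would sweep a large, diverse corpus.
samples = [
    "Hello, world!",
    "你好，世界！",
    "def f(x):\n    return x * 2",
]

for text in samples:
    slow_ids = slow_tok(text, add_special_tokens=False)["input_ids"]
    fast_ids = fast_tok(text, add_special_tokens=False)["input_ids"]
    if slow_ids != fast_ids:
        print(f"Mismatch for {text!r}")
        print(f"  slow: {slow_ids}")
        print(f"  fast: {fast_ids}")
```

If mismatches show up, pinning the slow tokenizer (as the InternVL-Chat-V1-5 tokenizer_config.json does) is the straightforward way to avoid them.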