BBuf / RWKV-World-HF-Tokenizer


The tokenizer is 2.5x slower than other Hugging Face tokenizers and BlinkDL's original World tokenizer #6

Open · cahya-wirawan opened this issue 1 month ago

cahya-wirawan commented 1 month ago

This tokenizer is 2.5x slower than other Hugging Face tokenizers and BlinkDL's original World tokenizer. The comparison can be reproduced in this notebook: https://colab.research.google.com/gist/cahya-wirawan/932f95ece55c838e186dc3b1c9fcbef4/rwkv-tokenizers.ipynb
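A slowdown figure like "2.5x" is the ratio of encode times for two tokenizers over the same corpus. The sketch below shows one way such a ratio could be measured. It is not the notebook's code. The helper names (`time_encode`, `slowdown`, `byte_encode`, `slow_byte_encode`) are hypothetical, and the two toy byte-level encoders only stand in for real tokenizers so the example runs without downloads. Any callable mapping a string to a list of IDs can be passed instead, for example the `encode` method of a loaded Hugging Face tokenizer.

```python
import time
from typing import Callable, List, Sequence

Encoder = Callable[[str], List[int]]


def time_encode(encode: Encoder, texts: Sequence[str], repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock time (seconds) to encode every text once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            encode(text)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(encode_a: Encoder, encode_b: Encoder,
             texts: Sequence[str], repeats: int = 5) -> float:
    """Ratio time(a) / time(b); a value near 2.5 means `a` is about 2.5x slower."""
    t_a = time_encode(encode_a, texts, repeats)
    t_b = time_encode(encode_b, texts, repeats)
    return t_a / max(t_b, 1e-12)  # guard against a zero denominator


# Hypothetical stand-in encoders: both map text to its UTF-8 bytes,
# but the second does it character by character (and is slower).
def byte_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def slow_byte_encode(text: str) -> List[int]:
    ids: List[int] = []
    for ch in text:
        ids.extend(ch.encode("utf-8"))
    return ids


if __name__ == "__main__":
    corpus = ["Hello world! " * 50, "RWKV tokenizer benchmark " * 50] * 100
    ratio = slowdown(slow_byte_encode, byte_encode, corpus)
    print(f"slow_byte_encode / byte_encode time ratio: {ratio:.2f}")
```

Taking the best of several runs rather than the mean reduces noise from other processes. The same corpus should be used for both tokenizers so that the ratio is meaningful.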

It also generates different token IDs for the following edge cases: