Open melang982 opened 3 months ago
Thanks! Actually, the tokenizer is not that much of a "hard dependency" for applications built upon
web-rwkv: they are always free to implement their own tokenizer. This is because web-rwkv's
model APIs only see tokens, not text strings. I will review this after #24 is merged.
This adds Hugging Face tokenizer support. It is useful for RWKV models that were trained with a custom tokenizer, especially since the RWKV tokenizer training code is not available. That covers experiments such as per-character tokenization and custom datasets such as music, time series, or rare languages.
Checked with ai00_server and an RWKV model I trained from scratch that uses a BBPE HF tokenizer: it works 🎉
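
For illustration, here is a minimal sketch of why the tokenizer can be swapped when the model side only consumes token ids, using the per-character (here: per-byte) case mentioned above. Everything in it is an assumption made for the example: the `Tokenizer` trait, `ByteTokenizer`, and `fake_model_step` are hypothetical names and are not APIs of web-rwkv, ai00_server, or the Hugging Face `tokenizers` crate.

```rust
// Hypothetical sketch: none of these names come from web-rwkv or this PR.

/// Anything that can turn text into token ids and back.
pub trait Tokenizer {
    fn encode(&self, text: &str) -> Vec<u32>;
    fn decode(&self, tokens: &[u32]) -> String;
}

/// Byte-level tokenizer: each UTF-8 byte becomes one token id in 0..=255.
pub struct ByteTokenizer;

impl Tokenizer for ByteTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }

    fn decode(&self, tokens: &[u32]) -> String {
        // Ids above 255 cannot be bytes in this scheme, so they are skipped.
        let bytes: Vec<u8> = tokens
            .iter()
            .filter(|&&t| t <= 255)
            .map(|&t| t as u8)
            .collect();
        // Invalid UTF-8 sequences are replaced rather than causing a panic.
        String::from_utf8_lossy(&bytes).into_owned()
    }
}

/// Stand-in for a model call that only ever sees token ids.
/// A real model would produce logits; this placeholder echoes the last token.
pub fn fake_model_step(tokens: &[u32]) -> Option<u32> {
    tokens.last().copied()
}
```

With this shape, `ByteTokenizer.encode("hi")` yields the ids for the bytes `h` and `i`, and `fake_model_step` never touches a string, so a Hugging Face tokenizer could sit behind the same trait without changes on the model side.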