Open xinyinan9527 opened 2 months ago
Sorry, when using the rwkv5-world model I cannot get the tokenizer.json.
You don't need tokenizer.json; the `RWKV5Tokenizer` is a custom implementation. The tokenizer.json content is equivalent to https://github.com/BBuf/RWKV-World-HF-Tokenizer/blob/main/rwkv5_world_tokenizer/vocab.txt.
Yes, it works nicely. But I want to use metapipe + rwkv5, which needs tokenizer.json. With rwkv4 we had "20B_tokenizer.json" and could produce a "tokenizer.json" file via the PreTrainedTokenizerFast class. 20B_tokenizer.json was created with BPE, right? Right now I am trying to convert the "slow tokenizer" into a "fast tokenizer" to get the rwkv5 "tokenizer.json" file, but it seems very hard. There is also this file, maybe one of your test files: https://github.com/huggingface/transformers/blob/0adb55f21c9c7d2b7ac92d17ab4a37655edce67f/out/tokenizer.json — it also seems to come from BPE. Do you have another method? Thank you.
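For context on why the slow-to-fast conversion is awkward: the RWKV world tokenizer is a greedy longest-match (trie-based) tokenizer rather than a merge-based BPE one, so it doesn't map cleanly onto the BPE model inside a standard tokenizer.json. A minimal pure-Python sketch of greedy longest-match over a toy vocabulary (the vocab here is illustrative, not the real RWKV vocabulary):

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenization: at each position, consume the
    longest vocabulary entry that prefixes the remaining text."""
    max_len = max(len(t) for t in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # try the longest candidate first, shrinking down to length 1
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # no match: fall back to a single character
            # (the real implementation works on raw bytes)
            tokens.append(text[i])
            i += 1
    return tokens

toy_vocab = {"hel", "hello", "lo", "wor", "world", " "}
print(greedy_tokenize("hello world", toy_vocab))  # ['hello', ' ', 'world']
```

Note how "hello" wins over "hel" because the longer match is tried first; BPE would instead apply learned merge rules, which is why a mechanical conversion tends to produce different segmentations.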
It's very difficult. I tried using a tokenizer.json previously, but the results were wrong because vocab.txt contains tokens with different encoding styles.
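To illustrate the mixed encoding styles: if I recall the RWKV world vocab.txt layout correctly (each line being `<id> <python-literal> <byte-length>` — an assumption here, not confirmed by this thread), entries mix `str` and `bytes` literals, which is what makes a straight mapping into a text-only tokenizer.json lossy. A small parsing sketch over hand-written sample lines in that assumed format:

```python
import ast

# Illustrative lines in the assumed RWKV world vocab.txt format:
# "<id> <python literal> <length in bytes>"
sample_lines = [
    "1 '\\x00' 1",
    "2 b'\\xff' 1",
    "3 'hello' 5",
]

def parse_vocab_line(line):
    """Parse one vocab line into (token_id, token_bytes)."""
    idx_str, rest = line.split(" ", 1)
    literal, length_str = rest.rsplit(" ", 1)
    token = ast.literal_eval(literal)  # may evaluate to str OR bytes
    # normalize both styles to raw bytes
    data = token.encode("utf-8") if isinstance(token, str) else token
    assert len(data) == int(length_str), "length column mismatch"
    return int(idx_str), data

vocab = dict(parse_vocab_line(line) for line in sample_lines)
print(vocab)  # {1: b'\x00', 2: b'\xff', 3: b'hello'}
```

The `b'\xff'` entry is a standalone byte that is not valid UTF-8 on its own; a tokenizer.json model that stores tokens as Unicode strings can't represent it directly, which matches the wrong results described above.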
I need to use tokenizer.json in my project. How should I create it?