alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
MIT License

RuntimeError: tokenmonsterserver: Cannot open or save vocabulary file, please check permissions #21

Closed: abedkhooli closed this issue 1 year ago

abedkhooli commented 1 year ago

What format should the vocab file be in? I got the above error when I tried the RWKV vocab, which is a text file: https://raw.githubusercontent.com/BlinkDL/ChatRWKV/main/rwkv_pip_package/src/rwkv/rwkv_vocab_v20230424.txt
I had already given the vocab file 777 permissions.

alasdairforsythe commented 1 year ago

TokenMonster doesn't support this format. It supports either its own TokenMonster .vocab format or a YAML file (example here). You can write a simple script to convert your vocabulary into the YAML format and then import it using the exportvocab executable, or the Go or Python libraries.
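As a rough illustration of the "simple script" suggested above, here is a minimal Python sketch that converts the RWKV vocab into a YAML token list. It assumes each line of the RWKV file is `<id> <Python str or bytes literal> <byte length>`, which is how RWKV's own tokenizer code appears to read it. The YAML field names (`tokens`, `token`, `id`) and the helper names `parse_rwkv_line`, `yaml_quote`, `to_yaml` and `convert` are hypothetical: compare them against TokenMonster's example YAML before importing, since the real format may require further settings. Tokens that are not valid UTF-8 on their own (partial multi-byte sequences) are skipped and counted rather than guessed at, and everything outside printable ASCII is written with YAML's `\u`/`\U` escapes so the output stays valid YAML.

```python
import ast


def parse_rwkv_line(line: str) -> tuple[int, bytes]:
    """Parse one RWKV vocab line, assumed to be `<id> <literal> <byte length>`."""
    first = line.index(" ")   # end of the numeric id
    last = line.rindex(" ")   # start of the trailing byte length
    token_id = int(line[:first])
    # The middle part is a Python str or bytes literal, e.g. ' the' or b'\xe4\xb8'
    literal = ast.literal_eval(line[first + 1:last])
    data = literal.encode("utf-8") if isinstance(literal, str) else literal
    if len(data) != int(line[last + 1:]):
        raise ValueError(f"byte length mismatch: {line!r}")
    return token_id, data


def yaml_quote(s: str) -> str:
    """Return s as a YAML double-quoted scalar, escaping all non-printable-ASCII."""
    parts = []
    for ch in s:
        cp = ord(ch)
        if ch in '"\\':
            parts.append("\\" + ch)          # escape quote and backslash
        elif 0x20 <= cp < 0x7F:
            parts.append(ch)                 # printable ASCII stays as-is
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04x}")     # YAML 16-bit escape
        else:
            parts.append(f"\\U{cp:08x}")     # YAML 32-bit escape
    return '"' + "".join(parts) + '"'


def to_yaml(entries: list[tuple[int, bytes]]) -> tuple[str, int]:
    """Emit a hypothetical TokenMonster-style token list; return (text, skipped)."""
    out = ["tokens:"]
    skipped = 0
    for token_id, data in entries:
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            skipped += 1  # partial multi-byte token: no faithful string form
            continue
        out.append(f"  - token: {yaml_quote(text)}")
        out.append(f"    id: {token_id}")
    return "\n".join(out) + "\n", skipped


def convert(src: str, dst: str) -> int:
    """Convert an RWKV vocab .txt file to YAML; return the number of skipped tokens."""
    with open(src, encoding="utf-8") as f:
        entries = [parse_rwkv_line(line.rstrip("\r\n")) for line in f if line.strip()]
    text, skipped = to_yaml(entries)
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)
    return skipped
```

For example, `convert("rwkv_vocab_v20230424.txt", "rwkv_vocab.yaml")` would write the YAML file and return how many tokens were left out. Skipping raw-byte tokens is a deliberate simplification; how TokenMonster expects such tokens to be expressed would need to be checked in its own documentation.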

abedkhooli commented 1 year ago

Thanks. This is what I thought, but the error message was confusing. Is it possible to separate the permission error from the format error? I'll close this issue then.

alasdairforsythe commented 1 year ago

It would be better for the error message to say exactly what the issue is; however, the Python code doesn't know the exact cause, because the error originates in the tokenmonsterserver subprocess. That's why you get this vague "cannot open or save" error message. I don't think this is important enough to justify adding new error codes throughout, but I'll keep it in mind if I do a larger update at some point.

If you use the original Go implementation, instead of Python (which wraps the Go implementation), you'll get more detailed error messages.
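Until the subprocess reports distinct error codes, a caller could rule out the permission case on the Python side before loading. The sketch below is a hypothetical pre-flight check (`check_vocab_file` is not part of TokenMonster). Its extension test is only a heuristic based on the two formats named in the earlier reply, not how TokenMonster itself detects the format.

```python
import os
from pathlib import Path


def check_vocab_file(path: str) -> str:
    """Hypothetical check run before passing a file to TokenMonster.

    Returns "ok", or a short description of the first problem found.
    """
    p = Path(path)
    if not p.exists():
        return "missing"
    if not p.is_file():
        return "not a regular file"
    if not os.access(p, os.R_OK):
        return "permission denied"
    # Heuristic only: assumes .vocab for the native format, .yaml/.yml for YAML
    if p.suffix.lower() not in {".vocab", ".yaml", ".yml"}:
        return "unsupported format (expected .vocab or YAML)"
    return "ok"
```

With the file from this issue, the check would return the "unsupported format" message for `rwkv_vocab_v20230424.txt` even with 777 permissions, which separates the two cases the generic server error conflates.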