BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

UTF-16 stream does not start with BOM #119

Closed by ItsCRC 1 year ago

ItsCRC commented 1 year ago

Hi all,

I have trained RWKV-v4neo from scratch. After reading through some existing issues, it seems that I need to run run.py from RWKV-v4 to test my model. I changed run.py as follows:


if TOKEN_MODE == 'char':
    MODEL_NAME = '/home/ubuntu/RWKV-v4neo/out/rwkv-450'
    WORD_NAME = '/home/ubuntu/RWKV-v4neo/out/vocab'

Running python run.py then fails with UnicodeError: UTF-16 stream does not start with BOM

Any suggestions, @BlinkDL?

BlinkDL commented 1 year ago

Hi, re-save vocab.json as UTF-8, and change util.py so that the tokenizer loads it as UTF-8.
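
The re-save step could be sketched roughly as below. This is only an illustration, not code from the repository: the helper name resave_vocab_as_utf8 and the output file name are made up, and the fallback of utf-16-le is an assumption (the error message only says the file has no BOM, not which byte order it uses). If the file starts with a UTF-16 BOM it is decoded as normal UTF-16; otherwise the fallback encoding is tried.

```python
import codecs
import json
from pathlib import Path


def resave_vocab_as_utf8(src, dst, fallback_encoding="utf-16-le"):
    """Re-save a JSON vocab file as UTF-8 (hypothetical helper).

    fallback_encoding is an assumption used only when the file
    has no UTF-16 BOM; adjust it if your file differs.
    """
    raw = Path(src).read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # A BOM is present, so the generic codec can pick the byte order.
        text = raw.decode("utf-16")
    else:
        # No BOM: this is the case that triggers the UnicodeError above.
        text = raw.decode(fallback_encoding)
    vocab = json.loads(text)  # fails loudly if the guess was wrong
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
    return vocab
```

For example, resave_vocab_as_utf8('vocab.json', 'vocab_utf8.json') would write a UTF-8 copy next to the original. The loader in util.py is not shown in this thread, but wherever it opens WORD_NAME + '.json', the open() call would presumably need encoding='utf-8' to match.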