ZetangForward closed this issue 9 months ago
I haven't tested it yet, but you can estimate it yourself: let the GPT-Neo tokenizer tokenize a sample of, say, about 500M of data (it does not take much time), and compute the CHAR_TO_TOKEN_RATIO from the result.
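The estimate described above can be sketched roughly as follows. This is a minimal sketch, not the script's own code: the helper names `estimate_char_to_token_ratio` and `load_gpt_neo_tokenize` are hypothetical, the checkpoint `EleutherAI/gpt-neo-1.3B` is only one possible choice, and the ratio is assumed to mean characters per token (if the script defines it the other way round, invert the result).

```python
from typing import Callable, Iterable, List


def estimate_char_to_token_ratio(
    texts: Iterable[str],
    tokenize: Callable[[str], List[int]],
) -> float:
    """Return total characters divided by total tokens over a text sample.

    Assumption: the ratio means "characters per token".
    """
    total_chars = 0
    total_tokens = 0
    for text in texts:
        total_chars += len(text)
        total_tokens += len(tokenize(text))
    if total_tokens == 0:
        raise ValueError("no tokens produced; cannot estimate a ratio")
    return total_chars / total_tokens


def load_gpt_neo_tokenize(model_name: str = "EleutherAI/gpt-neo-1.3B"):
    """Build a tokenize function from a Hugging Face tokenizer.

    Assumption: the `transformers` package is installed and the checkpoint
    name above is the GPT-Neo variant you want to measure.
    """
    from transformers import AutoTokenizer  # imported lazily on purpose

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Skip special tokens so only the text itself is counted.
    return lambda text: tokenizer.encode(text, add_special_tokens=False)
```

To use it, pass an iterable of documents sampled from your own data, for example `estimate_char_to_token_ratio(docs, load_gpt_neo_tokenize())`. Streaming the sample document by document keeps memory use flat, and a larger sample gives a more stable estimate.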
ok, I will try, thx
Hi, thanks for your great work. I noticed there is a `LLAMA_CHAR_TO_TOKEN_RATIO` hyper-parameter in your script. I want to test GPT-Neo with your script; could you provide the corresponding value for `GPT_CHAR_TO_TOKEN_RATIO`? Thx