<reponame> as given in the example tokenizes as plaintext as follows:
SPECIAL=1 MODE=tokenize lm "<reponame>"
35403,1648,3487,1873,35393
while <repo_name> appears to be the actual special token:
SPECIAL=1 MODE=tokenize lm "<repo_name>"
92299
The full set of special tokens seems OK with this change. Two of them are missing from the tokenizer_config.json for both base and chat, but both base and chat return:
SPECIAL=1 MODE=tokenize lm ""
92295,92296,92297,92298,92299,92300
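The difference between the two outputs above can be sketched as a simple lookup: a tokenizer first matches registered special tokens whole, and anything unregistered falls through to ordinary plaintext BPE. This is a minimal illustration, not real llama.cpp code; the table contents are just the ids from the outputs above, and lookup_special is a hypothetical helper:

```python
# Registered special token (id from the output above).
SPECIAL_TOKENS = {"<repo_name>": 92299}

# Plaintext BPE fallback for the misspelled variant (ids from the output above);
# a real tokenizer would compute these merges rather than look them up.
PLAINTEXT_FALLBACK = {"<reponame>": [35403, 1648, 3487, 1873, 35393]}

def lookup_special(text):
    # Special tokens match as a single unit; everything else is split
    # into ordinary subword pieces.
    if text in SPECIAL_TOKENS:
        return [SPECIAL_TOKENS[text]]
    return PLAINTEXT_FALLBACK.get(text, [])

print(lookup_special("<repo_name>"))  # [92299] -> one special token
print(lookup_special("<reponame>"))   # [35403, 1648, 3487, 1873, 35393]
```

So a one-character difference in the string means the model never sees the single special-token id at all.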
Something indeed went wrong when I wrote the examples: the special token is <repo_name>, not <reponame>. I will revise the examples right now. Thanks!
I am using llama.cpp GGUFs for this test.