xu-song closed this issue 1 week ago
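from icetk import icetk

# The input string means "Hello world! This is icetk."
tokens = icetk.encode('你好世界!这里是 icetk。')
for token in tokens:
    # Text-piece ids are offset by 20000, so subtract it to index the piece table.
    print(token, icetk.text_tokenizer.proto.pieces[token - 20000].piece)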
20005 ▁
94874 你好
84097 世界
20035 !
94947 这里是
22881 ▁ice
35955 tk
83823 。
What is "▁" used for?
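For context, here is a minimal sketch of how the "▁" pieces in the output could map back to text. It assumes icetk follows the SentencePiece convention, where "▁" (U+2581) stands in for a space and a dummy "▁" is prepended to the input; that assumption is not confirmed in this thread. `pieces_to_text` is a hypothetical helper written for illustration, not part of the icetk API.

```python
# "▁" is U+2581 (LOWER ONE EIGHTH BLOCK). Under the assumed SentencePiece
# convention it marks where a space was in the original text.
WHITESPACE_MARKER = "\u2581"


def pieces_to_text(pieces):
    """Hypothetical helper: join pieces and turn "▁" back into spaces."""
    text = "".join(pieces).replace(WHITESPACE_MARKER, " ")
    # Assumption: the first "▁" is a dummy prefix added by the tokenizer,
    # so drop the single leading space it produces.
    return text[1:] if text.startswith(" ") else text


# Pieces copied from the output in this issue.
pieces = ["▁", "你好", "世界", "!", "这里是", "▁ice", "tk", "。"]
print(pieces_to_text(pieces))  # 你好世界!这里是 icetk。
```

Under that assumption, the standalone "▁" at the start would be the dummy prefix, and the "▁" in "▁ice" would record the space before "icetk" in the input.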