In the paper, it's mentioned that:
> Like Gemma, we use a subset of the SentencePiece tokenizer (Kudo and Richardson, 2018), with a vocabulary size of 256k tokens.
Is RecurrentGemma's tokenizer the same one used in Gemma? If not, how do they differ: was one derived from the other by extending the vocabulary, or were both trained from scratch?
Thanks!
Hi, thanks for your question. That's correct: RecurrentGemma uses the same tokenizer as Gemma.
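For anyone who wants to double-check this locally, here's a minimal sketch using the `sentencepiece` Python library; the `tokenizer.model` paths below are hypothetical placeholders for wherever each checkpoint's tokenizer file lives:

```python
# Minimal verification sketch (not from this thread): load both
# checkpoints' tokenizer.model files and compare them directly.
# The file paths are placeholders.
import sentencepiece as spm

gemma = spm.SentencePieceProcessor(model_file="gemma/tokenizer.model")
recurrent = spm.SentencePieceProcessor(model_file="recurrentgemma/tokenizer.model")

# Both should report the same 256k vocabulary...
print(gemma.vocab_size(), recurrent.vocab_size())  # expect 256000 for each

# ...and produce identical token ids for the same input text.
text = "RecurrentGemma uses the same tokenizer as Gemma."
assert gemma.encode(text) == recurrent.encode(text)
```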
Closing this as the question has been answered.