google-deepmind / recurrentgemma

Open weights language model from Google DeepMind, based on Griffin.
Apache License 2.0

[Question] Does recurrentgemma use the same tokenizer as gemma? #7

Closed Mooler0410 closed 2 weeks ago

Mooler0410 commented 3 weeks ago

In the paper, it's mentioned that:

> Like Gemma, we use a subset of the SentencePiece tokenizer (Kudo and Richardson, 2018), with a vocabulary size of 256k tokens.

Does this mean RecurrentGemma's tokenizer is identical to Gemma's? If not, how different are they? Was one derived from the other by extending the vocabulary, or were both trained from scratch?

Thanks!

Nush395 commented 3 weeks ago

Hi, thanks for your question. That's correct: RecurrentGemma uses the same tokenizer as Gemma.
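
For anyone who wants to confirm this locally, here is a minimal sketch using the `sentencepiece` library to load the shared tokenizer and check its vocabulary size against the 256k figure quoted in the paper. The `tokenizer.model` path is a placeholder; point it at the tokenizer file shipped with the Gemma or RecurrentGemma checkpoint you downloaded.

```python
import sentencepiece as spm

# Load the SentencePiece model shipped with the checkpoint.
# The path below is a placeholder, not a path from this repo.
sp = spm.SentencePieceProcessor()
sp.Load("/path/to/tokenizer.model")

# The paper quotes a vocabulary of 256k tokens, so this should
# report a value in that range for both Gemma and RecurrentGemma.
print(sp.GetPieceSize())

# Round-trip a sample string to confirm the tokenizer works.
ids = sp.EncodeAsIds("RecurrentGemma uses the Gemma tokenizer.")
print(ids)
print(sp.DecodeIds(ids))
```

If you load the tokenizer files from both model releases, the two should produce identical token IDs for the same input.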

botev commented 2 weeks ago

Closing this as the question has been answered.