google-deepmind / recurrentgemma

Open weights language model from Google DeepMind, based on Griffin.
Apache License 2.0

Any plans on releasing a pretrained Hawk model? #2

Closed. h-zhao1997 closed this issue 5 months ago.

h-zhao1997 commented 5 months ago

Thank you for releasing the language model based on the brand-new Griffin architecture! I am quite curious: do you have any plans to provide pre-trained weights for Hawk, the pure-RNN model also described in the paper?

SamSmithGDM commented 5 months ago

Currently no, we don't have plans to release trained weights for Hawk models.

Note, however, that it should be straightforward to create Hawk models (with random weights). For example, you can see how a model config is defined here: https://github.com/google-deepmind/recurrentgemma/blob/e4939f9b7edf8baa1d512fb86bfc2e206044d66b/examples/simple_run_jax.py#L43

`block_types` is a tuple listing the type of temporal-mixing block used at each layer, so its length equals the model depth. To create a Hawk model, use only the `recurrentgemma.TemporalBlockType.RECURRENT` block type; see the sketch below.
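A minimal sketch of what that looks like, adapted from the linked example. The `block_types` handling is the point being made above; the other `GriffinConfig` parameter names and values here are illustrative assumptions and may not match the library's exact signature, so check `examples/simple_run_jax.py` for the current API.

```python
from recurrentgemma import jax as recurrentgemma

NUM_LAYERS = 4  # model depth == len(block_types); illustrative value

# Hawk = Griffin with only recurrent temporal-mixing blocks (no local attention).
hawk_block_types = (recurrentgemma.TemporalBlockType.RECURRENT,) * NUM_LAYERS

# Illustrative (toy-sized) config; real Hawk models would use much larger values.
config = recurrentgemma.GriffinConfig(
    vocab_size=128,
    width=256,
    mlp_expanded_width=3 * 256,
    num_heads=4,
    lru_width=256,
    block_types=hawk_block_types,
)

# The config is then passed to the model class as in the linked example,
# giving a randomly initialized Hawk-style model.
model = recurrentgemma.Griffin(config)
```

For comparison, the Griffin configs in the example interleave `ATTENTION` blocks into `block_types`; dropping them entirely is what makes the model a pure-RNN Hawk variant.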