google-deepmind / recurrentgemma

Open weights language model from Google DeepMind, based on Griffin.
Apache License 2.0
567 stars 23 forks

Any method for converting to gguf? #6

Closed Meshwa428 closed 2 months ago

Meshwa428 commented 2 months ago

Is there any method to convert Griffin models to GGUF? I want to quantize this model to the q4_K type.

Any kind of help is appreciated. Thanks!

Nush395 commented 2 months ago

Our understanding is that the GGUF format is used by llama.cpp's quantize tool, and our model isn't supported in llama.cpp, so I don't think this format will be useful to you for quantizing.

Meshwa428 commented 2 months ago

Thank you for your response. I know it isn't possible because I tried it once, but I might look into the model and let you know if I'm able to quantize it. I was asking about the GGUF format because I want to use this model with Ollama.

Anyway, thank you, sir.

Nush395 commented 2 months ago

Sure. Note that, as per the discussion in https://huggingface.co/google/recurrentgemma-2b-it/discussions/6, we expect that the linear layers can be quantised but the recurrence might require some care.
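To illustrate what "quantising the linear layers" could look like in isolation, here is a minimal NumPy sketch of per-output-channel absmax quantization to a signed 4-bit range. This is a hypothetical illustration only, not the PrunaAI or bitsandbytes implementation, and it says nothing about how the recurrence itself should be handled:

```python
import numpy as np

def quantize_absmax_int4(w):
    """Quantize a 2D weight matrix to signed 4-bit values in [-7, 7].

    One scale per output channel (row), in the absmax style: the largest
    absolute value in each row maps to +/-7.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Reconstruct an approximate float32 weight matrix."""
    return q.astype(np.float32) * scale

# Toy stand-in for a linear layer's weights (not real model weights).
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)

q, scale = quantize_absmax_int4(w)
w_hat = dequantize_int4(q, scale)

# Rounding error is bounded by half a quantization step per row.
err = np.abs(w - w_hat).max()
```

The recurrence is the part this sketch deliberately leaves alone: its state update may be more sensitive to precision loss, which is presumably why it "might require some care" as noted above.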

We have seen that someone has produced a quantised version of the model. This wasn't done by us, and we haven't evaluated their model or their approach, but it may be useful to you as a pointer: https://huggingface.co/PrunaAI/recurrentgemma-2b-it-bnb-4bit-smashed.