[Closed] Meshwa428 closed this discussion 2 months ago
Our understanding is that the GGUF format is used by the llama.cpp quantize tool, and our model isn't supported by llama.cpp, so I don't think this format will be useful to you for quantising.
Thank you for your response. I know it isn't possible because I tried it once, but I might look into the model and let you know if I'm able to quantize it. I want to use this model with Ollama, which is why I was asking for the GGUF format.
Anyway, thank you, sir.
Sure. Note that, as per the discussion in https://huggingface.co/google/recurrentgemma-2b-it/discussions/6, we expect the linear layers can be quantised, but the recurrence might require some care.
We have seen that someone has produced a quantised version of the model. This wasn't done by us, and we haven't evaluated their model or their approach, but in case it's useful to you as a pointer: https://huggingface.co/PrunaAI/recurrentgemma-2b-it-bnb-4bit-smashed.
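For what it's worth, the "quantise the linear layers but leave the recurrence alone" idea can be sketched with transformers and bitsandbytes. This is a minimal, unverified sketch: the module name passed to `llm_int8_skip_modules` is an assumption about the RecurrentGemma implementation, not something confirmed in this thread, so check the actual module names in the loaded model first.

```python
# Hedged sketch: load recurrentgemma-2b-it with 4-bit (NF4) quantisation
# applied to linear layers, while keeping recurrence-related modules in
# full precision. "recurrent_block" is an ASSUMED module name.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantise eligible linear layers to 4-bit
    bnb_4bit_quant_type="nf4",               # NF4 quantisation scheme
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 for stability
    llm_int8_skip_modules=["recurrent_block"],  # assumed name; inspect model.named_modules() to confirm
)

model = AutoModelForCausalLM.from_pretrained(
    "google/recurrentgemma-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
```

This doesn't produce a GGUF file for Ollama; it only quantises the model in memory for transformers-based inference, which is presumably how the PrunaAI bnb-4bit checkpoint above was made.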
Is there any method to convert Griffin models to GGUF? I want to quantize this model to the q4_K type.
Any kind of help is appreciated. Thanks!