yc-wang00 opened 4 days ago
I made an initial attempt that did not work. https://github.com/casper-hansen/AutoAWQ/compare/main...gemma2. Unfortunately, I do not have enough time at the moment to do further research on how to support the new architecture.
The biggest change affecting quantization is that each decoder layer now has a pre-feedforward and a post-feedforward layernorm, which makes it challenging to apply AWQ's scaling correctly. Maybe @TechxGenus or someone else can help contribute.
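To illustrate the structural difference, here is a minimal sketch contrasting the norm layers per decoder block in Gemma 1 vs. Gemma 2. The module names follow the Hugging Face `transformers` implementations; the helper `missing_norms` is hypothetical, just to show which layers an existing Gemma scaling map would not account for.

```python
# Norm modules per decoder layer, by name (from the HF transformers
# Gemma / Gemma2 implementations).
GEMMA1_NORMS = ["input_layernorm", "post_attention_layernorm"]

GEMMA2_NORMS = [
    "input_layernorm",
    "post_attention_layernorm",    # in Gemma 2, applied to the attention output
    "pre_feedforward_layernorm",   # new in Gemma 2
    "post_feedforward_layernorm",  # new in Gemma 2
]

def missing_norms(handled, actual):
    """Hypothetical helper: norm layers present in the model but absent
    from an AWQ scaling map written for the original Gemma."""
    return [n for n in actual if n not in handled]

print(missing_norms(GEMMA1_NORMS, GEMMA2_NORMS))
# -> ['pre_feedforward_layernorm', 'post_feedforward_layernorm']
```

Any AWQ layer-scaling configuration written against the Gemma 1 layout would silently skip the two new norms, which is consistent with quantization running "successfully" but producing a broken model.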
There are still many open issues in Gemma 2 community support (e.g., logit soft capping, fp16, sliding-window attention). I suggest waiting until they are all resolved.
Hi team, I am opening this issue to request support for the Google Gemma 2 models.
Recently, Google released two models: google/gemma-2-27b and google/gemma-2-9b. As an initial trial, we attempted to use the existing Gemma path for these new models, but it did not work as expected. Specifically, when I tried to quantize google/gemma-2-9b, the model just produced nonsensical outputs.
Could someone please investigate and add support for Gemma 2?
Thank you very much!!!