OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Bugfix/attention mask and implementation #49

Closed · Alvant closed 6 months ago

Alvant commented 6 months ago

Issue: https://github.com/OpenGVLab/OmniQuant/issues/46

I had to change the attention implementation initialization. Unexpectedly (at least for me :sweat_smile:), it turned out that the attention implementation cannot be specified in the model's config.json file. It can only be set as a keyword argument when the config object is created (`attn_implementation` is taken from `kwargs`, not from `config_dict`, see: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/configuration_utils.py#L772). So I added an attention argument to the argument parser and pass it to the config object when it is created.

I hope this change is OK (a new parser argument plus a modified `AutoConfig.from_pretrained` call).
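
In rough terms, the change amounts to something like the sketch below (the flag name, default, and choices are illustrative, not the exact diff):

```python
import argparse
from transformers import AutoConfig, AutoModelForCausalLM

parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, help="path to the pretrained model")
# Illustrative flag; the PR may name it differently.
parser.add_argument("--attn_implementation", type=str, default="eager",
                    choices=["eager", "sdpa", "flash_attention_2"],
                    help="attention implementation to pass when the config is created")
args = parser.parse_args()

# attn_implementation must be passed as a keyword argument here:
# setting it in config.json has no effect, because from_dict reads it
# only from kwargs, not from the loaded config dict.
config = AutoConfig.from_pretrained(args.model,
                                    attn_implementation=args.attn_implementation)
model = AutoModelForCausalLM.from_pretrained(args.model, config=config)
```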

ChenMnZ commented 6 months ago

Good job! Thanks again for your time.