OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Bugfix/attention mask and implementation #49

Closed · Alvant closed 6 months ago

Alvant commented 6 months ago

Issue: https://github.com/OpenGVLab/OmniQuant/issues/46

I had to change the attention implementation initialization. Unexpectedly (at least for me :sweat_smile:), it turned out that the attention implementation cannot be specified in the model's config.json file. It can only be set as a keyword argument when the config object is created (`attn_implementation` is taken from `kwargs`, not from `config_dict`, see: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/configuration_utils.py#L772). So I added an attention argument to the argument parser and pass it to the config object when it is created.

I hope this change is OK (a new parser argument plus a modified `AutoConfig.from_pretrained` call).
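
In rough terms, the change amounts to something like the sketch below (the flag name, default, and choices are illustrative, not the exact diff):

```python
import argparse
from transformers import AutoConfig, AutoModelForCausalLM

parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, help="path to the pretrained model")
# Illustrative flag; the PR may name it differently.
parser.add_argument("--attn_implementation", type=str, default="eager",
                    choices=["eager", "sdpa", "flash_attention_2"],
                    help="attention implementation to pass when the config is created")
args = parser.parse_args()

# attn_implementation must be passed as a keyword argument here:
# setting it in config.json has no effect, because from_dict reads it
# only from kwargs, not from the loaded config dict.
config = AutoConfig.from_pretrained(args.model,
                                    attn_implementation=args.attn_implementation)
model = AutoModelForCausalLM.from_pretrained(args.model, config=config)
```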

ChenMnZ commented 6 months ago

Good job! Thanks again for your time.