chu-tianxiang / exl2-for-all

EXL2 quantization generalized to other models.

How to use for 'Mixtral-8x7B-instruct-exl2' ? #4

Open LeMoussel opened 7 months ago

LeMoussel commented 7 months ago

With this code:

from exl2forall.model import Exl2ForCausalLM

# https://huggingface.co/turboderp/Mixtral-8x7B-instruct-exl2
model_name = 'turboderp/Mixtral-8x7B-instruct-exl2'
revision = '3.0bpw'

model = Exl2ForCausalLM.from_quantized(model_name, revision=revision)

I got this error:

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-5-b754e793608b> in <cell line: 9>()
      7 revision = '3.0bpw'
      8 
----> 9 model = Exl2ForCausalLM.from_quantized(model_name, revision=revision)

2 frames

/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py in __getitem__(self, key)
    759             return self._extra_content[key]
    760         if key not in self._mapping:
--> 761             raise KeyError(key)
    762         value = self._mapping[key]
    763         module_name = model_type_to_module_name(key)

KeyError: 'mixtral'

If I use bartowski/dolphin-2.6-mistral-7b-dpo-laser-exl2 instead, I don't get this error.

Any idea why this error occurs?

chu-tianxiang commented 7 months ago

Are you using the latest version of huggingface/transformers? It seems your transformers version is too old to support Mixtral.
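
Something like this should confirm whether your install knows about Mixtral (a rough check; as far as I know, Mixtral support landed in transformers 4.36.0):

import transformers
print(transformers.__version__)  # should be >= 4.36.0 for Mixtral

from transformers import AutoConfig
# On older versions this raises the same KeyError: 'mixtral'
config = AutoConfig.from_pretrained("turboderp/Mixtral-8x7B-instruct-exl2", revision="3.0bpw")
print(config.model_type)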

Also, as shown in the readme, you have to pass the modules_to_not_convert argument for Mixtral for now, because the gate layer is not quantized. I'll change this to auto-detection later.

quant_model = Exl2ForCausalLM.from_quantized("turboderp/Mixtral-8x7B-instruct-exl2",
                                             revision="3.0bpw",
                                             modules_to_not_convert=["gate"])
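
Once it loads, generation should work through the usual transformers-style interface. A rough, untested sketch (it assumes the exl2 repo ships tokenizer files and that the quantized model exposes the standard generate() method):

from transformers import AutoTokenizer

# Assumption: the exl2 repo includes tokenizer files; if not, point
# AutoTokenizer at the base "mistralai/Mixtral-8x7B-Instruct-v0.1" repo instead.
tokenizer = AutoTokenizer.from_pretrained("turboderp/Mixtral-8x7B-instruct-exl2", revision="3.0bpw")

inputs = tokenizer("[INST] Write a haiku about GPUs. [/INST]", return_tensors="pt").to("cuda")
output_ids = quant_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))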