NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers
Apache License 2.0

add Qwen2 #24

Open · ehartford opened 2 months ago

ehartford commented 2 months ago

Please add Qwen2 support

EETQ_CAUSAL_LM_MODEL_MAP = {
    "llama": LlamaEETQForCausalLM,
    "baichuan": BaichuanEETQForCausalLM,
    "gemma": GemmaEETQForCausalLM,
    # no "qwen2" entry yet, so Qwen2 models are rejected
}
dtlzhuangz commented 2 months ago

Hello, thank you for your interest in EETQ. The code you modified is for vLLM, whose EETQ support has not been merged yet (https://github.com/vllm-project/vllm/pull/3614), so I am not sure how you intend to use it. Could you please clarify? If you want to quantize Qwen2 with EETQ under transformers or TGI, I think you can use it directly in those two frameworks.

ehartford commented 2 months ago

I am not using vllm. My change is not related to vllm.

I am trying to do this:

from eetq import AutoEETQForCausalLM
from transformers import AutoTokenizer

model_name = "/workspace/models/dolphin-2.9.2-qwen2-72b"
quant_path = "/workspace/models/dolphin-2.9.2-qwen2-72b-eetq"

# Load the tokenizer and the full-precision model, quantize with EETQ,
# and save the quantized checkpoint alongside the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoEETQForCausalLM.from_pretrained(model_name)
model.quantize(quant_path)
tokenizer.save_pretrained(quant_path)

The code changes I made here enable this script to work. Without them, I get an error that qwen2 is not supported.

qwen2 is not supported because it is not in EETQ_CAUSAL_LM_MODEL_MAP, and it is not in EETQ_CAUSAL_LM_MODEL_MAP because there is no Qwen2EETQForCausalLM. This PR implements that class.
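
For reference, a minimal sketch of what that registration might look like. Only the Qwen2EETQForCausalLM name and the new map entry come from this thread; the base class, import path, and attribute names are assumptions modeled on the existing llama entry, not the actual PR contents.

# Hypothetical sketch; BaseEETQForCausalLM and layer_type are assumed names.
from .base import BaseEETQForCausalLM

class Qwen2EETQForCausalLM(BaseEETQForCausalLM):
    # Qwen2 decoder blocks follow the llama architecture closely, so the
    # per-layer handling can mirror LlamaEETQForCausalLM.
    layer_type = "Qwen2DecoderLayer"

EETQ_CAUSAL_LM_MODEL_MAP = {
    "llama": LlamaEETQForCausalLM,
    "baichuan": BaichuanEETQForCausalLM,
    "gemma": GemmaEETQForCausalLM,
    "qwen2": Qwen2EETQForCausalLM,  # new entry so qwen2 configs are accepted
}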

ehartford commented 2 months ago

The output

https://huggingface.co/cognitivecomputations/dolphin-2.9.2-qwen2-72b-eetq

dtlzhuangz commented 2 months ago

> I am not using vllm. My change is not related to vllm. […] qwen2 is not supported because it is not in EETQ_CAUSAL_LM_MODEL_MAP, and it is not in EETQ_CAUSAL_LM_MODEL_MAP because there is no Qwen2EETQForCausalLM. This PR implements that class.

If you want to use EETQ to quantize a model and run inference in an existing framework such as TGI, transformers, or vLLM, the quantization has to be customized for each framework, because the cutlass kernel changes the layout of the quantized weights. The code above is customized for vLLM (sorry for the unclear description in the README). If you use it with another framework, it may output wrong tokens.

SidaZh commented 2 months ago

@ehartford AutoEETQForCausalLM is developed for the vLLM framework. You can use EETQ in transformers like this:

from transformers import AutoModelForCausalLM, EetqConfig

path = "/workspace/models/dolphin-2.9.2-qwen2-72b"
quant_path = "/workspace/models/dolphin-2.9.2-qwen2-72b-eetq"

# Quantize on the fly while loading, then save the quantized checkpoint.
quantization_config = EetqConfig("int8")
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", quantization_config=quantization_config)
model.save_pretrained(quant_path)
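
A possible follow-up, as a sketch: loading the saved EETQ-quantized checkpoint back through transformers and generating with it. The prompt is just a placeholder, and the tokenizer is loaded from the original model directory since it was not saved next to the quantized weights above.

from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "/workspace/models/dolphin-2.9.2-qwen2-72b-eetq"
path = "/workspace/models/dolphin-2.9.2-qwen2-72b"

# The saved config records the EETQ quantization, so no extra arguments
# are needed when loading the quantized checkpoint.
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(path)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
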
ehartford commented 2 months ago

I want to quantize my model to EETQ format and publish it, so people can download the EETQ-quantized version of my model, just like they do with GPTQ, GGUF, EXL2, etc.
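
One way to publish the saved checkpoint, as a sketch: upload the quantized model directory to the Hugging Face Hub with huggingface_hub. The repo id here just mirrors the link earlier in the thread; substitute your own.

from huggingface_hub import HfApi

quant_path = "/workspace/models/dolphin-2.9.2-qwen2-72b-eetq"
repo_id = "cognitivecomputations/dolphin-2.9.2-qwen2-72b-eetq"  # placeholder repo name

# Upload the saved EETQ-quantized checkpoint directory as-is, the same way
# GPTQ/GGUF/EXL2 quants are published on the Hub.
api = HfApi()
api.create_repo(repo_id, exist_ok=True)
api.upload_folder(folder_path=quant_path, repo_id=repo_id)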