Open ehartford opened 2 months ago
Hello, thank you for your interest in EETQ. The code you modified is for vllm, whose EETQ support has not been merged yet (https://github.com/vllm-project/vllm/pull/3614), so I am confused about how you plan to use it. Could you please clarify? If you want to quantize Qwen2 with EETQ on transformers or TGI, I think you can use it directly under those two frameworks.
I am not using vllm. My change is not related to vllm.
I am trying to do this:
```python
from eetq import AutoEETQForCausalLM
from transformers import AutoTokenizer

model_name = "/workspace/models/dolphin-2.9.2-qwen2-72b"
quant_path = "/workspace/models/dolphin-2.9.2-qwen2-72b-eetq"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoEETQForCausalLM.from_pretrained(model_name)

model.quantize(quant_path)
tokenizer.save_pretrained(quant_path)
```
The code changes I made here enable that code to function. Without them, I get an error saying that qwen2 is not supported.
Qwen2 is not supported because it is not in EETQ_CAUSAL_LM_MODEL_MAP, and it is not in EETQ_CAUSAL_LM_MODEL_MAP because there is no Qwen2EETQForCausalLM. I implemented that class.
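For reference, here is a minimal sketch of the kind of change involved. The base class name, module path, and `layer_type` attribute below are assumptions modeled on how per-architecture classes are typically registered, not the exact code from this PR:

```python
# Hypothetical sketch -- the actual base class and module path in EETQ may differ.
from eetq.models.base import BaseEETQForCausalLM  # assumed import

class Qwen2EETQForCausalLM(BaseEETQForCausalLM):
    # Qwen2 uses Llama-style decoder blocks, so the same set of linear
    # layers (q/k/v/o projections and the MLP projections) gets quantized.
    layer_type = "Qwen2DecoderLayer"  # assumed attribute

# Registering the class lets AutoEETQForCausalLM resolve the "qwen2" model type:
EETQ_CAUSAL_LM_MODEL_MAP["qwen2"] = Qwen2EETQForCausalLM
```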
If you want to use EETQ to quantize a model and run inference in an existing framework such as TGI, transformers, or vllm, the quantization has to be customized for each framework, because the cutlass kernel changes the layout of the quantized weights. The code above is customized for vllm (sorry for the unclear description in the README). If you load it in another framework, it may output wrong tokens.
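To illustrate why the checkpoints are not portable (this is a toy example, not EETQ's actual preprocessing): two kernels that expect different physical layouts of the same int8 weight cannot read each other's saved tensors.

```python
import torch

# One logical int8 weight matrix.
w = torch.arange(32, dtype=torch.int8).reshape(4, 8)

# Toy stand-ins for kernel-specific preprocessing. Real kernels (e.g. cutlass
# tile-interleaved layouts) do something analogous but more involved.
layout_a = w.t().contiguous()                             # "kernel A" wants column-major
layout_b = w.view(4, 2, 4).transpose(0, 1).contiguous()   # "kernel B" wants tiled blocks

# Both tensors encode the same logical weight, but a GEMM written against
# layout A reads garbage if handed layout B -- hence a checkpoint quantized
# for vllm's kernel cannot be loaded by the transformers kernel.
assert not torch.equal(layout_a.flatten(), layout_b.flatten())
```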
@ehartford AutoEETQForCausalLM is developed for the vllm framework. You can use EETQ in transformers like this:
```python
from transformers import AutoModelForCausalLM, EetqConfig

path = "/workspace/models/dolphin-2.9.2-qwen2-72b"
quantization_config = EetqConfig("int8")
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", quantization_config=quantization_config)

quant_path = "/workspace/models/dolphin-2.9.2-qwen2-72b-eetq"
model.save_pretrained(quant_path)
```
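Per the transformers quantization docs, a checkpoint saved this way can be reloaded with `from_pretrained`; the quantization config is picked up from the saved `config.json`. A minimal sketch:

```python
from transformers import AutoModelForCausalLM

# Reload the EETQ-quantized checkpoint; transformers reads the
# quantization_config stored alongside the weights.
model = AutoModelForCausalLM.from_pretrained(
    "/workspace/models/dolphin-2.9.2-qwen2-72b-eetq", device_map="auto"
)
```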
I want to quantize my model to EETQ format and publish it, so people can download the EETQ-quantized version of my model, just like they do with gptq, gguf, exl2, etc.
Please add Qwen2 support