Qubitium opened 6 days ago
Not only `AutoModel`; the main blocker is the gptq lib check here. Unless we change the checked lib from auto-gptq to gptqmodel, it will always be false when gptqmodel is used. Also in `quantizer_gptq`.
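The kind of availability check being discussed can be sketched as below. The helper names (`is_auto_gptq_available`, `gptq_backend_available`) are illustrative assumptions for this sketch, not the actual transformers code:

```python
import importlib.util

def is_auto_gptq_available() -> bool:
    # The kind of check discussed above: it only looks for auto-gptq,
    # so it stays False when only gptqmodel is installed.
    return importlib.util.find_spec("auto_gptq") is not None

def is_gptqmodel_available() -> bool:
    # Hypothetical companion check for the gptqmodel package.
    return importlib.util.find_spec("gptqmodel") is not None

def gptq_backend_available() -> bool:
    # The fix being proposed: accept either library,
    # so gptqmodel users pass the check too.
    return is_auto_gptq_available() or is_gptqmodel_available()
```

With a check like the first one, a user who installed only gptqmodel would always fail the gate, which is the "always false" problem described above.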
@jiqing-feng Ok, I see the chicken-and-egg problem here. Our integration has only tested/patched model loading via optimum, but the code you reference is actually HF transformers calling autogptq for model quantization.
Hi @Qubitium. Sorry for misunderstanding your point; I will check the possibility. Thanks!
Please see optimum/gptq; it also uses the auto_gptq lib, so we can only upstream in AutoGPTQ. The Intel CPU path wants to keep the same usage as CUDA to make it more user-friendly. Thanks for your investigation. I think we can focus on how to upstream to AutoGPTQ.
@Qubitium, do you have any plan to integrate gptqmodel into transformers, like what eetq and autogptq do? Thx.
@yao-matrix Yes. This is our goal, but not the ultimate goal. Our primary goals are max model compat (new models), quant model compat with vllm/sglang, plus quant speed and quant quality recovery. API backward compat is not our primary goal right now. Once I feel our API is stable, very soon, we will submit PRs to Transformers/Optimum to replace autogptq as much as possible. There are many reasons AutoGPTQ is not getting proper updates, and in my view the problem will become worse and worse.
Great, I will add the ipex feature into GPTQModel. BTW, do you think we could finish the replacement in HF/optimum by the end of this year? I would like to help with it. Thx!
That's great! We welcome contributions from anyone willing to improve this project. We are confident that once you start working within the gptqmodel internals/framework, you will not want to switch back to autogptq for any reason. =)
We definitely can by the end of 2024. But we are also bound by the review process of these projects. Our lm-eval PR for gptqmodel has been open for about 3 months with no activity or feedback, so it really depends on how fast they react. https://github.com/EleutherAI/lm-evaluation-harness/pull/2217
@jiqing-feng For the IPEX code, please add a small unit test in tests. The CI is not automatic, but we can trigger it manually on our 4090 action-hub instances when the code is ready for verification. Every major feature/kernel will be CI tested/validated for regressions in future releases. We also plan to make sure every single model we support has a CI test as well, since regressions in model quantization/inference are highly likely due to HF transformers and tokenizer updates.
Hi @Qubitium. Thanks for your support. I will let you review once the PR is ready. For lm-eval, I think you can fix the failed test (due to code style) and then let the maintainer review :)
@jiqing-feng I am going to answer gptqmodel specifics here.
When you say `transformers` integration, do you mean `AutoModel` loading of quantized models? HF transformers moved all quantization code into `optimum`, and we have the following integration code via monkey patch: https://github.com/ModelCloud/GPTQModel/blob/98dc26f04c70393e8da272a83450cf4f14790b79/tests/test_transformers_integration.py#L27
Is this what you are looking for?
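The monkey-patch pattern referred to here can be sketched roughly as follows; the module and function names below are stand-ins invented for this example (the real integration lives in the linked test file), but the mechanism is the same: replace an attribute on another module so existing call sites transparently pick up the new backend.

```python
import types

# Stand-in for a third-party module (e.g. an optimum-like quantizer module).
fake_optimum = types.ModuleType("fake_optimum")

def original_load_quantized(name: str) -> str:
    # Hypothetical original loader backed by auto-gptq.
    return f"loaded {name} with auto-gptq"

fake_optimum.load_quantized = original_load_quantized

def patched_load_quantized(name: str) -> str:
    # Replacement loader that redirects to the new backend.
    return f"loaded {name} with gptqmodel"

# The patch itself: swap the module attribute, so code that calls
# fake_optimum.load_quantized(...) now uses the replacement.
fake_optimum.load_quantized = patched_load_quantized

print(fake_optimum.load_quantized("opt-125m"))  # → loaded opt-125m with gptqmodel
```

The trade-off with patching is fragility: if the patched project renames or moves the target function, the patch silently stops applying, which is one reason a proper upstream integration is being discussed in this thread.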
ref: https://github.com/AutoGPTQ/AutoGPTQ/pull/737#issuecomment-2415622872