ModelCloud / GPTQModel

GPTQ-based LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Apache License 2.0

[FEATURE] Replace QBits with IPEX for CPU inference #450

Open jiqing-feng opened 4 days ago

jiqing-feng commented 4 days ago

Hi @Qubitium. As QBits is no longer developed, we are considering replacing QBits with IPEX in the open-source project. AutoAWQ has finished the conversion (see here); it brings better usability and performance.
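A minimal sketch of what such a backend swap could look like on the dispatch side: pick the IPEX kernel when `intel_extension_for_pytorch` is importable, otherwise fall back to a plain Torch path. The function name `select_cpu_kernel` and the returned backend labels are illustrative assumptions, not GPTQModel's actual API.

```python
import importlib.util


def select_cpu_kernel(prefer_ipex: bool = True) -> str:
    """Choose a quantized-linear backend for CPU inference.

    Hypothetical dispatch logic: prefer the IPEX kernel when the
    intel_extension_for_pytorch package is installed, otherwise fall
    back to a generic Torch implementation.
    """
    if prefer_ipex and importlib.util.find_spec("intel_extension_for_pytorch"):
        return "ipex"
    return "torch"
```

In a real integration this decision would feed into kernel/module selection at model-load time, replacing the old QBits code path.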

BTW, setup.py currently supports only CUDA, so I will add some parameters to enable a CPU setup. Please let me know if you have any concerns. Thx!
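One common way to make a CUDA-only setup.py CPU-installable is to gate the extension build behind an environment variable. The flag name `BUILD_CUDA_EXT` and this helper are an illustrative sketch, not the actual parameters being proposed.

```python
import os


def cuda_ext_enabled(environ=os.environ) -> bool:
    """Return True when the CUDA extension should be compiled.

    Hypothetical gate for setup.py: build CUDA by default, but let
    BUILD_CUDA_EXT=0 skip compilation so `pip install` also works on
    CPU-only machines.
    """
    flag = environ.get("BUILD_CUDA_EXT", "1").strip().lower()
    return flag not in ("0", "false", "off")


ext_modules = []
if cuda_ext_enabled():
    # In a real setup.py this branch would append a CUDAExtension
    # build target; it is left empty in this sketch.
    pass
```

With a gate like this, `BUILD_CUDA_EXT=0 pip install .` would produce a CPU-only wheel while the default behavior stays unchanged for CUDA users.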

Qubitium commented 4 days ago

Feel free to remove the QBits code if Intel has stopped QBits development and is now concentrating on IPEX.

On another note, are SYCL and IPEX also competing projects at Intel?

jiqing-feng commented 4 days ago

> Feel free to remove the QBits code if Intel has stopped QBits development and is now concentrating on IPEX.
>
> On another note, are SYCL and IPEX also competing projects at Intel?

They are compatible; SYCL is mostly used on Intel XPU, which is our next step. I am currently focusing on the CPU platform.

Qubitium commented 3 days ago

@jiqing-feng Heads-up warning: we are doing quite a bit of cleanup on the codebase right now for the 1.1 major release, so base.py will be a little unstable until then, but the kernel-level API should be stable.

jiqing-feng commented 3 days ago

> @jiqing-feng Heads-up warning: we are doing quite a bit of cleanup on the codebase right now for the 1.1 major release, so base.py will be a little unstable until then, but the kernel-level API should be stable.

Sure, thanks for the reminder. Can you give me an approximate time when it will be done?

Qubitium commented 3 days ago

@jiqing-feng We expect the changes/refactor to be completed by Friday's end [Oct 25th].