Qcompiler / vllm-mixed-precision

Support mixed-precision inference with vLLM

where is the mixlib? #1

Open Moran232 opened 3 weeks ago

Moran232 commented 3 weeks ago

Hi, I noticed that you use mixlib in your code https://github.com/Qcompiler/vllm-mixed-precision/blob/8a941fc4d19fe41e3cce433b40b0f15100d19f02/vllm/model_executor/layers/quantization/mixq4bit.py#L74

Could you tell me where it comes from? I want to test the 4bit gemm kernel.

Qcompiler commented 3 weeks ago

Dear @Moran232: Please install mixlib and EETQ from https://github.com/Qcompiler/QComplier

```shell
git clone git@github.com:Qcompiler/QComplier.git
cd EETQ
python setup.py install
cd ../quantkernel
python setup.py install
```

Thanks a lot~

Moran232 commented 1 week ago


Thanks for your reply. I noticed that you only use the int4 mixed GEMM kernel when M > 32 and the AWQ kernel when M < 32. Can you explain the reason?

https://github.com/Qcompiler/vllm-mixed-precision/blob/8a941fc4d19fe41e3cce433b40b0f15100d19f02/vllm/model_executor/layers/quantization/mixq4bit.py#L244
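For context, the dispatch being asked about can be sketched as below. The M = 32 threshold is the one mentioned above; the function names are hypothetical stand-ins, not the repo's real API. A common rationale for this pattern (not confirmed by the authors here) is that small-M decode shapes are memory-bound and GEMV-like, where AWQ's fused dequantize-and-multiply kernel excels, while larger M amortizes dequantization cost in a true int4 mixed-precision GEMM.

```python
# Hypothetical sketch of an M-based kernel dispatch for quantized linear
# layers. Both kernels are placeholder stubs, not the repo's actual kernels.

def awq_gemm(x, qweight, scales):
    # Placeholder for an AWQ-style fused dequantize + GEMV/GEMM kernel,
    # typically efficient for small batch (memory-bound) shapes.
    return "awq"

def mix_int4_gemm(x, qweight, scales):
    # Placeholder for an int4 mixed-precision GEMM kernel, which pays off
    # once M is large enough to be compute-bound.
    return "mix_int4"

def dispatch(M, threshold=32):
    # M is the number of activation rows (tokens in the current batch).
    # Small M -> AWQ kernel; larger M -> int4 mixed-precision GEMM.
    if M < threshold:
        return awq_gemm(None, None, None)
    return mix_int4_gemm(None, None, None)
```

For example, `dispatch(1)` (single-token decode) would take the AWQ path, while `dispatch(128)` (prefill-sized batch) would take the int4 mixed GEMM path.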