Juelianqvq closed this issue 2 months ago
@Juelianqvq That limitation is actually not accurate. `infeatures` and `outfeatures` are limited by the `thread_k` and `thread_n` configs in qqq_gemm: https://github.com/HandH1998/QQQ/blob/main/csrc/qqq_gemm.cu#L880-L889. You can modify it to support more configs, as I did in the vLLM qqq_gemm: https://github.com/vllm-project/vllm/blob/main/csrc/quantization/marlin/qqq/marlin_qqq_gemm_kernel.cu#L939-L957. Then `infeatures` only needs to be divisible by 64 and `outfeatures` only needs to be divisible by 128. This will make Qwen2-72B work.
I will make this change in the next few days. If you are in a hurry, you can try modifying it yourself, and you are welcome to submit a PR.
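For anyone who wants to check their layer shapes before packing, here is a minimal sketch of the two divisibility rules mentioned in this thread. It is not code from the QQQ repo: `check_pack_shape` is a hypothetical helper, and the gate/up projection shape used for Qwen2-72B (8192 x 29568) is an assumption taken from the published model config, so verify it against your own checkpoint.

```python
def check_pack_shape(infeatures: int, outfeatures: int,
                     k_multiple: int, n_multiple: int) -> bool:
    """Return True if both dimensions satisfy the given divisibility rule.

    Hypothetical helper for illustration only; the real check lives in the
    packing code and raises ValueError instead of returning a bool.
    """
    return infeatures % k_multiple == 0 and outfeatures % n_multiple == 0


# Assumed (infeatures, outfeatures) of a Qwen2-72B gate/up projection.
GATE_PROJ = (8192, 29568)

# Rule reported in the ValueError: 128 for infeatures, 256 for outfeatures.
strict = check_pack_shape(*GATE_PROJ, k_multiple=128, n_multiple=256)

# Rule after adding more thread configs: 64 for infeatures, 128 for outfeatures.
relaxed = check_pack_shape(*GATE_PROJ, k_multiple=64, n_multiple=128)

print(f"strict rule ok: {strict}, relaxed rule ok: {relaxed}")
```

Under these assumptions, the stricter rule rejects the shape because 29568 is not a multiple of 256, while the relaxed rule accepts it because 29568 is a multiple of 128.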
Thanks for your explanation!
When packing, it raises ValueError: `infeatures` must be divisible by 128 and `outfeatures` by 256.