ModelCloud / GPTQModel

Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang.
Apache License 2.0

[Question] install issue #726

Open wenhuach21 opened 20 hours ago

wenhuach21 commented 20 hours ago

When installing via pip, the marlin kernel could not be loaded:

```
ValueError: Trying to use the marlin backend, but could not import the C++/CUDA dependencies with the following error: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/gptqmodel_marlin_cuda_inference.cpython-310-x86_64-linux-gnu.so)
```
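The `GLIBC_2.32' not found` error means the prebuilt wheel was compiled against a newer glibc than the host provides. As a quick sanity check (a minimal sketch, not part of GPTQModel; the required version is taken from the error message above), the host's glibc can be inspected from Python:

```python
import platform

# The failing marlin wheel requires GLIBC_2.32 (from the error message above).
REQUIRED = (2, 32)

libc, version = platform.libc_ver()  # e.g. ('glibc', '2.31') on Ubuntu 20.04
if libc == "glibc":
    major, minor = (int(x) for x in version.split(".")[:2])
    if (major, minor) < REQUIRED:
        print(f"glibc {version} is older than 2.32; the prebuilt wheel cannot load")
    else:
        print(f"glibc {version} satisfies the wheel's requirement")
else:
    print("could not detect glibc on this system")
```

If the host glibc is older than 2.32, building from source (or a wheel built against an older glibc) is the usual workaround.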

When installing from source:

*(screenshot of the source build error)*
Qubitium commented 20 hours ago

@wenhuach21 It appears there are two issues.

  1. Pip install failed. Can you show the stack trace for the pip-installed marlin error? It may be caused by our cached prebuilt wheel.

     We need your Linux OS version, kernel, and libc/glibc version.

  2. Source build error. Can you confirm which commit or release tag you are using for the source install?

Thanks. @CSY-ModelCloud

CSY-ModelCloud commented 20 hours ago

We have renamed `gptqmodel_marlin_cuda_inference`. Can you pull the latest code, delete the `build` dir, and then pip install again?

wenhuach21 commented 19 hours ago

Got it. It would be beneficial for GPTQModel to provide a backward-compatible API for layer packing and repacking, accommodating both the original AutoGPTQ linear layer and your/AutoRound's fixed zero-point layer in future implementations. This would allow AutoRound to rely seamlessly on your CUDA kernels for Marlin, asymmetric quantization, and other operations.

Qubitium commented 19 hours ago

> Got it. It would be beneficial for GPTQModel to provide a backward-compatible API for layer packing and repacking, accommodating both the original AutoGPTQ linear layer and your/AutoRound's fixed zero-point layer in future implementations. This would allow AutoRound to rely seamlessly on your CUDA kernels for Marlin, asymmetric quantization, and other operations.

We are adding `hf_select_quant_linear` as an external API for the HF/optimum repos. Can AutoRound use this? The API will stabilize later today/tonight.

Tracking PR: https://github.com/ModelCloud/GPTQModel/pull/713

The code is not ready; we are still finalizing it. The above PR holds links to the HF/optimum PRs that will be submitted upstream.
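For context, here is a caller-side sketch of how a downstream integration might consume such a selection API. The signature is still being finalized in the linked PR, so the function body and every parameter name below are assumptions, not the actual GPTQModel API:

```python
# Hypothetical stand-in -- the real hf_select_quant_linear signature is still
# being finalized in the linked PR, so these parameter names are assumptions.
def hf_select_quant_linear(bits, group_size, desc_act, sym):
    """Return a description of the QuantLinear implementation matching the
    requested quantization parameters. A real implementation would pick
    among the available kernels (e.g. Marlin vs. fallback) here."""
    return f"QuantLinear(bits={bits}, group_size={group_size}, desc_act={desc_act}, sym={sym})"

# Downstream code (e.g. optimum) would ask GPTQModel for a layer class
# instead of importing a specific kernel module directly:
cls = hf_select_quant_linear(bits=4, group_size=128, desc_act=False, sym=True)
print(cls)
```

The design point is indirection: downstream repos depend on one stable entry point rather than on kernel module names (which, as this thread shows, can be renamed).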

Qubitium commented 15 hours ago

[1-3] https://github.com/ModelCloud/GPTQModel/pull/727/files

We will expose the three `hf_`-prefixed methods as a stable API to HF/optimum. There may still be changes; this is a WIP.

Correction: four `hf_` methods.

[4] https://github.com/ModelCloud/GPTQModel/pull/728/files

wenhuach21 commented 15 hours ago

Thanks for the info. However, this may not help on our side: we need layer-wise packing and repacking, since AutoRound can support mixed bit widths and mixed group sizes.
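To illustrate the request, here is a hypothetical sketch of what a layer-wise configuration could look like. None of these names exist in GPTQModel; they only show why a single model-wide `bits`/`group_size` setting is insufficient for AutoRound's mixed-precision output:

```python
# Hypothetical sketch -- these class and key names are illustrative only,
# not part of GPTQModel's actual API.
from dataclasses import dataclass

@dataclass
class LayerQuantConfig:
    bits: int        # per-layer bit width, e.g. 4 for most layers, 8 for sensitive ones
    group_size: int  # per-layer group size, e.g. 128 or 32

# AutoRound-style mixed configuration: each layer may differ, so packing
# and repacking must be driven per layer rather than per model.
layer_configs = {
    "model.layers.0.self_attn.q_proj": LayerQuantConfig(bits=4, group_size=128),
    "model.layers.0.mlp.down_proj": LayerQuantConfig(bits=8, group_size=32),
}

for name, cfg in layer_configs.items():
    # A layer-wise API would be invoked here, once per layer, e.g.:
    # repack_layer(model.get_submodule(name), cfg)   # hypothetical call
    print(f"{name}: bits={cfg.bits}, group_size={cfg.group_size}")
```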