ModelCloud / GPTQModel

GPTQ-based LLM compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang.
Apache License 2.0

Import error #64

Closed yaldashbz closed 3 months ago

yaldashbz commented 3 months ago

Hi,

I'm trying to install GPTQModel from source, but I'm hitting this error on import. Could you please specify the versions that work for you? My cuda version is 11.7 (via `nvcc --version`), python 3.10, and also: pytorch 2.0.0, g++ 10.4.0.
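(As an aside, a one-line version report makes issues like this easier to compare. A small sketch; `env_report` is a hypothetical helper, and in practice you would pass in values like `torch.__version__` and `torch.version.cuda`:)

```python
import platform

def env_report(extra=None):
    """Build a single-line version report for a bug report.
    Caller supplies package versions, e.g. torch.__version__."""
    info = {"python": platform.python_version()}
    info.update(extra or {})
    return " ".join(f"{k}={v}" for k, v in sorted(info.items()))

# Versions taken from this issue's description:
print(env_report({"torch": "2.0.0", "cuda": "11.7", "g++": "10.4.0"}))
```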

[screenshot: import traceback ending in `OSError: libnvrtc.so.12: cannot open shared object file`]

Thanks.

Qubitium commented 3 months ago

@yaldashbz Unit tests are run on python 3.10 + cuda 12.1 + pytorch 2.3.1, but the toolkit should be compatible with anything torch 2.0.0 and up.

This is a very strange import error.

Did you get any strange errors or warnings during the src compile with `pip install -v ./ --no-build-isolation`?

FrederikHandberg commented 3 months ago

I had the same issue a few hours ago; I switched to yesterday's commit and it worked. Seems like a new regression? (torch 2.1, cuda 11.8)

Qubitium commented 3 months ago

@FrederikHandberg Did you also hit the exact `libnvrtc.so` import error, or slightly different errors?

@CSY-ModelCloud We may have a linking issue due to c++ compat with different python/cuda builds. We need a unit test env with cuda 11.8 paired with torch 2.1.0.

FrederikHandberg commented 3 months ago

exactly the same

```
Traceback (most recent call last):
  File "/workspace/GPTQModel/fred_quant.py", line 2, in <module>
    from gptqmodel import GPTQModel, QuantizeConfig
  File "/workspace/GPTQModel/gptqmodel/__init__.py", line 1, in <module>
    from .models import GPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/__init__.py", line 1, in <module>
    from .auto import MODEL_MAP, GPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/auto.py", line 4, in <module>
    from .baichuan import BaiChuanGPTQ
  File "/workspace/GPTQModel/gptqmodel/models/baichuan.py", line 1, in <module>
    from .base import BaseGPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/base.py", line 23, in <module>
    from ..utils.bitblas import convert_to_bitblas, prepare_model_for_bitblas_load
  File "/workspace/GPTQModel/gptqmodel/utils/bitblas.py", line 9, in <module>
    from ..nn_modules.qlinear.qlinear_bitblas import QuantLinear as BitBLASQuantLinear
  File "/workspace/GPTQModel/gptqmodel/nn_modules/qlinear/qlinear_bitblas.py", line 15, in <module>
    import bitblas
  File "/usr/local/lib/python3.10/dist-packages/bitblas/__init__.py", line 19, in <module>
    from . import gpu  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/bitblas/gpu/__init__.py", line 7, in <module>
    from .fallback import Fallback  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/bitblas/gpu/fallback.py", line 25, in <module>
    from tvm import tir
  File "/usr/local/lib/python3.10/dist-packages/bitblas/3rdparty/tvm/python/tvm/__init__.py", line 26, in <module>
    from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
  File "/usr/local/lib/python3.10/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/usr/local/lib/python3.10/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/base.py", line 78, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/usr/local/lib/python3.10/dist-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/base.py", line 64, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvrtc.so.12: cannot open shared object file: No such file or directory
```
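For anyone else debugging this: the OSError is raised when `ctypes.CDLL` dlopens tvm's shared library, which in turn needs `libnvrtc.so.12` from a cuda 12 runtime. A quick diagnostic sketch (not part of GPTQModel) to see whether the system loader can find that library at all:

```python
import ctypes.util

def find_nvrtc(names=("nvrtc",)):
    """Ask the system loader for libnvrtc; returns the resolved path,
    or None when no cuda runtime is on the library search path."""
    for name in names:
        path = ctypes.util.find_library(name)
        if path:
            return path
    return None

# None here means installing cuda >= 12.1 (or fixing LD_LIBRARY_PATH)
# is needed before the bitblas import chain can succeed.
print(find_nvrtc())
```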

yaldashbz commented 3 months ago

> Did you get any strange errors or warnings during src compile `pip install -v ./ --no-build-isolation`?

@Qubitium No, nothing strange, just some missing packages in my env. But I'm not sure about the warnings during the src compile.

Qubitium commented 3 months ago

@yaldashbz @FrederikHandberg There are actually two issues here. One we have already fixed. The second is bitblas compat with cuda < 12.1.

To resolve both errors, please:

  1. use an env with cuda >= 12.1 (we will try to remove this requirement in a future release)
  2. git clone/pull GPTQModel to latest tip/main and rebuild using `pip install -v ./ --no-build-isolation`

Or if you can't upgrade to cuda >= 12.1, then you need to:

  1. Follow the guide from BitBLAS to compile it from source: https://github.com/microsoft/BitBLAS/blob/main/docs/Installation.md
  2. git clone/pull GPTQModel to latest tip/main and rebuild using `pip install -v ./ --no-build-isolation`
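Either way, the branch point is the toolkit version. A tiny sketch of that check (hypothetical helper, not GPTQModel API; the version string would normally come from `torch.version.cuda`):

```python
def cuda_at_least(cuda_str, required=(12, 1)):
    """True if a cuda version string like '12.1' meets the minimum.
    An empty/None string (e.g. a CPU-only torch build) fails the check."""
    if not cuda_str:
        return False
    found = tuple(int(part) for part in cuda_str.split(".")[:2])
    return found >= required

print(cuda_at_least("12.1"))  # meets the requirement
print(cuda_at_least("11.7"))  # the manual BitBLAS compile path applies
```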

Qubitium commented 3 months ago

@yaldashbz @FrederikHandberg v0.9.1 has been released with all our CI unit tests passing. Please try it now and let us know. For an env with cuda < 12.1 and with bitblas enabled in quantize_config, you will be prompted to manually src compile bitblas. But if you don't use the bitblas kernel, then cuda < 12.1 should be fine (including the import errors) as long as you have a gpu with cuda compute capability >= 6.0 support.
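The capability floor mentioned here can be expressed the same way (a sketch; on a live machine the tuple comes from `torch.cuda.get_device_capability()`):

```python
def meets_capability(capability, required=(6, 0)):
    """True if a (major, minor) cuda compute capability meets the minimum,
    e.g. (6, 1) for a GTX 1080, (7, 0) for a V100, (8, 0) for an A100."""
    return tuple(capability) >= tuple(required)

print(meets_capability((8, 0)))  # A100-class gpu: ok
print(meets_capability((5, 2)))  # Maxwell-era gpu: below the floor
```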

Qubitium commented 3 months ago

Closing this as resolved with the 0.9.1 release. If the issue persists, feel free to re-open this issue.

yaldashbz commented 3 months ago

Great! Thank you so much