ModelCloud / GPTQModel

An easy-to-use LLM quantization and inference toolkit based on GPTQ algorithm (weight-only quantization).

[BUG] Exception ignored on calling ctypes callback function: <function ThreadpoolController... #283

Closed davidgxue closed 1 month ago

davidgxue commented 1 month ago

Describe the bug

Quantization seems to have finished, so maybe this error can be ignored, but posting just to be safe so folks here can verify.

GPU Info

1x A100 40GB

Software Info

Operating System/Version + Python Version

Name: gptqmodel
Version: 0.9.9.dev0+cu1222
Summary: A LLM quantization package with user-friendly apis. Based on GPTQ algorithm.
Home-page: https://github.com/ModelCloud/GPTQModel
Author: ModelCloud
Author-email: 
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: accelerate, auto-round, datasets, gekko, intel-extension-for-transformers, ninja, numpy, optimum, packaging, protobuf, rouge, safetensors, sentencepiece, threadpoolctl, torch, tqdm, transformers, triton
Required-by: 
---
Name: torch
Version: 2.3.0+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, auto-round, auto_gptq, fastai, gptqmodel, optimum, peft, torchaudio, torchtext, torchvision
---
Name: transformers
Version: 4.41.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: auto-round, auto_gptq, gptqmodel, intel-extension-for-transformers, optimum, peft
---
Name: accelerate
Version: 0.33.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: zach.mueller@huggingface.co
License: Apache
Location: /usr/local/lib/python3.10/dist-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: auto-round, auto_gptq, gptqmodel, peft
---
Name: triton
Version: 2.3.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/openai/triton/
Author: Philippe Tillet
Author-email: phil@openai.com
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock
Required-by: gptqmodel, torch

If you are reporting an inference bug with a post-quantized model, please post the contents of config.json and quantize_config.json.

To Reproduce

Not sure if it's possible to consistently reproduce... I quantized a model merged with LoRA adapters. Specifically, it's a Phi-3 Mini 4K Instruct finetuned with Unsloth. You may want to treat this as a Mistral model instead, since Unsloth's Phi-3 is "mistralfied" (in other words, they changed the Phi-3 weight layout to be Mistral-based by splitting up the fused QKV layers).

Maybe attempt to quantize unsloth/Phi-3-mini-4k-instruct from Hugging Face; it should give the same result, I imagine.
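
For reference, the quantization was driven with something along these lines (a minimal sketch following the from_pretrained/quantize/save_quantized flow in the GPTQModel README at this version; the model id, output path, and calibration text below are placeholders, not the exact setup):

    from gptqmodel import GPTQModel, QuantizeConfig
    from transformers import AutoTokenizer

    model_id = "unsloth/Phi-3-mini-4k-instruct"  # placeholder; the actual run used the merged model described above
    out_dir = "phi-3-mini-4k-gptq-4bit"          # placeholder output path

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    # A real run needs a proper calibration set; a single example keeps the sketch short.
    calibration_dataset = [tokenizer("gptqmodel is an easy-to-use quantization toolkit.")]

    quantize_config = QuantizeConfig(bits=4, group_size=128)

    model = GPTQModel.from_pretrained(model_id, quantize_config)
    model.quantize(calibration_dataset)
    model.save_quantized(out_dir)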

When quantizing... I got the following logs near the end of quantization:

...
INFO - {'layer': 32, 'module': 'mlp.gate_proj', 'avg_loss': '0.3261', 'time': '1.3555'}
INFO - {'layer': 32, 'module': 'mlp.down_proj', 'avg_loss': '3.1903', 'time': '3.6084'}
INFO - Packing model...
INFO:gptqmodel.utils.model:Packing model...
Exception ignored on calling ctypes callback function: <function ThreadpoolController._find_libraries_with_dl_iterate_phdr.<locals>.match_library_callback at 0x78c01c4e2950>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 1005, in match_library_callback
    self._make_controller_from_path(filepath)
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 1175, in _make_controller_from_path
    lib_controller = controller_class(
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 114, in __init__
    self.dynlib = ctypes.CDLL(filepath, mode=_RTLD_NOLOAD)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so: cannot open shared object file: No such file or directory
Packing model.layers.31.mlp.down_proj: 100%|██████████| 224/224 [03:12<00:00,  1.16it/s]
INFO - Model packed.
INFO:gptqmodel.utils.model:Model packed.
Exception ignored on calling ctypes callback function: <function ThreadpoolController._find_libraries_with_dl_iterate_phdr.<locals>.match_library_callback at 0x78c01c56c4c0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 1005, in match_library_callback
    self._make_controller_from_path(filepath)
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 1175, in _make_controller_from_path
    lib_controller = controller_class(
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 114, in __init__
    self.dynlib = ctypes.CDLL(filepath, mode=_RTLD_NOLOAD)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so: cannot open shared object file: No such file or directory
INFO - Compatibility: converting `checkpoint_format` from `gptq` to `gptq_v2`.
Exception ignored on calling ctypes callback function: <function ThreadpoolController._find_libraries_with_dl_iterate_phdr.<locals>.match_library_callback at 0x78c05921d990>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 1005, in match_library_callback
    self._make_controller_from_path(filepath)
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 1175, in _make_controller_from_path
    lib_controller = controller_class(
  File "/usr/local/lib/python3.10/dist-packages/threadpoolctl.py", line 114, in __init__
    self.dynlib = ctypes.CDLL(filepath, mode=_RTLD_NOLOAD)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so: cannot open shared object file: No such file or directory
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1168: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(

Expected behavior

No errors.

Model/Datasets

Make sure your model/dataset is downloadable (on HF for example) so we can reproduce your issue.

Qubitium commented 1 month ago

@davidgxue This appears to be caused by the threadpoolctl package trying to open/import OpenBLAS.
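
The traceback is consistent with threadpoolctl's library scan: on Linux it walks the shared libraries already mapped into the process (via dl_iterate_phdr, per the function name in the log) and re-opens each candidate with RTLD_NOLOAD to inspect it. That dlopen fails with exactly this OSError if the file backing the mapping has since been removed from disk (for example, numpy's bundled OpenBLAS replaced by an upgrade or reinstall in the running environment). A minimal sketch of that failure mode, with a hypothetical path standing in for the missing .so:

    import ctypes
    import os

    # Re-open a shared object the way threadpoolctl does: dlopen with
    # RTLD_NOLOAD fails if the file no longer exists on disk.
    stale_path = "/tmp/libopenblas_gone.so"  # hypothetical stand-in for the removed OpenBLAS .so
    try:
        ctypes.CDLL(stale_path, mode=os.RTLD_NOLOAD)
    except OSError as err:
        print(err)  # cannot open shared object file: No such file or directory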

Please show the following:

  1. version of threadpoolctl installed
  2. version of OpenBLAS installed
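
For example, both can be printed from Python (threadpool_info() is part of threadpoolctl's public API and reports the OpenBLAS build version it detects):

    import threadpoolctl
    from threadpoolctl import threadpool_info

    print(threadpoolctl.__version__)  # version of the threadpoolctl package itself
    # Each entry describes one threadpool-backed library loaded in the process;
    # OpenBLAS entries carry the BLAS build version in their "version" field.
    for lib in threadpool_info():
        print(lib["internal_api"], lib.get("version"), lib["filepath"])
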
Qubitium commented 1 month ago

Will close this issue. If it is still a bug or reproducible, please re-open.

davidgxue commented 1 month ago

Yeah, sorry about the late response. I can't seem to reproduce it again. But even with the error above, the model seems to have quantized fine, so I'm not sure it's worth investigating. I'll make a note if I run into this issue again. Thanks!