safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.

Omkar118 commented 1 year ago

Hello, I am getting the following error when I run "python run_localGPT.py --device_type cpu":

2023-06-22 00:40:00,242 - INFO - run_localGPT.py:161 - Running on: cpu
2023-06-22 00:40:00,242 - INFO - run_localGPT.py:162 - Display Source Documents set to: False
2023-06-22 00:40:00,619 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length 512
2023-06-22 00:40:04,307 - INFO - init.py:88 - Running Chroma using direct local API.
2023-06-22 00:40:04,323 - WARNING - init.py:43 - Using embedded DuckDB with persistence: data will be stored in: C:\Users\ACER\PycharmProjects\docgpt/DB
2023-06-22 00:40:04,339 - WARNING - ctypes.py:25 - Unable to connect optimized C data functions [No module named '_testbuffer'], falling back to pure Python
2023-06-22 00:40:04,355 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-06-22 00:40:04,591 - INFO - duckdb.py:460 - loaded in 8 embeddings
2023-06-22 00:40:04,596 - INFO - duckdb.py:472 - loaded in 1 collections
2023-06-22 00:40:04,600 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-06-22 00:40:04,600 - INFO - run_localGPT.py:43 - Loading Model: TheBloke/WizardLM-7B-uncensored-GPTQ, on: cpu
2023-06-22 00:40:04,600 - INFO - run_localGPT.py:44 - This action can take a few minutes!
2023-06-22 00:40:04,600 - INFO - run_localGPT.py:49 - Using AutoGPTQForCausalLM for quantized models
2023-06-22 00:40:05,205 - INFO - run_localGPT.py:56 - Tokenizer loaded
2023-06-22 00:40:06,753 - INFO - _base.py:727 - lm_head not been quantized, will be ignored when make_quant.
2023-06-22 00:40:06,755 - WARNING - qlinear_old.py:16 - CUDA extension not installed.
2023-06-22 00:40:09,485 - WARNING - modeling.py:1035 - The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
2023-06-22 00:40:09,502 - WARNING - modeling.py:928 - The safetensors archive passed at C:\Users\ACER/.cache\huggingface\hub\models--TheBloke--WizardLM-7B-uncensored-GPTQ\snapshots\dcb3400039f15cff76b43a4921c59d47c5fc2252\WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.

The program exits after this. What am I doing wrong?

I deleted the models from the .cache folder and everything was re-downloaded, but the same issue occurs. Please let me know if any other info is needed.

Adal73 commented 1 year ago

I got the same error message, shown below. Eager to get some assistance from the owner. Thanks.

The safetensors archive passed at C:\Users\Aston/.cache\huggingface\hub\models--TheBloke--WizardLM-7B-uncensored-GPTQ\snapshots\dcb3400039f15cff76b43a4921c59d47c5fc2252\WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.

thonore75 commented 1 year ago

I have the same issue

teleprint-me commented 1 year ago

I was able to reproduce the issue. I'm looking into it.

# ...
CUDA SETUP: Loading binary /home/austin/.local/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
The safetensors archive passed at /home/austin/.cache/huggingface/hub/models--TheBloke--WizardLM-7B-V1.0-Uncensored-GPTQ/snapshots/7060367aea53b1686be0c52962bc0405cfba7495/wizardlm-7b-v1.0-uncensored-GPTQ-4bit-128g.no-act.order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.

It's related to torch, bitsandbytes, and auto_gptq.

The stack assumes an Nvidia environment by default, and when the installed dependencies don't match what it expects, this is the result.
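
As a side note, the "does not contain metadata" message is logged at WARNING level in the output above, so it is probably not what makes the program exit. A `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header, and that header may carry an optional `__metadata__` entry. The sketch below reads that entry using only the standard library; the function name `read_safetensors_metadata` is made up for illustration and is not part of any library.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the optional __metadata__ dict from a .safetensors header, or None.

    Hypothetical helper for illustration; it only parses the file header
    and never loads tensor data.
    """
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: the JSON header describing the tensors.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # The metadata entry is optional; files saved without it return None here.
    return header.get("__metadata__")
```

Calling it on the cached `.safetensors` file from the warning should return `None` if the file really has no metadata, which would match the warning text.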

teleprint-me commented 1 year ago

I was able to narrow down the problem to bitsandbytes. You'll need to clone, build for CPU, and then install the build for CPU.

05:04:47 | ~/Documents/code/bitsandbytes-rocm
 git:(main | θ) λ make cpuonly CUDA_VERSION=CPU
which: no nvcc in (/home/austin/.bin:/home/austin/.local/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl)
ENVIRONMENT
============================
CUDA_VERSION: CPU
============================
NVCC path: /bin/nvcc
GPP path: /usr/bin/g++ VERSION: g++ (GCC) 13.1.1 20230429
CUDA_HOME: 
CONDA_PREFIX: 
PATH: /home/austin/.bin:/home/austin/.local/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
LD_LIBRARY_PATH: 
============================
/usr/bin/g++ -std=c++14 -shared -fPIC -I /home/austin/Documents/code/bitsandbytes-rocm/csrc -I /home/austin/Documents/code/bitsandbytes-rocm/include /home/austin/Documents/code/bitsandbytes-rocm/csrc/common.cpp /home/austin/Documents/code/bitsandbytes-rocm/csrc/cpu_ops.cpp /home/austin/Documents/code/bitsandbytes-rocm/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cpu.so

I'm going to test this out later today to verify. I ran into this issue when using auto_gptq and attempting to run one of TheBloke's GPTQ models. I use ROCm, not CUDA, and it complained that CUDA wasn't available. This only happens with bitsandbytes; I can use other models with torch just fine. I'll just need to trick it into thinking CUDA is available.

This matters because the default pip package installs the Nvidia build.

Run the command

python -m bitsandbytes

Then post the output here before doing anything else.

Then try out what I recommend afterwards. You can find the instructions in the original repo.
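
If it helps when posting your output, here is a small sketch that collects a few facts about the environment alongside it. It assumes nothing beyond the standard library and reports on torch only if torch can be imported; `describe_gpu_environment` is a made-up name for illustration, not something from bitsandbytes or localGPT.

```python
import shutil


def describe_gpu_environment():
    """Collect a few facts that help tell CPU-only, CUDA, and ROCm setups apart.

    Hypothetical helper for illustration only.
    """
    report = {
        # Whether the CUDA compiler is on PATH (the build log above checks this too).
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "torch_installed": False,
        "cuda_available": None,
        "hip_version": None,
    }
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        return report

    report["torch_installed"] = True
    # True when torch can actually use a GPU device.
    report["cuda_available"] = torch.cuda.is_available()
    # Set on ROCm builds of PyTorch, None otherwise.
    report["hip_version"] = getattr(torch.version, "hip", None)
    return report


if __name__ == "__main__":
    for key, value in describe_gpu_environment().items():
        print(f"{key}: {value}")
```

A report with `cuda_available: False` and no `nvcc` on PATH would be consistent with bitsandbytes falling back to its CPU library.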

Freylaverse commented 1 year ago

Run the command
python -m bitsandbytes
Then post the output here before doing anything else.

Hi! Same issue here.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so
False
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
  warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Loading binary C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "C:\Users\Freyla\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

teleprint-me commented 1 year ago

bitsandbytes is a library of 8-bit optimizers and matrix multiplication routines, and it was originally built for CUDA with nvcc. The Python ctypes bindings to the underlying binaries are compiled for Nvidia GPUs.

The package distributes only the Nvidia binaries. It doesn't ship a CPU build and doesn't officially support ROCm. ROCm should work with some tweaks and modifications because of the way AMD set up their ABI.

The original maintainer of the ROCm port seems to be somewhat active. We'll need to make a note of this and come up with a solution for devs and users with different devices.

The bitsandbytes library is currently only supported on Linux distributions. Windows is not supported at the moment. The requirements can best be fulfilled by installing pytorch via anaconda. You can install PyTorch by following the "Get Started" instructions on the official website.

Source
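
On the Windows output above, the line `argument of type 'WindowsPath' is not iterable` is the kind of TypeError Python raises when the `in` operator is applied to a `pathlib` path object instead of a string. I haven't traced which line in bitsandbytes triggers it, so treat the following as a generic reproduction rather than the actual bug. The directory used here is a made-up example, and `path_mentions` is a hypothetical helper.

```python
from pathlib import PureWindowsPath


def path_mentions(path, needle):
    """Substring check that works for path objects by converting to str first."""
    return needle in str(path)


# Hypothetical directory, used only to demonstrate the behaviour.
candidate = PureWindowsPath(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\bin")

try:
    # Path objects define neither __contains__ nor __iter__, so this raises.
    "CUDA" in candidate
except TypeError as exc:
    print(f"TypeError: {exc}")

# Converting to a string first makes the membership test valid.
print(path_mentions(candidate, "CUDA"))
```

`PureWindowsPath` is used so the snippet behaves the same on Linux and Windows.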

mindwellsolutions commented 1 year ago

Thank you for being on top of this. Would be great to get this resolved. My script still works fine, but it seems it hits this error every time before defaulting to Nvidia CUDA.

PromtEngineer / localGPT

safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata. #167