abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

ERROR: Could not build wheels for llama-cpp-python #1617

Open · inst32i opened this issue 1 month ago

inst32i commented 1 month ago

Current Behavior

I ran the following: CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose

An error occurred: ERROR: Failed building wheel for llama-cpp-python

Environment and Context

Failure Information (for bugs)

...
FAILED: vendor/llama.cpp/examples/llava/llama-llava-cli
: && /usr/bin/g++ -pthread -B /mnt/x_env/compiler_compat -O3 -DNDEBUG vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llama-llava-cli.dir/llava-cli.cpp.o -o vendor/llama.cpp/examples/llava/llama-llava-cli -Wl,-rpath,/tmp/tmp6bws6ysg/build/vendor/llama.cpp/src:/tmp/tmp6bws6ysg/build/vendor/llama.cpp/ggml/src: vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/src/libllama.so vendor/llama.cpp/ggml/src/libggml.so && :
/mnt/x_env/compiler_compat/ld: warning: libcuda.so.1, needed by vendor/llama.cpp/ggml/src/libggml.so, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libgomp.so.1, needed by vendor/llama.cpp/ggml/src/libggml.so, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libdl.so.2, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: libpthread.so.0, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: warning: librt.so.1, needed by /usr/local/cuda-12.4/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemCreate'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_barrier@GOMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemAddressReserve'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemUnmap'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_parallel@GOMP_4.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemSetAccess'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuDeviceGet'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `omp_get_thread_num@OMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemAddressFree'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuGetErrorString'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `GOMP_single_start@GOMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuDeviceGetAttribute'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemMap'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemRelease'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `omp_get_num_threads@OMP_1.0'
/mnt/x_env/compiler_compat/ld: vendor/llama.cpp/ggml/src/libggml.so: undefined reference to `cuMemGetAllocationGranularity'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

*** CMake build failed
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

Steps to Reproduce

  1. conda activate
  2. CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose

NVIDIA Driver Version: 550.54.14; CUDA Toolkit Version: V12.4.99

gillbates commented 1 month ago

same issue here ...

bteinstein commented 1 month ago

Same issue here too.

XingchenMengxiang commented 1 month ago

Same issue here too.

TobiasKlapper commented 1 month ago

Same here

SweetestRug commented 1 month ago

Same here as well.

bodybreaker commented 1 month ago

Same here too

bodybreaker commented 1 month ago

Current Behavior

I ran the following: CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose

An error occurred: ERROR: Failed building wheel for llama-cpp-python

Environment and Context

  • Physical hardware:
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Address sizes:         46 bits physical, 48 bits virtual
    Byte Order:            Little Endian
    CPU(s):                16
    On-line CPU(s) list:   0-15
    Vendor ID:             GenuineIntel
    Model name:            Intel Xeon Processor (Skylake, IBRS)
    CPU family:            6
    Model:                 85
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             16
    Stepping:              4
    BogoMIPS:              4389.68
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke avx512_vnni md_clear
    Virtualization features:
      Hypervisor vendor:   KVM
      Virtualization type: full
    Caches (sum of all):
      L1d: 512 KiB (16 instances)
      L1i: 512 KiB (16 instances)
      L2:  64 MiB (16 instances)
      L3:  256 MiB (16 instances)
    NUMA:
      NUMA node(s):        1
      NUMA node0 CPU(s):   0-15
    Vulnerabilities:
      Itlb multihit:       KVM: Mitigation: VMX unsupported
      L1tf:                Mitigation; PTE Inversion
      Mds:                 Mitigation; Clear CPU buffers; SMT Host state unknown
      Meltdown:            Mitigation; PTI
      Mmio stale data:     Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
      Retbleed:            Mitigation; IBRS
      Spec store bypass:   Mitigation; Speculative Store Bypass disabled via prctl and seccomp
      Spectre v1:          Mitigation; usercopy/swapgs barriers and __user pointer sanitization
      Spectre v2:          Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
      Srbds:               Not affected
      Tsx async abort:     Mitigation; Clear CPU buffers; SMT Host state unknown
  • Operating System: Ubuntu 22.04
  • SDK versions:
$ python3 --version   # 3.11
$ make --version      # 4.3
$ g++ --version       # 11.4.0

Failure Information (for bugs)

(same linker warnings, undefined CUDA/OpenMP references, and wheel-build failure as in the original report above)

Steps to Reproduce

  1. conda activate
  2. CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose

NVIDIA Driver Version: 550.54.14; CUDA Toolkit Version: V12.4.99

I solved this problem. It happens when the CUDA version reported by the driver differs from the installed CUDA toolkit version.

Check the driver's CUDA version with nvidia-smi,

and check the CUDA toolkit version with conda list | grep cuda-toolkit.

In my case the two versions were 12.2 and 11.8.
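
For example, a quick way to compare the two (the grep patterns are only illustrative; conda channels name the toolkit package differently):

# CUDA version supported by the installed NVIDIA driver
nvidia-smi | grep "CUDA Version"

# CUDA toolkit version installed in the active conda environment
conda list | grep -iE "cuda-toolkit|cudatoolkit"

# toolkit version of the nvcc actually found on PATH
nvcc --version

In this commenter's case, aligning the driver-side and toolkit versions was enough to make the wheel build succeed.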

Viagounet commented 1 month ago

Same here. Installation worked fine with CMAKE_ARGS="-DLLAMA_CUBLAS=on" for llama-cpp-python <= 2.79.0. I now get the same error as the OP for llama-cpp-python >= 2.80.0, whether I use CMAKE_ARGS="-DLLAMA_CUBLAS=on" or CMAKE_ARGS="-DGGML_CUDA=on".

hhhhpaaa commented 1 month ago

Same issue here too, in WSL2.

gilbertc commented 1 month ago

same issue here too, WSL2 on Windows 10.

tigert1998 commented 3 weeks ago

same issue here

tigert1998 commented 3 weeks ago

I found a workaround to fix this issue:

  1. clone this project and check out the version you would like to install
  2. build this project with CMake
  3. then here comes the key part: overwrite pyproject.toml with the following content
# [build-system]
# requires = ["scikit-build-core[pyproject]>=0.9.2"]
# build-backend = "scikit_build_core.build"

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "llama_cpp_python"
dynamic = ["version"]
description = "Python bindings for the llama.cpp library"
readme = "README.md"
license = { text = "MIT" }
authors = [
    { name = "Andrei Betlen", email = "abetlen@gmail.com" },
]
dependencies = [
    "typing-extensions>=4.5.0",
    "numpy>=1.20.0",
    "diskcache>=5.6.1",
    "jinja2>=2.11.3",
]
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]

[project.optional-dependencies]
server = [
    "uvicorn>=0.22.0",
    "fastapi>=0.100.0",
    "pydantic-settings>=2.0.1",
    "sse-starlette>=1.6.1",
    "starlette-context>=0.3.6,<0.4",
    "PyYAML>=5.1",
]
test = [
    "pytest>=7.4.0",
    "httpx>=0.24.1",
    "scipy>=1.10",
]
dev = [
    "black>=23.3.0",
    "twine>=4.0.2",
    "mkdocs>=1.4.3",
    "mkdocstrings[python]>=0.22.0",
    "mkdocs-material>=9.1.18",
    "pytest>=7.4.0",
    "httpx>=0.24.1",
]
all = [
    "llama_cpp_python[server,test,dev]",
]

# [tool.scikit-build]
# wheel.packages = ["llama_cpp"]
# cmake.verbose = true
# cmake.minimum-version = "3.21"
# minimum-version = "0.5.1"
# sdist.include = [".git", "vendor/llama.cpp/*"]

[tool.setuptools.packages.find]
include = ["llama_cpp"]

[tool.setuptools.package-data]
"llama_cpp" = ["lib/*"]

# Note: this scikit-build version provider is ignored by the setuptools backend;
# if the build complains about the dynamic "version" field, one option (an
# assumption, not part of the original workaround) is to add
# [tool.setuptools.dynamic] with version = { attr = "llama_cpp.__version__" }.
[tool.scikit-build.metadata.version]
provider = "scikit_build_core.metadata.regex"
input = "llama_cpp/__init__.py"

[project.urls]
Homepage = "https://github.com/abetlen/llama-cpp-python"
Issues = "https://github.com/abetlen/llama-cpp-python/issues"
Documentation = "https://llama-cpp-python.readthedocs.io/en/latest/"
Changelog = "https://llama-cpp-python.readthedocs.io/en/latest/changelog/"

[tool.pytest.ini_options]
testpaths = "tests"
  4. run pip install . --verbose (see the command sketch below)
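
A rough end-to-end sketch of the workaround, assuming a working CUDA toolchain; the version tag is a placeholder, and the exact location the built libraries must land in (llama_cpp/lib/) should be verified against your checkout:

# clone and check out the release you want (the tag name is a placeholder)
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
git checkout <version-tag>
git submodule update --init --recursive

# step 2: build the native libraries with CMake, with CUDA enabled
cmake -S . -B build -DGGML_CUDA=on
cmake --build build --config Release

# step 3: overwrite pyproject.toml with the setuptools-based file above and make
# sure the built libllama/libggml shared libraries end up under llama_cpp/lib/,
# which is what the [tool.setuptools.package-data] entry ships with the wheel

# step 4: install using the plain setuptools backend
pip install . --verbose
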
blkqi commented 2 weeks ago

Adding the path to libcuda.so to the LD_LIBRARY_PATH environment variable allows the examples to link so that the build can succeed.
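
For example (the paths below are common locations, not guaranteed; point LD_LIBRARY_PATH at wherever libcuda.so.1 actually lives on your system, e.g. the driver library directory or the CUDA toolkit's stubs directory):

# find where the driver's libcuda.so.1 is installed
ldconfig -p | grep libcuda

# make it visible to the linker during the build, then retry the install
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:/usr/local/cuda-12.4/lib64/stubs:$LD_LIBRARY_PATH
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose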