abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

cdll_args["winmode"] = 0 breaks loading bundled CLBlast libs on Windows #563

Open jllllll opened 1 year ago

jllllll commented 1 year ago

Expected Behavior

Building llama-cpp-python on Windows with OpenCL CLBlast support (not the CLBlast libs bundled with the CUDA Toolkit) should work immediately, without any additional steps.

Current Behavior

Loading llama.dll fails unless the CLBlast libs are added to PATH. Removing cdll_args["winmode"] = 0 from llama_cpp.py allows llama.dll to load successfully using the CLBlast libs included in the package directory.
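
For context, the relevant part of _load_shared_library boils down to something like this (simplified sketch reconstructed from the traceback, not the exact source; the platform check is an approximation):

import ctypes
import sys

def _load_shared_library(_lib_path):
    cdll_args = {}
    if sys.platform == "win32":
        # Removing this line is what lets the CLBlast DLLs sitting next to
        # llama.dll be found, since the default winmode also searches the
        # loaded DLL's own directory for its dependencies.
        cdll_args["winmode"] = 0
    try:
        return ctypes.CDLL(str(_lib_path), **cdll_args)
    except Exception as e:
        raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")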

Environment and Context

Hardware: i7-5820k, GTX 1080 Ti

OS: Windows 10 19045

Toolchain: Conda 23.1.0, Python 3.10.11, MSVC 19.36.32537.0, CMake 3.27.0

Steps to Reproduce

  1. Build and install the OpenCL SDK and CLBlast:
     https://github.com/KhronosGroup/OpenCL-SDK.git -b v2023.04.17
     https://github.com/CNugteren/CLBlast.git -b 1.6.1
  2. Use the following commands to build and install llama-cpp-python:
    set "CMAKE_PREFIX_PATH=\path\to\CLBlast\root"
    set "CMAKE_ARGS=-DLLAMA_CLBLAST=on"
    set FORCE_CMAKE=1
    set VERBOSE=1
    python -m pip install git+https://github.com/abetlen/llama-cpp-python --no-cache-dir -v
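
The difference is also visible without going through a frontend by loading the DLL directly with ctypes (the path below is only an example; substitute your site-packages location):

import ctypes

dll = r"C:\path\to\site-packages\llama_cpp\llama.dll"  # example path, adjust to your install

# Mirrors what llama_cpp.py currently does: fails unless the CLBlast DLLs are on PATH
# ctypes.CDLL(dll, winmode=0)

# Default winmode: the CLBlast DLLs bundled next to llama.dll are found
ctypes.CDLL(dll)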

Failure Logs

Using text-generation-webui to load:

Traceback (most recent call last):
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\installer_files\env\lib\site-packages\llama_cpp\llama_cpp.py", line 67, in _load_shared_library
    return ctypes.CDLL(str(_lib_path), **cdll_args)
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\installer_files\env\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'G:\F\Projects\AI\text-generation-webui\one-click-installers-test\installer_files\env\Lib\site-packages\llama_cpp\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\text-generation-webui\server.py", line 68, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\text-generation-webui\modules\models.py", line 78, in load_model
    output = load_func_map[loader](model_name)
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\text-generation-webui\modules\models.py", line 232, in llamacpp_loader
    from modules.llamacpp_model import LlamaCppModel
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\text-generation-webui\modules\llamacpp_model.py", line 16, in <module>
    from llama_cpp import Llama, LlamaCache, LogitsProcessorList
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\installer_files\env\lib\site-packages\llama_cpp\__init__.py", line 1, in <module>
    from .llama_cpp import *
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\installer_files\env\lib\site-packages\llama_cpp\llama_cpp.py", line 80, in <module>
    _lib = _load_shared_library(_lib_base_name)
  File "G:\F\Projects\AI\text-generation-webui\one-click-installers-test\installer_files\env\lib\site-packages\llama_cpp\llama_cpp.py", line 69, in _load_shared_library
    raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
RuntimeError: Failed to load shared library 'G:\F\Projects\AI\text-generation-webui\one-click-installers-test\installer_files\env\Lib\site-packages\llama_cpp\llama.dll': Could not find module 'G:\F\Projects\AI\text-generation-webui\one-click-installers-test\installer_files\env\Lib\site-packages\llama_cpp\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.
abetlen commented 1 year ago

Hey @jllllll, happy to help, and sorry for the late reply here. Taking a look back at this, the winmode=0 thing was added to fix another Windows + CUDA issue in #208.

Would making it an environment variable that you can modify work for you?
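
Roughly what I have in mind, sketched out (the environment variable name here is just a placeholder):

import ctypes
import os
import sys

cdll_args = {}
if sys.platform == "win32":
    # Hypothetical opt-in: keep ctypes' secure default unless the user
    # explicitly sets something like LLAMA_CPP_DLL_WINMODE.
    winmode = os.environ.get("LLAMA_CPP_DLL_WINMODE")
    if winmode is not None:
        cdll_args["winmode"] = int(winmode)

# _lib = ctypes.CDLL(str(_lib_path), **cdll_args)  # as in _load_shared_library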

jllllll commented 1 year ago

That's fine. Though, I don't think that adding winmode=0 ever fixed the original issue with CUDA. I think that this change is what actually fixed it: https://github.com/abetlen/llama-cpp-python/pull/225

As far as I can tell, winmode=0 only restricts the Windows library search paths. I'm not sure of the full extent of what it does, but it seems to exclude libraries adjacent to the one you are trying to load. The docs for its functionality are somewhat obtuse. According to the ctypes docs, this is the list of valid values for it: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa#parameters

The winmode parameter is used on Windows to specify how the library is loaded (since mode is ignored). It takes any value that is valid for the Win32 API LoadLibraryEx flags parameter. When omitted, the default is to use the flags that result in the most secure DLL load to avoid issues such as DLL hijacking. Passing the full path to the DLL is the safest way to ensure the correct library and dependencies are loaded.

0 does not seem to be listed there as a valid value, yet it does not cause an error. I can only assume that it is falling back to some other functionality. The Python ctypes docs used to say that winmode=0 was necessary on Windows, but they seem to have removed that.

Honestly, I have very little idea of how winmode actually works. All I know is that removing it did not hinder any of my tests to load cuBLAS.
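
One thing that might sidestep winmode entirely: directories registered with os.add_dll_directory() are searched when ctypes uses its default (secure) flags, so anyone who needs DLLs from somewhere other than the package directory could register that location explicitly instead of relying on PATH. Rough idea only, with an example directory:

import os
import sys

if sys.platform == "win32":
    # Registered directories are searched for dependent DLLs under the default
    # LoadLibraryEx flags that ctypes uses; the path here is just an example.
    os.add_dll_directory(r"C:\path\to\CLBlast\bin")

import llama_cpp  # llama.dll and its dependencies should now resolve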

abetlen commented 1 year ago

You're right, I think I misread the docs there. Does setting it to ctypes.RTLD_GLOBAL have any effect for you?

jllllll commented 1 year ago

ctypes.RTLD_GLOBAL seems to be set to 0 on Windows, so this produces the same behavior. I believe that is associated with the mode parameter, which isn't used on Windows.
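
For reference, this is easy to confirm:

import ctypes
print(ctypes.RTLD_GLOBAL)  # prints 0 on Windows, so winmode=ctypes.RTLD_GLOBAL is the same as winmode=0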

Interestingly, after redoing my tests, the CLBlast libs are not included with the package data as they were in my initial tests. I can't figure out what I did differently at the time to get that to happen, so I'm just copying the CLBlast libs to the site-packages\llama_cpp directory for these tests.
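
(The manual copy is nothing more than something along these lines; the source directory is wherever your CLBlast build's DLLs ended up:)

import glob
import os
import shutil
import sysconfig

# Copy the CLBlast runtime DLLs next to llama.dll so the loader can find them.
dest = os.path.join(sysconfig.get_paths()["purelib"], "llama_cpp")
for dll in glob.glob(r"C:\path\to\CLBlast\bin\*.dll"):  # example source path
    shutil.copy2(dll, dest)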

My hope with all this is to allow CLBlast to work without any additional steps beyond simply installing llama-cpp-python. Though, I'm starting to think that this may be harder to achieve than I thought.

jllllll commented 1 year ago

Figured it out. Adding this to CMakeLists.txt results in the CLBlast libs being added to the package:

install(
    # Copy any DLLs the llama target depends on at runtime (e.g. CLBlast)
    # into the llama_cpp package directory alongside llama.dll.
    FILES $<TARGET_RUNTIME_DLLS:llama>
    DESTINATION llama_cpp
)

$<TARGET_RUNTIME_DLLS:llama> evaluates to an empty string on non-Windows systems, so it should be a no-op there, though I haven't tested yet whether it really leaves non-Windows behavior unchanged.

It also doesn't seem to cause issues with cuBLAS, and it doesn't pull in the cuBLAS libs.
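
A quick way to check what actually ends up in the installed package:

import glob
import os
from importlib.util import find_spec

pkg_dir = os.path.dirname(find_spec("llama_cpp").origin)
# With the install() rule above, this lists llama.dll plus the CLBlast/OpenCL
# runtime DLLs, and no cuBLAS libraries.
print(sorted(os.path.basename(p) for p in glob.glob(os.path.join(pkg_dir, "*.dll"))))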

The biggest issue I've found so far is that this requires a minimum CMake version of 3.21.

abetlen commented 1 year ago

@jllllll I think that's okay; cmake is available as a pip package, and scikit-build-core should use that if the user's installed version is below the minimum set in the pyproject.

I'll test on my system as well, and in any case we can just put it inside an if(WIN32) block or similar.