abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Problems when I try to use this inside the default Python 3.10 Docker container #70

Closed LLukas22 closed 1 year ago

LLukas22 commented 1 year ago

When I try to install and use this package via a requirements file in the default Python 3.10 container, I get the following error when I import the module:

Failed to load shared library '/usr/local/lib/python3.10/site-packages/llama_cpp/libllama.so': /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found (required by /usr/local/lib/python3.10/site-packages/llama_cpp/libllama.so)

Am I doing something wrong, or am I just missing some dependencies?
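
A quick way to see what is going on (just a diagnostic sketch; the paths are taken from the error message above) is to compare the GLIBCXX versions the container's libstdc++ provides with the ones libllama.so was linked against:

# versions provided by the container's libstdc++
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX

# versions required by the shared library that fails to load
objdump -T /usr/local/lib/python3.10/site-packages/llama_cpp/libllama.so | grep GLIBCXX

If GLIBCXX_3.4.29 only shows up in the second list, the library was built against a newer libstdc++ than the image ships.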

abetlen commented 1 year ago

Do you mind posting the Dockerfile? I haven't tried to use this inside a container, but I'll investigate.

LLukas22 commented 1 year ago

Alright, I can't send you the Dockerfile, but I created a toy example with your own server.

Dockerfile:

FROM python:3.10

#install
RUN pip3 install llama-cpp-python[server]

#Expose the ports
EXPOSE 8000

# Make a Dir for the models to be mounted into
RUN mkdir -p /var/lib/models

#Set environment 
ENV MODEL "/var/lib/models/ggjt-model.bin"

#Start server
CMD ["python3","-m","llama_cpp.server"]

I then started the container via this docker-compose file:

version: "3.9"
services:

  api:
    build:
      context: .
      dockerfile: Dockerfile
    image: llama-server
    container_name: llama-server

    ports:
      - "8000:8000"

    volumes:
      - [MODEL_DIR]:/var/lib/models

When running this, the 'GLIBCXX_3.4.29' error is gone, but the model takes a very long time to load and the container sometimes gets OOM killed. I'm running on 64 GB, so that's very strange. The other thing I noticed is that the container fails to bind the address, but that's just Docker IPv4 shenanigans. I will try to use llama-cpp-python[server] as a dependency in my other project and see if that gets rid of the 'GLIBCXX_3.4.29' error.

Oh, and just for completeness, I was using this model.
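
(For the address-binding part: if I remember right, the server defaults to listening on localhost, so something like this in the Dockerfile should make it reachable from outside the container — an untested assumption on my side:)

ENV HOST 0.0.0.0
ENV PORT 8000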

abetlen commented 1 year ago

So the GLIBC error is fixed now? I would also make sure the OOM issue isn't some Docker default limit.
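
For example, something along these lines should show whether a limit is in play (container name taken from the compose file above; just a sketch):

# 0 means no memory limit is set for the container
docker inspect --format '{{.HostConfig.Memory}}' llama-server

# watch actual memory usage while the model loads
docker stats llama-server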

LLukas22 commented 1 year ago

That's the strange thing: the Dockerfile listed above works without any problems, but if I try to run my own Dockerfile I get the 'GLIBCXX' error.

This is my Dockerfile:

FROM python:3.10

RUN apt-get update -y
RUN apt-get install -y python3-pip graphviz-dev git gcc-4.9 
RUN apt-get upgrade -y libstdc++6 

#Expose the ports
EXPOSE 8001

# Copy Files
RUN mkdir -p /app
ADD ./ /app/
WORKDIR /app

# Install Requirements
RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r ./api/requirements-cpu.txt

RUN --mount=type=cache,target=/root/.cache/pip pip3 install git+https://github.com/huggingface/transformers@a17841ac4945631e4e13c072fa2a329b98ebb8b6 

ENV PYTHONPATH "${PYTHONPATH}:/app"

#Build cache dir for transformers
ENV HF_HOME "/huggingface/cache"

CMD ["python3", "/app/api/main.py"]

and it's using the following requirements file:

protobuf==3.20.1
huggingface_hub
llama-cpp-python[server]
openai
pynvml
farm-haystack

#api dependencies
psutil
fastapi
uvicorn
dependency_injector

My guess is that one of the dependencies in the requirements does some weird stuff, but I actually have no clue where to start looking.
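
One way to narrow it down (a sketch, nothing more) would be to install llama-cpp-python on its own first and check the import before layering the rest of the requirements on top:

# hypothetical bisection steps in the Dockerfile
RUN pip3 install llama-cpp-python[server] && python3 -c "import llama_cpp"
RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r ./api/requirements-cpu.txt && python3 -c "import llama_cpp"

Whichever step breaks the import points at the culprit.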

jmtatsch commented 1 year ago

I run within an Ubuntu container, which works. https://github.com/mkellerman/gpt4all-ui/ runs a 3.11 container and it works, so I would guess the issue is not with llama-cpp-python but with your concrete Dockerfile/requirements.

jmtatsch commented 1 year ago

Took the opportunity to shrink my own dockerfile:

FROM python:3.10

COPY .devops/requirements.txt requirements.txt
RUN pip install -r requirements.txt && rm -rf requirements.txt

ENTRYPOINT [ "python3", "-m", "llama_cpp.server" ]

works without a hitch with requirements.txt

llama-cpp-python
uvicorn
fastapi
sse_starlette
typing_extensions

LLukas22 commented 1 year ago

I also played around a bit more and I couldn't get the container working, and I don't know why. All the other containers I build with llama-cpp-python work without any problems. I will probably host a separate llama-cpp-python container and then use it via its REST API.

@abetlen Could we maybe get an official prebuilt Docker container for the REST server?

abetlen commented 1 year ago

Yeah if someone wants to open a PR I can test / review it when I have some time.

LLukas22 commented 1 year ago

I could try later. But I noticed another problem: when building my containers via GitHub Actions and then trying to run them locally, the containers exit with exit code 132, hinting at an unsupported CPU instruction set. If built and run locally, they work without any problems.

My guess is that llama.cpp is compiled in the container's build step with the CPU feature flags of the GitHub Actions runner. Can I define specific feature flags for the llama.cpp compilation at setup time, or should I look into building llama.cpp manually first and then pointing to it via the environment?
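
If the setup.py forwards CMake options (I haven't verified that against the current build scripts, so treat this as an assumption), something like this at install time might already be enough:

# force a CMake build and hand llama.cpp the feature flags explicitly
FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_AVX512=OFF -DLLAMA_F16C=OFF" pip install llama-cpp-python --no-cache-dir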

abetlen commented 1 year ago

@LLukas22 that's definitely it; we should check what llama.cpp does for this with their Docker containers.

LLukas22 commented 1 year ago

@abetlen Yeah, it's pretty weird. I played around a bit but can't get it to work, even if I use QEMU to force a linux/amd64 platform while building the image on a GitHub Actions runner. And I'm actually really confused about what instruction set is missing, as I'm using relatively new processors (AMD Zen 2 & 3), so AVX and AVX2 are definitely there.

But I'm also no Docker pro, so maybe I'm just missing something.

abetlen commented 1 year ago

@LLukas22 is QEMU sufficient to emulate processor-specific instruction sets? I'd imagine it's because of a mismatch between processor architectures in the GitHub runner and your local machine.

LLukas22 commented 1 year ago

@abetlen Theoretically QEMU should be able to emulate any CPU features, but I don't know how it is implemented in docker build. They probably create a VM with the CPU flags host or max, which will use all features of the host system. Then the architecture mismatch problem happens when I try to run it locally. But I'm still a bit confused, though, since nearly all x64 CPUs released in the last <5 years implement the same instruction sets 🤔
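
As far as I understand it (happy to be corrected), buildx's --platform only pins the architecture, not the instruction-set level, so an amd64-on-amd64 build never goes through QEMU and -march=native just picks up whatever the build host exposes:

# selects the architecture only; on an amd64 runner this runs natively,
# so -march=native still compiles for the runner's CPU features
docker buildx build --platform linux/amd64 -t llama-server .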

abetlen commented 1 year ago

@LLukas22 https://github.com/ggerganov/llama.cpp/blob/master/.github/workflows/docker.yml describes how they build the llama.cpp docker image, not sure if you've checked that out

LLukas22 commented 1 year ago

@abetlen Yup, I had a look and tried their image; it runs on my Intel machine but fails with exit code 132 on both my AMD-based systems. I will create a new issue in the llama.cpp repository.

gjmulder commented 1 year ago

These old build issues, where llama.cpp gets confused about what features are exposed on different (virtualised?) h/w, might be of help:

Can it support avx cpu's older than 10 years old
better support for different x86_64 CPU instruction extensions

EDIT: Is there a chance that the Docker image is configured for one type of h/w (e.g. Intel), and then fails when moved to another (e.g. AMD)?

LLukas22 commented 1 year ago

@gjmulder Well, the images are built on a GitHub Actions runner, which probably uses a virtualized Intel CPU. I then downloaded them and tried to run them on my AMD systems, which leads to the errors. The thing that confuses me is that all systems I used support all CPU features listed in the CMake file, namely avx, avx2, fma and f16c. I don't quite get why an image built on one of these systems shouldn't work when moved to another system supporting the exact same instruction sets.

gjmulder commented 1 year ago

I'm guessing here, as I haven't had time to repro, but will do so in the next few days:

Maybe do a lscpu | grep Flags > /var/tmp/lscpu_build_flags as part of the GitHub Actions runner build and confirm that they are in fact exactly the same?

LLukas22 commented 1 year ago

@gjmulder Alright, I ran it locally and on the GitHub Actions runner. Here are the results:

Actions Runner:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear

Local:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip rdpid

Maybe the problem is caused by the avx512 instruction set?
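
(To compare the two dumps mechanically, something like this works; runner_flags.txt and local_flags.txt are just whatever files the two lscpu outputs were saved to:)

tr ' ' '\n' < runner_flags.txt | sort > runner_sorted.txt
tr ' ' '\n' < local_flags.txt  | sort > local_sorted.txt
# the avx512* flags only show up on the runner side
diff runner_sorted.txt local_sorted.txt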

gjmulder commented 1 year ago

Maybe the problem is caused by the avx512 instruction set?

@LLukas22 this might be relevant: https://github.com/ggerganov/llama.cpp/pull/809

LLukas22 commented 1 year ago

@gjmulder Yes, this is probably related. Do you know if it is possible to somehow change which feature sets are used in the compilation of llama.cpp via environment variables? Then I could just set them in the Dockerfile and use the normal setup from this repo. If not, I probably have to build it manually and then copy the binary around.

gjmulder commented 1 year ago

@LLukas22 I seem to remember that the Makefile was checking the output of lscpu, but it seems things have moved on since then.

This might help. Maybe you can override the g++ generated macros? A bit hacky, but might work :crossed_fingers:

EDIT: GPT-4 to the rescue?

CXXFLAGS = -march=native -U__SSE3__ -U__AVX__ -U__AVX2__ -D__SSE__ -D__SSE2__

This line will keep the -march=native option but will disable SSE3, AVX, and AVX2 while enabling SSE and SSE2.

Again, note that forcibly enabling SIMD instruction sets that your CPU doesn't support may cause your program to crash or produce incorrect results. Make sure you understand the implications of enabling or disabling these macros before making any changes.

LLukas22 commented 1 year ago

@gjmulder Hm, this could actually work. I'm gonna try it later, and I will just try to disable avx512, but I don't know what flag to set 😓. But I guess I also have to set the CFLAGS as well, as ggml is compiled with them.
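
(For reference, and untested against the actual Makefile: gcc lets you keep -march=native but switch individual extensions off, so disabling just AVX-512 would look roughly like this — as far as I know -mno-avx512f also pulls the dependent AVX-512 variants with it:)

CFLAGS   = -march=native -mtune=native -mno-avx512f
CXXFLAGS = -march=native -mtune=native -mno-avx512f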

gjmulder commented 1 year ago

Hm, this could actually work. I'm gonna try it later, and I will just try to disable avx512, but I don't know what flag to set 😓. But I guess I also have to set the CFLAGS as well, as ggml is compiled with them.

@LLukas22 You could just query gcc in your target env and use those flags for the Docker build. You are essentially cross-compiling. Fairly sure this problem has been solved more elegantly by someone somewhere.

LLukas22 commented 1 year ago

@gjmulder Alright, I had another look: setting the CXXFLAGS or CFLAGS is not possible, as they are reset in the Makefile. But if I use CMake I can simply pass a -DLLAMA_AVX512=OFF flag to enable/disable the avx512 instructions. I will probably use this to build llama.cpp manually in a workflow and then copy the resulting libllama.so into my Docker containers. This way I can easily create separate avx512 and avx2 containers.
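
Roughly what I have in mind (a sketch; the exact output path of libllama.so may differ between llama.cpp revisions):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DBUILD_SHARED_LIBS=ON -DLLAMA_AVX512=OFF -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
# copy the resulting libllama.so (under build/) into the image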

LLukas22 commented 1 year ago

@abetlen Alright, I now have an action that builds me a libllama.so binary with and without avx512. But if I copy this binary into my Docker container and set the LLAMA_CPP_LIB environment variable to point at the binary, I'm back to the 'GLIBCXX_3.4.29' not found error 🤔.

My Dockerfile now looks like this and is available in this fork:

FROM python:3.10

#Copy the compiled llama.so file to the container
COPY ./lib /lib/

#Set the environment variable for the llama.so file
ENV LLAMA_CPP_LIB /lib/llama.so

# Copy the python code to the container
RUN mkdir -p /app
COPY ./llama_cpp /app
WORKDIR /app

#Add the current directory to the PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/app"

# Install Requirements
RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r ./server/requirements.txt

#Set default environment variables
ENV HOST 0.0.0.0
ENV PORT 8000

#Expose the port
EXPOSE ${PORT}

#Run the server
CMD ["python3","-m","server"]

This is the binary I used: llama.zip

I'm at my wits' end here; I'm probably just gonna build my containers on the machines I will run them on.
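
If it helps anyone else: one way around the libstdc++ mismatch would be to compile libllama.so inside the same base image that runs it, e.g. with a multi-stage build (untested sketch; the libllama.so path under build/ is an assumption):

FROM python:3.10 AS builder
RUN apt-get update && apt-get install -y cmake build-essential git
RUN git clone https://github.com/ggerganov/llama.cpp /llama.cpp
RUN cmake -S /llama.cpp -B /llama.cpp/build -DBUILD_SHARED_LIBS=ON -DLLAMA_AVX512=OFF -DCMAKE_BUILD_TYPE=Release \
 && cmake --build /llama.cpp/build --config Release

FROM python:3.10
COPY --from=builder /llama.cpp/build/libllama.so /lib/llama.so
ENV LLAMA_CPP_LIB /lib/llama.so

That way the binary is linked against exactly the libstdc++ the runtime image ships.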

gjmulder commented 1 year ago

Here to help you debug @LLukas22 :smile:

EDIT: Full clone and build of your repo:

$ git submodule update --init --recursive
Submodule 'vendor/llama.cpp' (git@github.com:ggerganov/llama.cpp.git) registered for path 'vendor/llama.cpp'
Cloning into '/home/mulderg/Work/llama-cpp-python-Docker/vendor/llama.cpp'...
X11 forwarding request failed on channel 0
Submodule path 'vendor/llama.cpp': checked out 'e95b6554b493e71a0275764342e09bd5784a7026'
(lcp) mulderg@asushimu:~/Work/llama-cpp-python-Docker$ python3 setup.py develop
/home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(

--------------------------------------------------------------------------------
-- Trying 'Ninja' generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
Not searching for unused variables given on the command line.
CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
-- Configuring incomplete, errors occurred!
See also "/home/mulderg/Work/llama-cpp-python-Docker/_cmake_test_compile/build/CMakeFiles/CMakeOutput.log".
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying 'Ninja' generator - failure
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
-- Trying 'Unix Makefiles' generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is GNU 10.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mulderg/Work/llama-cpp-python-Docker/_cmake_test_compile/build
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying 'Unix Makefiles' generator - success
--------------------------------------------------------------------------------

Configuring Project
  Working directory:
    /home/mulderg/Work/llama-cpp-python-Docker/_skbuild/linux-x86_64-3.10/cmake-build
  Command:
    /usr/local/bin/cmake /home/mulderg/Work/llama-cpp-python-Docker -G 'Unix Makefiles' --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/home/mulderg/Work/llama-cpp-python-Docker/_skbuild/linux-x86_64-3.10/cmake-install -DPYTHON_VERSION_STRING:STRING=3.10.10 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/home/mulderg/anaconda3/envs/lcp/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/home/mulderg/anaconda3/envs/lcp/include/python3.10 -DPYTHON_LIBRARY:PATH=/home/mulderg/anaconda3/envs/lcp/lib/libpython3.10.so -DPython_EXECUTABLE:PATH=/home/mulderg/anaconda3/envs/lcp/bin/python3 -DPython_ROOT_DIR:PATH=/home/mulderg/anaconda3/envs/lcp -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/home/mulderg/anaconda3/envs/lcp/include/python3.10 -DPython_LIBRARY:PATH=/home/mulderg/anaconda3/envs/lcp/lib/libpython3.10.so -DPython3_EXECUTABLE:PATH=/home/mulderg/anaconda3/envs/lcp/bin/python3 -DPython3_ROOT_DIR:PATH=/home/mulderg/anaconda3/envs/lcp -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/home/mulderg/anaconda3/envs/lcp/include/python3.10 -DPython3_LIBRARY:PATH=/home/mulderg/anaconda3/envs/lcp/lib/libpython3.10.so -DCMAKE_BUILD_TYPE:STRING=Release

Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 10.4.0
-- The CXX compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mulderg/Work/llama-cpp-python-Docker/_skbuild/linux-x86_64-3.10/cmake-build
[100%] Generating /home/mulderg/Work/llama-cpp-python-Docker/vendor/llama.cpp/libllama.so
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native -DGGML_USE_OPENBLAS -I/usr/local/include/openblas
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:  -lopenblas
I CC:       cc (Ubuntu 10.4.0-4ubuntu1~22.04) 10.4.0
I CXX:      g++ (Ubuntu 10.4.0-4ubuntu1~22.04) 10.4.0

In file included from llama.cpp:6:
llama_util.h:60:2: warning: extra ‘;’ [-Wpedantic]
   60 | };
      |  ^
[100%] Built target run
Install the project...
-- Install configuration: "Release"
-- Installing: /home/mulderg/Work/llama-cpp-python-Docker/_skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/libllama.so
copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so

running develop
/home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running egg_info
creating llama_cpp_python.egg-info
writing llama_cpp_python.egg-info/PKG-INFO
writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
writing requirements to llama_cpp_python.egg-info/requires.txt
writing top-level names to llama_cpp_python.egg-info/top_level.txt
writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
adding license file 'LICENSE.md'
writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
running build_ext
Creating /home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/llama-cpp-python.egg-link (link to .)
Removing llama-cpp-python 0.1.34 from easy-install.pth file
Adding llama-cpp-python 0.1.34 to easy-install.pth file

Installed /home/mulderg/Work/llama-cpp-python-Docker
Processing dependencies for llama-cpp-python==0.1.34
Searching for typing-extensions==4.5.0
Best match: typing-extensions 4.5.0
Processing typing_extensions-4.5.0-py3.10.egg
typing-extensions 4.5.0 is already the active version in easy-install.pth

Using /home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/typing_extensions-4.5.0-py3.10.egg
Finished processing dependencies for llama-cpp-python==0.1.34

$ docker build --force-rm -t lcp .
Sending build context to Docker daemon  6.337MB
Step 1/12 : FROM python:3.10
 ---> c339f65d6ddf
Step 2/12 : COPY ./lib /lib/
COPY failed: file not found in build context or excluded by .dockerignore: stat lib: file does not exist

LLukas22 commented 1 year ago

@gjmulder Alright, that's kind of my bad: the lib folder is created in a GitHub action and contains the llama.so binary I added above. For testing purposes I have now also added a lib folder with an llama.so generated this way to the repo.

There is no need to build or recursively check out the repo. To reproduce the issue I'm experiencing, just clone the fork and run docker build and docker run on the Dockerfile. The build should work without any errors, and when trying to run the container the 'GLIBCXX_3.4.29' not found error should occur.

The build process of the llama.so binary is described in this workflow file.