Closed · LLukas22 closed this issue 1 year ago
Do you mind posting the Dockerfile? I haven't tried to use this inside a container, but I'll investigate.
Alright, I can't send you my actual Dockerfile, but I created a toy example with your own server.
Dockerfile:
```dockerfile
FROM python:3.10

# Install the server package
RUN pip3 install llama-cpp-python[server]

# Expose the port
EXPOSE 8000

# Make a dir for the models to be mounted into
RUN mkdir -p /var/lib/models

# Set environment
ENV MODEL "/var/lib/models/ggjt-model.bin"

# Start the server
CMD ["python3", "-m", "llama_cpp.server"]
```
I then started the container via this docker-compose file:
```yaml
version: "3.9"
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
    image: llama-server
    container_name: llama-server
    ports:
      - "8000:8000"
    volumes:
      - [MODEL_DIR]:/var/lib/models
```
When running this, the 'GLIBCXX_3.4.29' error is gone, but the model takes a very long time to load and the container sometimes gets OOM-killed. I'm running with 64 GB of RAM, so that's very strange. The other thing I noticed is that the container fails to bind the address, but that's just Docker IPv4 shenanigans. I will try to use llama-cpp-python[server] as a dependency in my other project and see whether that gets rid of the 'GLIBCXX_3.4.29' error.
Oh, and just for completeness: I was using this model.
So the GLIBC error is fixed now? I would also make sure the OOM issue isn't some Docker default limit.
That's the strange thing: the Dockerfile listed above works without any problems, but if I try to run my own Dockerfile I get the "GLIBC" error.
This is my Dockerfile:
```dockerfile
FROM python:3.10

RUN apt-get update -y
RUN apt-get install -y python3-pip graphviz-dev git gcc-4.9
RUN apt-get upgrade -y libstdc++6

# Expose the port
EXPOSE 8001

# Copy files
RUN mkdir -p /app
ADD ./ /app/
WORKDIR /app

# Install requirements
RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r ./api/requirements-cpu.txt
RUN --mount=type=cache,target=/root/.cache/pip pip3 install git+https://github.com/huggingface/transformers@a17841ac4945631e4e13c072fa2a329b98ebb8b6

ENV PYTHONPATH "${PYTHONPATH}:/app"

# Build cache dir for transformers
ENV HF_HOME "/huggingface/cache"

CMD ["python3", "/app/api/main.py"]
```
and it uses the following requirements file:
```text
protobuf==3.20.1
huggingface_hub
llama-cpp-python[server]
openai
pynvml
farm-haystack

# api dependencies
psutil
fastapi
uvicorn
dependency_injector
```
My guess is that one of the dependencies in the requirements does something weird, but I honestly have no clue where to start looking.
I run within an Ubuntu container, which works. https://github.com/mkellerman/gpt4all-ui/ runs a 3.11 container and it works, so I would guess the issue is not with llama-cpp-python but with your concrete Dockerfile/requirements.
Took the opportunity to shrink my own dockerfile:
```dockerfile
FROM python:3.10
COPY .devops/requirements.txt requirements.txt
RUN pip install -r requirements.txt && rm -rf requirements.txt
ENTRYPOINT [ "python3", "-m", "llama_cpp.server" ]
```
works without a hitch with this requirements.txt:
```text
llama-cpp-python
uvicorn
fastapi
sse_starlette
typing_extensions
```
I also played around a bit more and I couldn't get the container working, and I don't know why. All the other containers I build with llama-cpp-python work without any problems. I will probably host a separate llama-cpp-python container and then use it via its REST API.
@abetlen Could we maybe get an official prebuilt Docker container for the REST server?
Yeah if someone wants to open a PR I can test / review it when I have some time.
I could try later. But I guess I noticed another problem: when I build my containers via GitHub Actions and then try to run them locally, the containers exit with exit code 132, hinting at an unsupported CPU instruction set. If built and run locally, they work without any problems.
My guess is that the llama.cpp dependencies are resolved in the build step of the container with the CPU feature flags of the GitHub Actions runner. Can I define specific feature flags for the llama.cpp compilation at setup time, or should I look into first building llama.cpp manually and then setting it via the environment?
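One thing that might be worth trying (I haven't verified it in this exact setup): llama-cpp-python's scikit-build setup is supposed to honor `CMAKE_ARGS` and `FORCE_CMAKE` at install time, so the feature flags could in principle be pinned in the Dockerfile itself. A sketch, assuming those variables are respected by `setup.py`:

```dockerfile
FROM python:3.10

# Assumption: setup.py forwards CMAKE_ARGS to the llama.cpp cmake build,
# and FORCE_CMAKE=1 forces a from-source build instead of a prebuilt wheel
ENV CMAKE_ARGS="-DLLAMA_AVX512=OFF"
ENV FORCE_CMAKE=1
RUN pip3 install --no-cache-dir llama-cpp-python[server]
```

This way the instruction sets used are fixed at image build time rather than inherited silently from whatever host happens to run the build.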
@LLukas22 that's definitely it; we should check what llama.cpp does for this with their Docker containers.
@abetlen Yeah, it's pretty weird. I played around a bit but can't get it to work, even if I use QEMU to force a linux/amd64 platform while building the image on a GitHub Actions runner. And I'm actually really confused about what instruction set is missing, as I'm using relatively new processors (AMD Zen 2 & 3), so AVX and AVX2 are definitely there.
But I'm also no Docker pro, so maybe I'm just missing something.
@LLukas22 is QEMU sufficient to emulate processor-specific instruction sets? I'd imagine it's because of a mismatch between the processor architectures of the GitHub runner and your local machine.
@abetlen Theoretically QEMU should be able to emulate any CPU features, but I don't know how it is implemented in `docker build`. They probably create a VM with the CPU flags `host` or `max`, which will use all features of the host system. Then the problem with the architecture mismatch happens when I try to run it locally. But I'm still a bit confused, though: nearly all x64 CPUs released in the past five years implement the same instruction sets 🤔
@LLukas22 https://github.com/ggerganov/llama.cpp/blob/master/.github/workflows/docker.yml describes how they build the llama.cpp docker image, not sure if you've checked that out
@abetlen Yup, I had a look and tried their image; it runs on my Intel machine but fails with exit code 132 on both my AMD-based systems. I will create a new issue in the llama.cpp repository.
These old build issues, where llama.cpp gets confused about which features are exposed on different (virtualised?) h/w, might be of help:

- Can it support avx cpu's older than 10 years old
- better support for different x86_64 CPU instruction extensions

EDIT: Is there a chance that the Docker image is configured for one type of h/w (e.g. Intel), and then fails when moved to another (e.g. AMD)?
@gjmulder Well, the images are built on a GitHub Actions runner, which probably uses a virtualized Intel CPU. I then downloaded them and tried to run them on my AMD systems, which leads to the errors. The thing that confuses me is that all the systems I used support all the CPU features listed in the cmake file, namely avx, avx2, fma and f16c. I don't quite get why an image built on one of these systems shouldn't work when moved to another system supporting the exact same instruction sets.
I'm guessing here, as I haven't had time to repro, but will do so in the next few days. Maybe do a `lscpu | grep Flags > /var/tmp/lscpu_build_flags` as part of the GitHub Actions runner build and confirm that the flags are in fact exactly the same?
@gjmulder Alright, I ran it locally and on the GitHub Actions runner. Here are the results:
Actions runner:

```text
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
```

Local:

```text
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip rdpid
```
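Diffing the two lists mechanically makes the difference easy to spot. A quick illustrative script (the two flag strings below are abbreviated excerpts of the full outputs above; the complete lists can be pasted in verbatim):

```python
# Diff two `lscpu` flag lists to find features only the build host has.
# Abbreviated excerpts of the outputs above; substitute the full strings if needed.
runner_flags = set("fma avx f16c avx2 avx512f avx512dq avx512cd avx512bw avx512vl".split())
local_flags = set("fma avx f16c avx2 sha_ni clzero sse4a misalignsse".split())

# Features present on the GitHub Actions runner but missing locally
runner_only = sorted(runner_flags - local_flags)
print(runner_only)
# -> ['avx512bw', 'avx512cd', 'avx512dq', 'avx512f', 'avx512vl']
```

Every SIMD flag that the runner has and the local CPU lacks is an AVX-512 extension.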
Maybe the problem is caused by the `avx512` instruction set?
> Maybe the problem is caused by the `avx512` instruction set?
@LLukas22 this might be relevant: https://github.com/ggerganov/llama.cpp/pull/809
@gjmulder Yes, this is probably related. Do you know if it is possible to somehow change which feature sets are used in the compilation of llama.cpp via environment variables? Then I could just set them in the Dockerfile and use the normal setup from this repo. If not, I probably have to build it manually and then copy the binary around.
@LLukas22 I seem to remember that the Makefile was checking the output of `lscpu`, but it seems things have moved on since then.

This might help. Maybe you can override the g++ generated macros? A bit hacky, but might work 🤞
EDIT: GPT-4 to the rescue?

```make
CXXFLAGS = -march=native -U__SSE3__ -U__AVX__ -U__AVX2__ -D__SSE__ -D__SSE2__
```
This line will keep the -march=native option but will disable SSE3, AVX, and AVX2 while enabling SSE and SSE2.
Again, note that forcibly enabling SIMD instruction sets that your CPU doesn't support may cause your program to crash or produce incorrect results. Make sure you understand the implications of enabling or disabling these macros before making any changes.
@gjmulder Hm, this could actually work. I'm gonna try it later; I will just try to disable `avx512`, but I don't know what flag to set 😓. But I guess I also have to set the `CFLAGS`, as ggml is compiled with them.
@LLukas22 You could just query `gcc` in your target env and use those flags for the Docker build. You are essentially cross-compiling. Fairly sure this problem has been solved more elegantly by someone somewhere.
@gjmulder Alright, I had another look. Setting the `CXXFLAGS` or `CFLAGS` is not possible, as they are reset in the Makefile. But if I use cmake, I can simply pass a `-DLLAMA_AVX512=OFF` flag to enable/disable the `avx512` instructions. I will probably use this to build llama.cpp manually in a workflow and then copy the resulting `libllama.so` into my Docker containers. This way I can easily create separate `avx512` and `avx2` containers.
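The manual-build-then-copy approach could also be kept inside a single two-stage Dockerfile (stage names and paths here are illustrative; the exact output location of the built `libllama.so` depends on the llama.cpp version):

```dockerfile
# Stage 1: build libllama.so with AVX-512 disabled (illustrative paths)
FROM python:3.10 AS builder
RUN apt-get update && apt-get install -y cmake build-essential git
RUN git clone https://github.com/ggerganov/llama.cpp /llama.cpp
RUN cmake -S /llama.cpp -B /build -DBUILD_SHARED_LIBS=ON -DLLAMA_AVX512=OFF \
 && cmake --build /build --config Release

# Stage 2: point llama-cpp-python at the prebuilt library
FROM python:3.10
COPY --from=builder /build/libllama.so /lib/llama.so
ENV LLAMA_CPP_LIB /lib/llama.so
```

A side benefit: because both stages use the same base image, the libstdc++ the library links against matches the one in the runtime image, which should sidestep GLIBCXX version mismatches.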
@abetlen Alright, I now have an action that builds me a `libllama.so` binary with and without `avx512`. But if I copy this binary into my Docker container and set the `LLAMA_CPP_LIB` environment variable to point at the binary, I'm back to the `'GLIBCXX_3.4.29' not found` error 🤔.
My Dockerfile now looks like this and is available in this fork:
```dockerfile
FROM python:3.10

# Copy the compiled llama.so file into the container
COPY ./lib /lib/

# Set the environment variable for the llama.so file
ENV LLAMA_CPP_LIB /lib/llama.so

# Copy the python code into the container
RUN mkdir -p /app
COPY ./llama_cpp /app
WORKDIR /app

# Add the current directory to the PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/app"

# Install requirements
RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r ./server/requirements.txt

# Set default environment variables
ENV HOST 0.0.0.0
ENV PORT 8000

# Expose the port
EXPOSE ${PORT}

# Run the server
CMD ["python3", "-m", "server"]
```
This is the binary I used: llama.zip

I'm at my wits' end here; I'm probably just gonna build my containers on the machines I will run them on.
Here to help you debug @LLukas22 😄
EDIT: Full clone and build of your repo:

```text
$ git submodule update --init --recursive
Submodule 'vendor/llama.cpp' (git@github.com:ggerganov/llama.cpp.git) registered for path 'vendor/llama.cpp'
Cloning into '/home/mulderg/Work/llama-cpp-python-Docker/vendor/llama.cpp'...
X11 forwarding request failed on channel 0
Submodule path 'vendor/llama.cpp': checked out 'e95b6554b493e71a0275764342e09bd5784a7026'
(lcp) mulderg@asushimu:~/Work/llama-cpp-python-Docker$ python3 setup.py develop
/home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
--------------------------------------------------------------------------------
-- Trying 'Ninja' generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
Not searching for unused variables given on the command line.
CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
-- Configuring incomplete, errors occurred!
See also "/home/mulderg/Work/llama-cpp-python-Docker/_cmake_test_compile/build/CMakeFiles/CMakeOutput.log".
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying 'Ninja' generator - failure
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-- Trying 'Unix Makefiles' generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is GNU 10.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mulderg/Work/llama-cpp-python-Docker/_cmake_test_compile/build
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying 'Unix Makefiles' generator - success
--------------------------------------------------------------------------------
Configuring Project
Working directory:
/home/mulderg/Work/llama-cpp-python-Docker/_skbuild/linux-x86_64-3.10/cmake-build
Command:
/usr/local/bin/cmake /home/mulderg/Work/llama-cpp-python-Docker -G 'Unix Makefiles' --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/home/mulderg/Work/llama-cpp-python-Docker/_skbuild/linux-x86_64-3.10/cmake-install -DPYTHON_VERSION_STRING:STRING=3.10.10 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/home/mulderg/anaconda3/envs/lcp/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/home/mulderg/anaconda3/envs/lcp/include/python3.10 -DPYTHON_LIBRARY:PATH=/home/mulderg/anaconda3/envs/lcp/lib/libpython3.10.so -DPython_EXECUTABLE:PATH=/home/mulderg/anaconda3/envs/lcp/bin/python3 -DPython_ROOT_DIR:PATH=/home/mulderg/anaconda3/envs/lcp -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/home/mulderg/anaconda3/envs/lcp/include/python3.10 -DPython_LIBRARY:PATH=/home/mulderg/anaconda3/envs/lcp/lib/libpython3.10.so -DPython3_EXECUTABLE:PATH=/home/mulderg/anaconda3/envs/lcp/bin/python3 -DPython3_ROOT_DIR:PATH=/home/mulderg/anaconda3/envs/lcp -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/home/mulderg/anaconda3/envs/lcp/include/python3.10 -DPython3_LIBRARY:PATH=/home/mulderg/anaconda3/envs/lcp/lib/libpython3.10.so -DCMAKE_BUILD_TYPE:STRING=Release
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 10.4.0
-- The CXX compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mulderg/Work/llama-cpp-python-Docker/_skbuild/linux-x86_64-3.10/cmake-build
[100%] Generating /home/mulderg/Work/llama-cpp-python-Docker/vendor/llama.cpp/libllama.so
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native -DGGML_USE_OPENBLAS -I/usr/local/include/openblas
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:  -lopenblas
I CC:       cc (Ubuntu 10.4.0-4ubuntu1~22.04) 10.4.0
I CXX:      g++ (Ubuntu 10.4.0-4ubuntu1~22.04) 10.4.0
In file included from llama.cpp:6:
llama_util.h:60:2: warning: extra ';' [-Wpedantic]
   60 | };
      |  ^
[100%] Built target run
Install the project...
-- Install configuration: "Release"
-- Installing: /home/mulderg/Work/llama-cpp-python-Docker/_skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/libllama.so
copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
running develop
/home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running egg_info
creating llama_cpp_python.egg-info
writing llama_cpp_python.egg-info/PKG-INFO
writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
writing requirements to llama_cpp_python.egg-info/requires.txt
writing top-level names to llama_cpp_python.egg-info/top_level.txt
writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
adding license file 'LICENSE.md'
writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
running build_ext
Creating /home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/llama-cpp-python.egg-link (link to .)
Removing llama-cpp-python 0.1.34 from easy-install.pth file
Adding llama-cpp-python 0.1.34 to easy-install.pth file

Installed /home/mulderg/Work/llama-cpp-python-Docker
Processing dependencies for llama-cpp-python==0.1.34
Searching for typing-extensions==4.5.0
Best match: typing-extensions 4.5.0
Processing typing_extensions-4.5.0-py3.10.egg
typing-extensions 4.5.0 is already the active version in easy-install.pth
Using /home/mulderg/anaconda3/envs/lcp/lib/python3.10/site-packages/typing_extensions-4.5.0-py3.10.egg
Finished processing dependencies for llama-cpp-python==0.1.34

$ docker build --force-rm -t lcp .
Sending build context to Docker daemon  6.337MB
Step 1/12 : FROM python:3.10
 ---> c339f65d6ddf
Step 2/12 : COPY ./lib /lib/
COPY failed: file not found in build context or excluded by .dockerignore: stat lib: file does not exist
```
@gjmulder Alright, that's kind of my bad: the `lib` folder is created in a GitHub action and contains the `llama.so` binary I added above. I have now also added a `lib` folder with a `llama.so` that was generated this way into the repo for testing purposes.

There is no need to build or recursively check out the repo. To reproduce the issue I'm experiencing, just clone the fork and run a docker build and run on the Dockerfile. The build should work without any errors, and when trying to run the container the `'GLIBCXX_3.4.29' not found` error should occur.

The build process of the `llama.so` binary is described in this workflow file.
When I try to install and use this package via a requirements file in the default 3.10 python container, I get the following error when I try to import the module:
```text
Failed to load shared library '/usr/local/lib/python3.10/site-packages/llama_cpp/libllama.so': /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found (required by /usr/local/lib/python3.10/site-packages/llama_cpp/libllama.so)
```
Am I doing something wrong? Or am I just missing some dependencies?
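For anyone hitting this later: the mismatch can be confirmed by listing the version symbols the container's libstdc++ actually exports, e.g. `strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX`, and comparing against the required version. A small illustrative helper (the function names here are my own, not part of any library):

```python
import re

def max_glibcxx(symbols):
    """Highest GLIBCXX_x.y[.z] version among the given symbol names."""
    versions = [
        tuple(int(p) for p in m.group(1).split("."))
        for s in symbols
        if (m := re.fullmatch(r"GLIBCXX_(\d+(?:\.\d+)*)", s))
    ]
    return max(versions) if versions else None

def satisfies(available_symbols, required="GLIBCXX_3.4.29"):
    """True if the library's newest GLIBCXX symbol covers the required one."""
    req = tuple(int(p) for p in required.split("_", 1)[1].split("."))
    newest = max_glibcxx(available_symbols)
    return newest is not None and newest >= req

# Debian bullseye (the base of python:3.10) ships libstdc++ from GCC 10,
# which tops out at GLIBCXX_3.4.28; GLIBCXX_3.4.29 first appears in GCC 11.
print(satisfies(["GLIBCXX_3.4", "GLIBCXX_3.4.28"]))  # False
print(satisfies(["GLIBCXX_3.4", "GLIBCXX_3.4.30"]))  # True
```

So a `libllama.so` built with GCC 11 or newer (e.g. on ubuntu-latest runners) cannot load inside the stock `python:3.10` image; the fix is to build the library in the same (or an older) base image than the one it runs in.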