rsoika opened this issue 8 months ago
Hi @rsoika, yes, as you have found, all the container images from this repo are built for Jetson (ARM64 + CUDA). However, if you check my llama_cpp Dockerfile you can see how I build it (you would just use an NGC CUDA base image for x86 instead):
https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/llama_cpp/Dockerfile
Note how I compile llama_cpp_python with the `-DLLAMA_CUBLAS=on -DLLAMA_CUDA_F16=1` flags in there.
I am now using the nvidia/cuda Docker image as the base image and installing the llama-cpp part myself. This works well.

This is what my Dockerfile looks like:
```dockerfile
# See: https://github.com/abetlen/llama-cpp-python/blob/main/docker/cuda_simple/Dockerfile
ARG CUDA_IMAGE="12.1.1-devel-ubuntu22.04"
FROM nvidia/cuda:${CUDA_IMAGE}

# Install Python 3 and build tools
RUN apt-get update && apt-get upgrade -y \
 && apt-get install -y build-essential python3 python3-pip gcc

# Set build-related environment variables
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1

# Install llama-cpp-python (built with CUDA support)
RUN python3 -m pip install --upgrade pip pytest cmake fastapi uvicorn
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --upgrade llama-cpp-python

# Install FastAPI extras and copy the app
RUN pip install fastapi-xml
COPY ./app /app
WORKDIR /app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
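A CUDA-enabled image alone is not enough: the container also has to be started with GPU access. A minimal sketch of building and running it (this assumes the NVIDIA Container Toolkit is installed on the host; the image tag `llama-cpp-api` is a placeholder):

```shell
# Build the image from the directory containing the Dockerfile
docker build -t llama-cpp-api .

# Run it with access to all host GPUs; without --gpus the container
# falls back to CPU even if llama-cpp-python was built with CUDA
docker run --rm --gpus all -p 8000:8000 llama-cpp-api
```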
Hi, I just have a question and hope that one of you can help me out, as I am now on a three-day installation odyssey.
I have written a small Python-based REST API that runs the Mistral-7B model with llama-cpp-python in a Docker container. Everything works fine on my Linux notebook without a GPU.
Now I have ordered a server (Intel Core i7-7700 + GeForce GTX 1080). The goal is, of course, to use the GPU. So I installed the NVIDIA drivers on the host and verified with nvidia-smi that everything is working.
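For completeness: besides the host driver, Docker itself needs the NVIDIA Container Toolkit before containers can see the GPU. A sketch for Ubuntu (the apt repository setup is omitted here; check NVIDIA's current install guide for it):

```shell
# Install the NVIDIA Container Toolkit (apt repo setup omitted)
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: nvidia-smi inside a CUDA container should list the GTX 1080
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```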
The big question I haven't been able to find an answer to for days is: how can I build a Docker image with llama-cpp-python that uses my host's GPU? The whole thing feels like rocket science, and I'm deeply frustrated.
Unfortunately, the
dustynv/cuda-python
images don't work for me either. The error message is:

Does anyone know of an easy-to-understand guide on how to do something like this? As I said, the host already has the NVIDIA drivers installed. I didn't expect it to be so complicated to teach my container to use the GPU.
Thanks for any kind of help.