dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

How to build a llama-cpp-python container with CUDA? #459

Open rsoika opened 3 months ago

rsoika commented 3 months ago

Hi, I just have a question and hope one of you can help me out, as I am now three days into an installation odyssey.

I have written a small Python-based REST API to run the Mistral-7B model with llama-cpp-python in a Docker container. Everything works fine on my Linux notebook without a GPU.

Now I have ordered a server (Intel Core i7-7700 + GeForce GTX 1080). The goal, of course, is to use the GPU. So I installed the NVIDIA drivers on the host and verified with nvidia-smi that everything is working.

The big question I have not been able to answer for days is: how can I build a Docker image with llama-cpp-python that uses my host's GPU? The whole thing feels like rocket science, and I'm deeply frustrated.

Unfortunately, the dustynv/cuda-python images don't work for me either. The error message is:

The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested
exec /bin/bash: exec format error
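
Inspecting the image confirms the mismatch (the tag here is a placeholder for whichever image I pulled):

docker image inspect --format '{{.Os}}/{{.Architecture}}' dustynv/cuda-python:latest
# prints linux/arm64, while my host is linux/amd64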

Does anyone know of an easy-to-understand guide on how to do something like this? As I said, the host already has the NVIDIA drivers. I didn't expect it to be so complicated to teach my container to use the GPU.

Thanks for any kind of help.

dusty-nv commented 3 months ago

Hi @rsoika, yes, as you have found, all the container images from this repo are built for Jetson (ARM64 + CUDA). However, if you check my llama_cpp Dockerfile you can see how I build it (you would just use an NGC CUDA base image for x86 instead):

https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/llama_cpp/Dockerfile

Note how I compile llama_cpp_python with the -DLLAMA_CUBLAS=on -DLLAMA_CUDA_F16=1 flags in there.
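
On x86 the equivalent build step would look roughly like this (FORCE_CMAKE=1 forces llama-cpp-python to compile from source instead of using a prebuilt wheel):

CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_CUDA_F16=1" FORCE_CMAKE=1 pip3 install --no-cache-dir llama-cpp-python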

rsoika commented 2 months ago

I am now using the nvidia/cuda Docker image as the base image and installing the llama-cpp-python part on top. This works well.

This is what my Dockerfile looks like:

# See: https://github.com/abetlen/llama-cpp-python/blob/main/docker/cuda_simple/Dockerfile
ARG CUDA_IMAGE="12.1.1-devel-ubuntu22.04"
FROM nvidia/cuda:${CUDA_IMAGE}

# Install Python 3 and build tools
RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y build-essential python3 python3-pip gcc

# Set build-related env vars (consumed by the llama.cpp CMake build)
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1

# Install llama-cpp-python (built with CUDA)
RUN python3 -m pip install --upgrade pip pytest cmake fastapi uvicorn
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --upgrade llama-cpp-python

# Install fastapi-xml and copy the app
RUN pip install fastapi-xml
COPY ./app /app
WORKDIR /app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]