Docker containers for llama-cpp-python, an OpenAI-compatible server wrapper around llama.cpp. The motivation is to have prebuilt containers for use in Kubernetes.
Ideally, llama-cpp-python itself would automate publishing containers and support automated model fetching from URLs.
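For Kubernetes use, the model fetcher can run as an init container that populates a shared volume before the server starts. The following is a minimal sketch, not a tested manifest: the `MODEL` environment variable, volume names, and PersistentVolumeClaim are assumptions, and the paths mirror the Docker examples below.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llama-cpp-server
spec:
  initContainers:
    # Downloads the model into the shared volume before the server starts.
    - name: model-fetch
      image: ghcr.io/allenporter/llama-cpp-server-model-fetch:main
      env:
        - name: MODEL_URLS
          value: "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf"
      volumeMounts:
        - name: models
          mountPath: /data/models
  containers:
    - name: server
      image: ghcr.io/allenporter/llama-cpp-server-simple:main
      ports:
        - containerPort: 8000
      env:
        # Assumed: llama-cpp-python's server reads the model path from MODEL.
        - name: MODEL
          value: "/data/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf"
      volumeMounts:
        - name: models
          mountPath: /data/models
  volumes:
    - name: models
      persistentVolumeClaim:
        claimName: models  # assumed to exist
```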
| Image | Description |
| --- | --- |
| ghcr.io/allenporter/llama-cpp-server-simple | No special hardware acceleration |
| ghcr.io/allenporter/llama-cpp-server-openblas | OpenBLAS CPU-only based computations |
| ghcr.io/allenporter/llama-cpp-server-clblast | Uses OpenCL for hardware acceleration (can use Intel or other GPUs) |
| ghcr.io/allenporter/llama-cpp-server-cuda | Preferred on NVIDIA GPUs |
| ghcr.io/allenporter/llama-cpp-server-model-fetch | Helper container for downloading models from URLs |
Below are examples using the container images.
Fetch a model:

```shell
$ docker run -it \
    -v "./models/:/data/models:rw" \
    -e "MODEL_URLS=https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf" \
    ghcr.io/allenporter/llama-cpp-server-model-fetch:main
```
See Configuration for more details on the config file format.
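As an illustration, a minimal config file might look like the following. This is a hedged sketch based on llama-cpp-python's multi-model server settings; the field names and values here (model alias, context size) are assumptions, not taken from this repository:

```json
{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {
      "model": "/data/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
      "model_alias": "mistral-instruct",
      "n_ctx": 4096
    }
  ]
}
```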
Run the server:

```shell
$ docker run -it \
    -v "./models/:/data/models" \
    -v "./config/:/data" \
    -e "CONFIG_FILE=/data/config.json" \
    -p "8000:8000" \
    ghcr.io/allenporter/llama-cpp-server-clblast:main
```
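Once the server is up, the OpenAI-compatible API can be exercised with curl. A sketch, assuming the server is reachable on localhost:8000 and that `mistral-instruct` is a configured model name or alias (adjust to match your config file):

```shell
# List the models the server has loaded
curl http://localhost:8000/v1/models

# Request a chat completion; "model" must match a configured model or alias
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-instruct", "messages": [{"role": "user", "content": "Say hello"}]}'
```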
Build the model fetcher:

```shell
$ docker build -t model-fetch:dev model-fetch/
```

Fetch a model:

```shell
$ docker run -it \
    -v "./models/:/data/models:rw" \
    -e "MODEL_URLS=https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf" \
    model-fetch:dev
```