defenseunicorns / leapfrogai-backend-llama-cpp-python

LeapfrogAI backend using llama-cpp-python
Apache License 2.0

Configurable model download in automated Zarf package #12

Closed justinthelaw closed 9 months ago

justinthelaw commented 9 months ago

Model downloads and zarf package create should be more automated. The following script is a starting point for thinking through this:

#!/bin/bash
set -euo pipefail

# Prompt for the Hugging Face repo and filename; see the Hugging Face documentation for more details
# e.g., TheBloke/SynthIA-7B-v2.0-16k-GGUF
read -p "Enter the Hugging Face repository: " repository
# e.g., synthia-7b-v2.0-16k.Q8_0.gguf
read -p "Enter the Hugging Face GGUF file's name: " filename
# e.g., synthia-7b-v2-0
read -p "Enter the model Display Name: " model_name

# Create temp directory; -p also creates tmp/ if it is missing
mkdir -p "tmp/$model_name"
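One gap worth closing before any paths or image tags are built from the answers: the prompts accept empty input. A minimal sketch of a guard (require_nonempty is a hypothetical helper, not part of the script above):

```shell
# Hedged sketch: fail fast on an empty prompt answer.
require_nonempty() {
  if [ -z "$1" ]; then
    echo "Missing required input: $2" >&2
    exit 1
  fi
}
# In the script these calls would follow the read prompts, e.g.:
#   require_nonempty "$repository" "Hugging Face repository"
#   require_nonempty "$filename"   "GGUF file name"
#   require_nonempty "$model_name" "model display name"
require_nonempty "TheBloke/SynthIA-7B-v2.0-16k-GGUF" "Hugging Face repository"
echo "inputs ok"
```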

# Copy over the config.yaml
# See LeapfrogAI SDK documentation for more details
cp -f tools/config.yaml ../leapfrogai-infrastructure/leapfrogai-backend-llama-cpp-python/config.yaml

# Go to the llama-cpp-python backend
cd ../leapfrogai-infrastructure/leapfrogai-backend-llama-cpp-python

# Build the Docker image
# Expects the local registry:2 container to still be up and running;
#   see the Docker documentation for more details
version="1.0.0"
image_tag="defenseunicorns/leapfrogai/$model_name:$version"

docker build --build-arg FILE="$filename" --build-arg REPO="$repository" -f Dockerfile.gpu -t "ghcr.io/$image_tag" . && \
docker tag "ghcr.io/$image_tag" "localhost:5000/$image_tag" && \
docker push "localhost:5000/$image_tag"
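Since the push assumes the registry:2 container is still alive, a pre-flight check could fail fast with a clearer message than a mid-push error. A hedged sketch (registry_ready is a hypothetical helper; /v2/ is the standard Distribution registry API base endpoint):

```shell
# Hedged sketch: ping the registry's v2 API before pushing.
registry_ready() {
  curl -sf "http://$1/v2/" > /dev/null
}
if ! registry_ready "localhost:5000"; then
  echo "No registry answering at localhost:5000; is registry:2 running?" >&2
fi
```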

# Create Zarf package
# See Zarf documentation for more details
zarf package create \
    --registry-override ghcr.io=localhost:5000 \
    --set image_version="$version" \
    --set name="$model_name" \
    --set IMAGE_REPOSITORY="ghcr.io/defenseunicorns/leapfrogai/$model_name" \
    --confirm

# Move the completed Zarf package back to the model zoo
mv zarf-*.tar.zst "../../model-zoo/tmp/$model_name/"

# Execute the Zarf deployment and clean up
# See the Kubernetes and Zarf documentation for more details;
#   also see the zarf-config.yaml in "../leapfrogai-infrastructure/leapfrogai-backend-llama-cpp-python",
#   and note that the RAM limits below roughly mirror the vRAM limit of a single NVIDIA H100
cd "../../model-zoo/tmp/$model_name" && \
zarf package deploy \
    --set name="$model_name" \
    --set image_repository="ghcr.io/$image_tag" \
    --set GPU_ENABLED=true \
    --set LIMITS_GPU=1 \
    --set REQUESTS_GPU=1 \
    --set LIMITS_CPU=1 \
    --set REQUESTS_CPU=1 \
    --set LIMITS_MEMORY="100Gi" \
    --set REQUESTS_MEMORY="50Gi" \
    zarf-*.tar.zst && \
cd ../ && rm -rf "tmp/$model_name"
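As a further automation step, the model Display Name prompt could plausibly be derived from the GGUF filename instead of asked for. A hedged sketch, where the sanitization rule (lowercase, dots and underscores to dashes, to keep the name Kubernetes/Zarf friendly) is an assumption and derive_model_name is a hypothetical helper:

```shell
# Hedged sketch: derive a DNS-1123-style name from a GGUF filename.
derive_model_name() {
  local name
  name="$(basename "$1" .gguf)"                                  # drop the extension
  printf '%s\n' "$name" | tr '[:upper:]' '[:lower:]' | tr '._' '--'
}
derive_model_name "synthia-7b-v2.0-16k.Q8_0.gguf"
# → synthia-7b-v2-0-16k-q8-0
```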