Model downloads and zarf package create should be more automated. The following script is a starting point for thinking through this:
#!/bin/bash
# Prompt for the Hugging Face repository and GGUF filename; see the Hugging Face documentation for more details
# e.g., TheBloke/SynthIA-7B-v2.0-16k-GGUF
read -r -p "Enter the Hugging Face repository: " repository
# e.g., synthia-7b-v2.0-16k.Q8_0.gguf
read -r -p "Enter the Hugging Face GGUF file's name: " filename
# e.g., synthia-7b-v2-0
read -r -p "Enter the model display name: " model_name
# Create temp directory
if [ ! -d "tmp/$model_name" ]; then
  mkdir -p "tmp/$model_name"
fi
# Copy over the config.yaml
# See LeapfrogAI SDK documentation for more details
cp -f tools/config.yaml ../leapfrogai-infrastructure/leapfrogai-backend-llama-cpp-python/config.yaml
# Go to the llama-cpp-python backend
cd ../leapfrogai-infrastructure/leapfrogai-backend-llama-cpp-python
# Create the Docker image
# Expects that a local registry:2 container is still up and running;
# see the Docker documentation for more details
version="1.0.0"
image_tag="defenseunicorns/leapfrogai/$model_name:$version"
docker build --build-arg FILE="$filename" --build-arg REPO="$repository" -f Dockerfile.gpu -t "ghcr.io/$image_tag" . && \
docker tag "ghcr.io/$image_tag" "localhost:5000/$image_tag" && \
docker push "localhost:5000/$image_tag"
# Create Zarf package
# See Zarf documentation for more details
zarf package create \
  --registry-override ghcr.io=localhost:5000 \
  --set image_version="$version" \
  --set name="$model_name" \
  --set IMAGE_REPOSITORY="ghcr.io/defenseunicorns/leapfrogai/$model_name" \
  --confirm
# Move completed zarf package back to model zoo
mv zarf-*.tar.zst "../../model-zoo/tmp/$model_name/"
# Execute Zarf deployment and clean-up
# See the Kubernetes and Zarf documentation for more details;
# also see the zarf-config.yaml in "../leapfrogai-infrastructure/leapfrogai-backend-llama-cpp-python",
# and note that the RAM limits below roughly mirror the vRAM limit of 1x NVIDIA H100
cd "../../model-zoo/tmp/$model_name" && \
zarf package deploy \
  --set name="$model_name" \
  --set image_repository="ghcr.io/$image_tag" \
  --set GPU_ENABLED=true \
  --set LIMITS_GPU=1 \
  --set REQUESTS_GPU=1 \
  --set LIMITS_CPU=1 \
  --set REQUESTS_CPU=1 \
  --set LIMITS_MEMORY="100Gi" \
  --set REQUESTS_MEMORY="50Gi" \
  zarf-*.tar.zst && \
cd ../ && rm -rf "tmp/$model_name"
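Since the prompted values feed directly into image tags, file paths, and the Zarf package name, a small validation step could fail fast on typos before any downloads or builds start. A sketch, where the specific checks and error messages are illustrative rather than part of the existing tooling:

```shell
# Sketch: sanity-check the three prompted values before doing any work.
# The rules below are assumptions about what downstream steps tolerate.
validate_inputs() {
  repository="$1"; filename="$2"; model_name="$3"
  # Hugging Face repos look like "owner/name", e.g. TheBloke/SynthIA-7B-v2.0-16k-GGUF
  case "$repository" in
    */*) ;;
    *) echo "repository must look like owner/name" >&2; return 1 ;;
  esac
  # The llama-cpp-python backend expects a GGUF artifact
  case "$filename" in
    *.gguf) ;;
    *) echo "filename must end in .gguf" >&2; return 1 ;;
  esac
  # The display name feeds image tags and directory paths:
  # restrict it to lowercase letters, digits, and dashes
  case "$model_name" in
    *[!a-z0-9-]*|"") echo "model_name: lowercase letters, digits, and dashes only" >&2; return 1 ;;
  esac
}
```

Calling `validate_inputs "$repository" "$filename" "$model_name" || exit 1` right after the three read prompts would stop the script before any Docker or Zarf steps run on bad input.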
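The build-and-push steps assume a registry:2 container is already serving localhost:5000. A guard like the following could make that assumption explicit; the container name "registry" and port 5000 are assumptions here, so adjust them to the actual local setup:

```shell
# Sketch: start a local registry:2 container if one is not already running.
# Assumes the container is named "registry" and serves port 5000.
ensure_local_registry() {
  if ! docker ps --format '{{.Names}}' | grep -qx 'registry'; then
    docker run -d -p 5000:5000 --restart=always --name registry registry:2
  fi
}
```

Invoking `ensure_local_registry` just before the docker build would remove the "expects the registry is still up" caveat from the script.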