defenseunicorns / leapfrogai

Production-ready Generative AI for local, cloud native, airgap, and edge deployments.
https://leapfrog.ai
Apache License 2.0

EPIC: Implement Model Directory #623

Open barronstone opened 2 weeks ago

barronstone commented 2 weeks ago

Overview

LeapfrogAI should implement a model directory to store and serve models. This would significantly reduce the size of LeapfrogAI Zarf packages, provide users with more model options, and enable the backends to dynamically swap out models for different tasks.

Background

Currently, model parameters are baked directly into the backend Zarf packages. This makes LeapfrogAI packages very large, often exceeding 8GB, which causes automated deployment issues that require manual intervention to work around. Additionally, each backend, such as vllm, is packaged with only a single model. If a user wants access to multiple LLMs for different tasks (e.g., one model for chat and another for coding assistance), they would need two vllm packages, one for each model. This approach makes it cumbersome and impractical for users to switch between models.

Externally, organizations such as Platform One are asking for the ability to select from a collection of models, a feature that is common and expected in most modern AI chat interfaces. Internally, the LeapfrogAI team needs a way to quickly and efficiently swap out models in order to run evaluations across a variety of them. Incorporating a model directory into LeapfrogAI could solve both challenges by decoupling model parameters from Zarf packages and letting users dynamically select from the models available in the directory.
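As a rough sketch of the backend side of that decoupling, a backend could pull its weights from a model directory service at startup or on demand instead of shipping them inside the Zarf package. The snippet below assumes a hypothetical HTTP model directory; `MODEL_DIRECTORY_URL`, `MODEL_CACHE`, and `ensure_model` are illustrative names, not part of any existing LeapfrogAI interface:

```python
import os
import urllib.request
from pathlib import Path

# Hypothetical settings for a model directory service; names are illustrative only.
MODEL_DIRECTORY_URL = os.environ.get(
    "MODEL_DIRECTORY_URL", "http://model-directory.leapfrogai.svc.cluster.local:8080"
)
MODEL_CACHE = Path(os.environ.get("MODEL_CACHE", "/models"))


def ensure_model(name: str) -> Path:
    """Fetch a model from the directory if it is not already in the local cache."""
    target = MODEL_CACHE / name
    if target.exists():
        return target
    MODEL_CACHE.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(f"{MODEL_DIRECTORY_URL}/models/{name}", str(target))
    return target


# A backend such as vllm could then load whichever model a task calls for:
# model_path = ensure_model("chat-model-7b")
```

In practice the directory would more likely be backed by an OCI registry or object store inside the cluster, but the key point is the same: the backend resolves a model by name at runtime rather than carrying it inside its package.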

Goals

User Stories

As a delivery engineer deploying LeapfrogAI, I want the models to be separate from the backend Zarf packages, so that I can easily choose which models to deploy without rebuilding packages, and so that the packages are smaller and do not require manual steps to push them.

As a LeapfrogAI end user, I want to be able to select from multiple LLMs, so that I can choose the best model for a specific task.
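From the end user's side, a model directory would most naturally surface as a larger model list behind LeapfrogAI's OpenAI-compatible API. A minimal sketch using the `openai` Python client; the base URL, API key, and model name below are placeholders for whatever a given deployment exposes:

```python
from openai import OpenAI

# Placeholder endpoint and key; real values depend on the deployment.
client = OpenAI(base_url="https://leapfrogai.example.com/openai/v1", api_key="my-key")

# List whatever models the deployment (and, eventually, the model directory) serves.
for model in client.models.list().data:
    print(model.id)

# Pick a model per task, e.g. a hypothetical coding-focused model.
response = client.chat.completions.create(
    model="coder-7b",  # illustrative name only
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)
```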

Acceptance Criteria - TODO

Given [a state]
When [an action is taken]
Then [something happens]

Additional context

In-work technical design doc in LeapfrogAI Coda: https://coda.io/d/_dGmk3eNjmm8/Model-Directory_suuoJYJF