defenseunicorns / leapfrogai

Production-ready Generative AI for local, cloud native, airgap, and edge deployments.
https://leapfrog.ai
Apache License 2.0

EPIC: Implement Model Directory #623

Open barronstone opened 2 weeks ago

barronstone commented 2 weeks ago

Overview

LeapfrogAI should implement a model directory to store and serve models. This would significantly reduce the size of LeapfrogAI Zarf packages, provide users with more model options, and enable the backends to dynamically swap out models for different tasks.

Background

Currently, model parameters are baked directly into the backend Zarf packages. This makes LeapfrogAI packages very large, often exceeding 8GB, which causes automated deployment issues that require manual intervention to work around. Additionally, each backend, such as vllm, is packaged with only a single model. If a user wants access to multiple LLMs for different tasks (e.g., one model for chat and another for coding assistance), they would need two vllm packages, one for each model. This approach makes it cumbersome and impractical for users to switch between models.

Externally, organizations such as Platform One are asking for the ability to select from a collection of models, a feature that is common and expected in most modern AI chat interfaces. Internally, the LeapfrogAI team needs a way to quickly and efficiently swap out models in order to run evaluations across a variety of them. Incorporating a model directory into LeapfrogAI could solve both challenges by decoupling model parameters from Zarf packages and letting users dynamically select from the models available in the directory.
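As a rough sketch of the backend side of that decoupling, a backend could pull its weights from a model directory service at startup or on demand instead of shipping them inside the Zarf package. The snippet below assumes a hypothetical HTTP model directory; `MODEL_DIRECTORY_URL`, `MODEL_CACHE`, and `ensure_model` are illustrative names, not part of any existing LeapfrogAI interface:

```python
import os
import urllib.request
from pathlib import Path

# Hypothetical settings for a model directory service; names are illustrative only.
MODEL_DIRECTORY_URL = os.environ.get(
    "MODEL_DIRECTORY_URL", "http://model-directory.leapfrogai.svc.cluster.local:8080"
)
MODEL_CACHE = Path(os.environ.get("MODEL_CACHE", "/models"))


def ensure_model(name: str) -> Path:
    """Fetch a model from the directory if it is not already in the local cache."""
    target = MODEL_CACHE / name
    if target.exists():
        return target
    MODEL_CACHE.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(f"{MODEL_DIRECTORY_URL}/models/{name}", str(target))
    return target


# A backend such as vllm could then load whichever model a task calls for:
# model_path = ensure_model("chat-model-7b")
```

In practice the directory would more likely be backed by an OCI registry or object store inside the cluster, but the key point is the same: the backend resolves a model by name at runtime rather than carrying it inside its package.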

Goals

User Stories

As a delivery engineer deploying LeapfrogAI, I want the models to be separate from the backend Zarf packages, so that I can easily choose which models to deploy without rebuilding packages, and so that the packages are smaller and do not require manual steps to push them.

As a LeapfrogAI end user, I want to be able to select from multiple LLMs, so that I can choose the best model for a specific task.
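From the end user's side, a model directory would most naturally surface as a larger model list behind LeapfrogAI's OpenAI-compatible API. A minimal sketch using the `openai` Python client; the base URL, API key, and model name below are placeholders for whatever a given deployment exposes:

```python
from openai import OpenAI

# Placeholder endpoint and key; real values depend on the deployment.
client = OpenAI(base_url="https://leapfrogai.example.com/openai/v1", api_key="my-key")

# List whatever models the deployment (and, eventually, the model directory) serves.
for model in client.models.list().data:
    print(model.id)

# Pick a model per task, e.g. a hypothetical coding-focused model.
response = client.chat.completions.create(
    model="coder-7b",  # illustrative name only
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)
```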

Acceptance Criteria - TODO

Given [a state]
When [an action is taken]
Then [something happens]

Additional context

In-work technical design doc in LeapfrogAI Coda: https://coda.io/d/_dGmk3eNjmm8/Model-Directory_suuoJYJF