Floneum makes it easy to develop applications that use local pre-trained AI models. There are two main projects in this monorepo:
Kalosm is a simple interface for pre-trained models in Rust that backs Floneum. It makes it easy to interact with pre-trained language, audio, and image models.
Kalosm supports a variety of models. Here is a list of the models that are currently supported:
Model | Modality | Size | Description | Quantized | CUDA + Metal Accelerated | Example |
---|---|---|---|---|---|---|
Llama | Text | 1B-70B | General-purpose language model | ✅ | ✅ | llama 3 chat |
Mistral | Text | 7B-13B | General-purpose language model | ✅ | ✅ | mistral chat |
Phi | Text | 2B-4B | Small reasoning-focused language model | ✅ | ✅ | phi 3 chat |
Whisper | Audio | 20MB-1GB | Audio transcription model | ✅ | ✅ | live whisper transcription |
RWuerstchen | Image | 5GB | Image generation model | ❌ | ✅ | rwuerstchen image generation |
TrOcr | Image | 3GB | Optical character recognition model | ❌ | ✅ | text recognition |
Segment Anything | Image | 50MB-400MB | Image segmentation model | ❌ | ❌ | image segmentation |
Bert | Text | 100MB-1GB | Text embedding model | ❌ | ✅ | semantic search |
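The Size column appears to give parameter counts (B) for the language models and approximate file sizes for the others. As a rough rule of thumb, the memory a model's weights need is the parameter count times the bits stored per weight, which is why the Quantized column matters for local inference. The sketch below works through that arithmetic for a 7B-parameter model; the bit widths are illustrative assumptions, and real quantized formats also store per-block scale factors, so actual files come out somewhat larger.

```rust
/// Approximate size in gigabytes (10^9 bytes) of a model's weights.
/// Ignores quantization scale factors and any non-weight metadata.
fn weight_size_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9
}

fn main() {
    // A 7B-parameter model, e.g. the smaller Mistral in the table.
    let params = 7e9;
    // Illustrative storage widths, from full precision down to 4-bit quantization.
    for (label, bits) in [("f32", 32.0), ("f16", 16.0), ("8-bit", 8.0), ("4-bit", 4.0)] {
        println!("{label:>5}: ~{:.1} GB", weight_size_gb(params, bits));
    }
}
```

Halving the bits per weight halves the memory footprint, which is what makes 7B-class models practical on consumer hardware.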
Kalosm also supports a variety of utilities around pre-trained models. These include:
Kalosm uses the candle machine learning library to run models in pure Rust. It supports quantized and accelerated models with performance on par with llama.cpp:

Mistral 7B generation speed, in tokens per second (t/s):

Accelerator | Kalosm | llama.cpp |
---|---|---|
Metal (M2) | 39 t/s | 27 t/s |
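The table reports throughput in tokens per second. Inverting those figures gives the average time per token, which is often easier to reason about for interactive chat. A small worked example using only the two numbers from the table; the 500-token response length is an arbitrary illustrative choice.

```rust
/// Convert a generation throughput in tokens per second into the
/// average time spent on each token, in milliseconds.
fn ms_per_token(tokens_per_second: f64) -> f64 {
    1000.0 / tokens_per_second
}

fn main() {
    // Figures from the Metal (M2) row of the Mistral 7B table.
    let kalosm = 39.0;
    let llama_cpp = 27.0;

    println!("Kalosm:    {:.1} ms/token", ms_per_token(kalosm));
    println!("llama.cpp: {:.1} ms/token", ms_per_token(llama_cpp));
    println!("ratio:     {:.2}x", kalosm / llama_cpp);

    // Wall-clock time for a 500-token response at each rate (illustrative length).
    let tokens = 500.0;
    println!("500 tokens: {:.1} s vs {:.1} s", tokens / kalosm, tokens / llama_cpp);
}
```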
Kalosm supports structured generation with arbitrary parsers. It uses a custom parser engine, a custom sampler, and structure-aware acceleration to make structured generation even faster than unconstrained text generation. You can take any Rust type and add `#[derive(Parse, Schema)]` to make it usable with structured generation:
```rust
use kalosm::language::*;

/// A fictional character
#[derive(Parse, Schema, Clone, Debug)]
struct Character {
    /// The name of the character
    #[parse(pattern = "[A-Z][a-z]{2,10} [A-Z][a-z]{2,10}")]
    name: String,
    /// The age of the character
    #[parse(range = 1..=100)]
    age: u8,
    /// A description of the character
    #[parse(pattern = "[A-Za-z ]{40,200}")]
    description: String,
}

#[tokio::main]
async fn main() {
    // First create a model. Chat models tend to work best with structured generation
    let model = Llama::phi_3().await.unwrap();
    // Then create a task with the parser as constraints
    let task = Task::builder_for::<[Character; 10]>("You generate realistic JSON placeholders for characters")
        .build();
    // Finally, run the task
    let mut stream = task.run("Create a list of random characters", &model);
    // Print the text as it streams in, then await the parsed result
    stream.to_std_out().await.unwrap();
    let characters = stream.await.unwrap();
    println!("{characters:?}");
}
```
https://github.com/user-attachments/assets/8900f57d-55c8-4d4a-a67b-73beab1e5155
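Conceptually, constrained generation checks each candidate token against a parser before sampling and discards any token that would make the output impossible to complete, so the model can only produce text the parser accepts. The std-only sketch below illustrates that idea for a field like `age` with `#[parse(range = 1..=100)]`. It is a hypothetical greedy sampler, not Kalosm's actual parser engine or sampler: the tiny vocabulary, the scores, and the helpers `is_valid_prefix` and `pick_constrained` are invented for this example.

```rust
/// Returns true if `text` is, or could still grow into, an integer in 1..=100.
/// It plays the role of the incremental parser in this sketch.
fn is_valid_prefix(text: &str) -> bool {
    if text.is_empty() {
        return true; // nothing generated yet: any valid number may follow
    }
    // Only ASCII digits, with no leading zero (so 0 itself is rejected).
    if !text.bytes().all(|b| b.is_ascii_digit()) || text.starts_with('0') {
        return false;
    }
    // Anything longer than three digits or larger than 100 can never become valid.
    text.len() <= 3 && text.parse::<u32>().map_or(false, |n| n <= 100)
}

/// One greedy decoding step: among the candidate tokens and their model
/// scores, return the best-scoring token that keeps `output` a valid prefix.
fn pick_constrained<'a>(output: &str, candidates: &[(&'a str, f32)]) -> Option<&'a str> {
    candidates
        .iter()
        .filter(|(tok, _)| is_valid_prefix(&format!("{output}{tok}")))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(tok, _)| *tok)
}

fn main() {
    let mut output = String::new();

    // Step 1: the model prefers "abc" and "0", but neither can start a valid age.
    let step1: [(&str, f32); 4] = [("abc", 2.0), ("0", 1.5), ("4", 1.2), ("7", 0.9)];
    let tok = pick_constrained(&output, &step1).expect("no valid token");
    output.push_str(tok);

    // Step 2: "99" would make "499" (out of range), so the lower-scoring "2" wins.
    let step2: [(&str, f32); 2] = [("99", 3.0), ("2", 1.0)];
    let tok = pick_constrained(&output, &step2).expect("no valid token");
    output.push_str(tok);

    println!("{output}"); // prints 42
}
```

Filtering before sampling means the output never needs to be re-parsed or retried, at the cost of a parser check per candidate token.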
In addition to regex patterns, you can provide your own grammar to generate structured data. This lets you constrain the response to any structure you want, including complex data structures like JSON, HTML, and XML.
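A grammar constraint can be thought of as a small state machine that consumes the output one character at a time and rejects any character that cannot lead to a valid document. The sketch below hand-writes such a machine for a deliberately tiny grammar, a JSON-style array of unsigned integers like `[1,20,3]`. A sampler could call `accepts_prefix` to filter candidate tokens. The `ArrayState` type and its functions are hypothetical and are not Kalosm's grammar API.

```rust
/// States of a hand-written recognizer for arrays like `[1,20,3]`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArrayState {
    Start,      // expecting '['
    AfterOpen,  // just saw '[': expecting a number or ']'
    AfterComma, // just saw ',': expecting a number
    InZero,     // the number is exactly "0": expecting ',' or ']'
    InNumber,   // inside a number that started with 1-9
    Done,       // saw the closing ']'; nothing more is allowed
}

/// Advance the machine by one character, or return None if `c`
/// cannot appear at this point under the grammar.
fn step(state: ArrayState, c: char) -> Option<ArrayState> {
    use ArrayState::*;
    match (state, c) {
        (Start, '[') => Some(AfterOpen),
        (AfterOpen | AfterComma, '0') => Some(InZero),
        (AfterOpen | AfterComma, '1'..='9') => Some(InNumber),
        (AfterOpen, ']') => Some(Done),
        (InNumber, '0'..='9') => Some(InNumber),
        (InZero | InNumber, ',') => Some(AfterComma),
        (InZero | InNumber, ']') => Some(Done),
        _ => None,
    }
}

/// Run the machine over `text`; returns the final state if every character was accepted.
fn run(text: &str) -> Option<ArrayState> {
    text.chars().try_fold(ArrayState::Start, step)
}

/// True if `text` can still be extended into a valid array.
fn accepts_prefix(text: &str) -> bool {
    run(text).is_some()
}

/// True if `text` is already a complete, valid array.
fn accepts_complete(text: &str) -> bool {
    run(text) == Some(ArrayState::Done)
}

fn main() {
    for text in ["[1,20,3]", "[1,20", "[01]", "[1,]", "[]"] {
        println!(
            "{text:8} prefix ok: {:5} complete: {}",
            accepts_prefix(text),
            accepts_complete(text)
        );
    }
}
```

Distinguishing "valid prefix" from "complete" lets generation stop only once the closing bracket has been produced.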
This quickstart will get you up and running with a simple chatbot. Let's get started!
A more complete guide for Kalosm is available on the Kalosm website, and examples are available in the examples folder.
1) Install Rust
2) Create a new project:
```sh
cargo new kalosm-hello-world
cd ./kalosm-hello-world
```
3) Add Kalosm as a dependency:
```sh
# You can use `--features language,metal`, `--features language,cuda`, or `--features language,mkl` if your machine supports an accelerator
cargo add kalosm --features language
cargo add tokio --features full
```
4) Add this code to your `main.rs` file:
```rust
use kalosm::language::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a Phi-3 chat model
    let model = Llama::phi_3().await?;
    let mut chat = Chat::builder(model)
        .with_system_prompt("You are a pirate called Blackbeard")
        .build();

    // Read a message from the terminal and stream the reply to stdout, forever
    loop {
        chat.add_message(prompt_input("\n> ")?)
            .to_std_out()
            .await?;
    }
}
```
5) Run your application with:
```sh
cargo run --release
```
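The chatbot above keeps the conversation going across turns: each new message is answered in the context of the system prompt and the earlier exchanges. The std-only sketch below illustrates that general pattern with a hypothetical `ChatHistory` type that renders the whole history into one prompt string. The role labels and layout are invented for illustration; real chat models each expect their own prompt template, and this is not how Kalosm's `Chat` type is implemented internally.

```rust
use std::fmt::Write;

/// Who wrote a message in the conversation.
#[derive(Clone, Copy, Debug)]
enum Role {
    System,
    User,
    Assistant,
}

/// A hypothetical, minimal chat history: a system prompt plus alternating turns.
struct ChatHistory {
    messages: Vec<(Role, String)>,
}

impl ChatHistory {
    fn new(system_prompt: &str) -> Self {
        Self { messages: vec![(Role::System, system_prompt.to_string())] }
    }

    fn push(&mut self, role: Role, text: &str) {
        self.messages.push((role, text.to_string()));
    }

    /// Render the full history, ending with an open assistant turn for the
    /// model to complete. The "<role>: text" layout is illustrative only.
    fn render_prompt(&self) -> String {
        let mut prompt = String::new();
        for (role, text) in &self.messages {
            let label = match role {
                Role::System => "system",
                Role::User => "user",
                Role::Assistant => "assistant",
            };
            writeln!(prompt, "<{label}>: {text}").unwrap();
        }
        prompt.push_str("<assistant>: ");
        prompt
    }
}

fn main() {
    let mut chat = ChatHistory::new("You are a pirate called Blackbeard");
    chat.push(Role::User, "Where is the treasure?");
    // In a real app this reply would be generated by the model from render_prompt().
    chat.push(Role::Assistant, "Arr, that be a secret!");
    chat.push(Role::User, "Please tell me.");
    print!("{}", chat.render_prompt());
}
```

Re-rendering the whole history each turn is the simplest design; the prompt grows with every exchange, so long conversations eventually hit the model's context limit.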
If you are interested in either project, you can join the Discord to discuss the project and get help.