ShelbyJenkins / llm_client

The Easiest Rust Interface for Local LLMs and an Interface for Deterministic Signals from Probabilistic LLM Vibes
MIT License
ai candle gguf llm rust


The Easiest Rust Interface for Local LLMs

# For Mac (CPU and GPU), Windows (CPU and CUDA), or Linux (CPU and CUDA)
llm_client="*"

This will download and build llama.cpp. See build.md for other features and backends like mistral.rs.

use llm_client::prelude::*;
// Loads the largest quant available based on your VRAM or system memory
let llm_client = LlmClient::llama_cpp()
    .mistral7b_instruct_v0_3() // Uses a preset model
    .init() // Downloads the model from Hugging Face and starts the inference interface
    .await?;

Several of the most common models are available as presets. Loading from local models is also fully supported. See models.md for more information.

Features

An Interface for Deterministic Signals from Probabilistic LLM Vibes

In addition to basic LLM inference, llm_client is primarily designed for controlled generation using step-based cascade workflows. This prompting system runs predefined workflows that control and constrain both the overall structure of generation and the individual tokens produced during inference. This makes it possible to implement specialized workflows for specific tasks, shaping LLM outputs toward intended, reproducible outcomes.

let response: u32 = llm_client.reason().integer()
    .instructions()
    .set_content("Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?")
    .return_primitive().await?;

// Receive 'primitive' outputs
assert_eq!(response, 1);

This runs the one-round reason cascading prompt workflow with an integer output.

An example run of this workflow with these instructions.
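To make "constraining individual tokens" concrete, here is a minimal sketch of the general idea in plain Rust. It does not use llm_client and is not how the crate implements it: the `Candidate` type and the `pick_digit_token` and `parse_integer` helpers are hypothetical names invented for illustration. The sketch assumes a sampler that sees scored candidate tokens, discards any that are not digits, and then parses the surviving text into the requested primitive.

```rust
/// A candidate token with its score, as a sampler might see it.
/// Hypothetical type for illustration; not part of llm_client.
struct Candidate<'a> {
    text: &'a str,
    logit: f32,
}

/// Picks the highest-scoring candidate made up only of ASCII digits.
/// Tokens that would break the integer constraint are never eligible,
/// no matter how high their score is.
fn pick_digit_token<'a>(candidates: &[Candidate<'a>]) -> Option<&'a str> {
    candidates
        .iter()
        .filter(|c| !c.text.is_empty() && c.text.chars().all(|ch| ch.is_ascii_digit()))
        .max_by(|a, b| a.logit.total_cmp(&b.logit))
        .map(|c| c.text)
}

/// Parses the constrained text into the primitive the caller asked for.
fn parse_integer(constrained: &str) -> Option<u32> {
    constrained.parse().ok()
}
```

Because the constraint is applied before a token is chosen, the final parse step can only fail on edge cases such as overflow, which is what lets a workflow return a typed value like `u32` rather than free text.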

This method significantly improves the reliability of LLM use cases. For example, there are test cases in this repo that can be used to benchmark an LLM. Accuracy increases substantially when moving from basic inference with a constrained outcome to a CoT-style cascading prompt workflow. The decision workflow, which runs N CoT workflows across a temperature gradient, approaches 100% accuracy on the test cases.
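The voting idea behind the decision workflow can be sketched in plain Rust. This is an illustration under stated assumptions, not the crate's implementation: `temperature_gradient` and `decide` are hypothetical helpers, the gradient is assumed to be evenly spaced, and the `workflow` closure stands in for one CoT workflow run that returns an integer answer (or `None` on failure).

```rust
use std::collections::HashMap;

/// Evenly spaced temperatures from `start` to `end`, inclusive.
/// Hypothetical helper; the spacing used by llm_client may differ.
fn temperature_gradient(start: f32, end: f32, n: usize) -> Vec<f32> {
    match n {
        0 => Vec::new(),
        1 => vec![start],
        _ => {
            let step = (end - start) / (n as f32 - 1.0);
            (0..n).map(|i| start + step * i as f32).collect()
        }
    }
}

/// Runs `workflow` once per temperature and returns the most common
/// answer together with its vote count. Ties go to the smaller answer
/// so the result does not depend on hash map iteration order.
fn decide<F>(temperatures: &[f32], mut workflow: F) -> Option<(u32, usize)>
where
    F: FnMut(f32) -> Option<u32>,
{
    let mut votes: HashMap<u32, usize> = HashMap::new();
    for &t in temperatures {
        // Failed runs simply cast no vote.
        if let Some(answer) = workflow(t) {
            *votes.entry(answer).or_insert(0) += 1;
        }
    }
    votes
        .into_iter()
        .max_by_key(|&(answer, count)| (count, std::cmp::Reverse(answer)))
}
```

In real use the closure would call an LLM workflow at the given temperature. A single run that goes wrong at a high temperature is outvoted by the others, which is the intuition behind the accuracy gain described above.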

I have a full breakdown of this in my blog post, "Step-Based Cascading Prompts: Deterministic Signals from the LLM Vibe Space."

Jump to the readme.md of the llm_client crate to find out how to use these workflows.

Examples

Docs

Guides

Blog Posts

Roadmap

Dependencies

Contact

Shelby Jenkins - Here or LinkedIn