Open mmirman opened 1 year ago
Note: Some of these have existing issues attached to them! This list needs cleanup and organization (high priority).
I think that some features build on others. For example, Output Templating could depend on Persistent Stateful Memory by remembering the template. I have used LangChain, and the way they do their templating is by injecting it into the user's input as a header on every call; having memory would eliminate this. Additionally, backtracking could be extended to the persistent state, not just to outputs. In conclusion, I believe Persistent Stateful Memory should come before Output Templating, and perhaps land at the same time as backtracking.
> the way they do their templating is by injecting it into the user's input as a header on every call
We want strong guarantees from our templating: for example, the ability to constrain sampling with a regex.
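The simplest way to get that guarantee is rejection sampling: keep calling the model until an output matches the pattern. A minimal sketch (`toy_generate` is a hypothetical stand-in for an LLM call, not part of the LLM-VM API):

```python
import random
import re

def constrained_sample(generate, pattern, max_tries=100):
    """Rejection-sample: call the model until an output matches the regex."""
    compiled = re.compile(pattern)
    for _ in range(max_tries):
        candidate = generate()
        if compiled.fullmatch(candidate):
            return candidate
    raise RuntimeError("no output matched the template")

# Hypothetical stand-in for an LLM completion call.
def toy_generate():
    return random.choice(["2024-01-15", "Jan 15, 2024", "15/01/2024"])

date = constrained_sample(toy_generate, r"\d{4}-\d{2}-\d{2}")
```

A real constrained decoder would mask logits token by token instead of rejecting whole outputs, but the guarantee to the caller is the same: only strings matching the pattern ever come back.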
High-Level Architecture Overview
[x] Implicit Agents: The Anarchy LLM-VM can be set up to use external tools through our agents, such as REBEL, just by supplying tool descriptions!
[ ] Inference Optimization: The Anarchy LLM-VM is optimized from the agent level all the way down to assembly on known LLM architectures to get the most bang for your buck. With state-of-the-art batching, sparse inference, quantization, distillation, and multi-level colocation, we aim to provide the fastest framework available.
[x] Task Auto-Optimization: The Anarchy LLM-VM will analyze your use cases for repetitive tasks where it can activate student-teacher distillation to train a super-efficient small model from a larger, more general model without losing accuracy. It can furthermore take advantage of data-synthesis techniques to improve results.
[x] Library Callable: We provide a library that can be used from any Python codebase directly.
[ ] HTTP Endpoints: We provide a standalone HTTP server to handle completion requests.
[ ] Live Data Augmentation: You will be able to provide a live-updating data set, and the Anarchy LLM-VM will fine-tune your models or work with a vector DB to provide up-to-date information with citations.
[ ] Web Playground: You will be able to run the Anarchy LLM-VM and test its outputs from the browser.
[ ] Load-Balancing and Orchestration: If you have multiple LLMs or providers you'd like to utilize, you will be able to hand them to the Anarchy LLM-VM, which will automatically figure out which to work with, and when, to optimize your uptime or your costs.
[x] Output Templating: You can ensure that the LLM only outputs data in specific formats and fills in variables from a template, using regular expressions, LMQL, or OpenAI's template language.
[ ] Persistent Stateful Memory: The Anarchy LLM-VM can remember a user's conversation history and react accordingly.
[ ] Smart Batching: Handle multiple calls at the same time from different levels of the LLM-VM.
[ ] Speculative Preemptive Sampling: Use a small LLM to predict the outputs of a larger LLM, and don't fall back to the large one unless sampling is going badly.
[ ] Token Streaming: Get a hook for a constantly updating supply of tokens!
[ ] Streamed Backtracking: Didn't like one output? Look at others, efficiently!
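To make the Persistent Stateful Memory point (and the header-injection discussion above) concrete, here is a minimal sketch of the difference: the template is stored once in memory rather than prepended to every user message. All names here are illustrative, not the actual LLM-VM API.

```python
class ConversationMemory:
    """Illustrative sketch: the template and history live with the VM."""

    def __init__(self, system_template):
        self.system_template = system_template  # remembered once
        self.history = []

    def add(self, role, text):
        self.history.append((role, text))

    def build_prompt(self, user_input):
        # The template comes from memory; the caller never re-sends it,
        # unlike the header-injection approach described above.
        lines = [self.system_template]
        lines += [f"{role}: {text}" for role, text in self.history]
        lines.append(f"user: {user_input}")
        return "\n".join(lines)

mem = ConversationMemory("Answer only in JSON.")
mem.add("user", "hi")
mem.add("assistant", '{"greeting": "hello"}')
prompt = mem.build_prompt("what time is it?")
```

Because the template appears exactly once in the assembled prompt, backtracking over persistent state (as suggested above) reduces to rewinding `history` rather than re-parsing injected headers.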
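Speculative Preemptive Sampling could look roughly like this: a sketch under the assumption that the small model can report a confidence score alongside its draft. Both model functions are hypothetical stand-ins, not real APIs.

```python
def speculative_generate(small_model, large_model, prompt, threshold=0.8):
    """Try the cheap model first; escalate only when it is unsure."""
    text, confidence = small_model(prompt)
    if confidence >= threshold:
        return text             # the small model's draft is good enough
    return large_model(prompt)  # fall back to the expensive model

# Hypothetical stand-ins for the two models.
def tiny(prompt):
    return ("42", 0.9 if "universe" in prompt else 0.1)

def huge(prompt):
    return "a carefully reasoned answer"

fast = speculative_generate(tiny, huge, "answer to the universe")
slow = speculative_generate(tiny, huge, "explain quantum gravity")
```

This is the preemptive variant described in the bullet above; classic speculative decoding additionally verifies the draft tokens with the large model rather than trusting a self-reported confidence.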