ahyatt / llm

A package abstracting llm capabilities for emacs.
GNU General Public License v3.0

#+TITLE: llm package for emacs

LLMs exhibit varying functionalities and APIs. This library aims to abstract functionality to a higher level, as some high-level concepts might be supported by an API while others require more low-level implementations. An example of such a concept is "examples," where the client offers example interactions to demonstrate a pattern for the LLM. While the GCloud Vertex API has an explicit API for examples, OpenAI's API requires specifying examples by modifying the system prompt. OpenAI also introduces the concept of a system prompt, which does not exist in the Vertex API. Our library aims to conceal these API variations by providing higher-level concepts in our API.

Certain functionalities might not be available in some LLMs. Any such unsupported functionality will raise a ~'not-implemented~ signal.
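For example, a client can catch this signal to degrade gracefully. This is a minimal sketch; ~my-provider~ is a hypothetical provider set up elsewhere:

#+begin_src emacs-lisp
;; Sketch: fall back when a provider doesn't support a capability.
;; `my-provider' is assumed to be a provider created elsewhere.
(condition-case nil
    (llm-chat my-provider (llm-make-chat-prompt "Hello"))
  (not-implemented
   (message "Chat is not supported by this provider")))
#+end_src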

#+begin_src emacs-lisp
(use-package llm-refactoring
  :init
  (require 'llm-openai)
  (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)))
#+end_src

Here ~my-openai-key~ would be a variable you set up before with your OpenAI key. Or, just substitute the key itself as a string. It's important to remember never to check your key into a public repository such as GitHub, because your key must be kept private. Anyone with your key can use the API, and you will be charged.
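One way to keep the key out of your Emacs configuration entirely is to read it from auth-source. This is a sketch; it assumes you have an entry for =api.openai.com= in your =~/.authinfo.gpg=:

#+begin_src emacs-lisp
;; Sketch: load the key from auth-source rather than hardcoding it.
;; Assumes a line like: machine api.openai.com login apikey password <key>
(setq my-openai-key
      (auth-source-pick-first-password :host "api.openai.com"))
#+end_src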

All of the providers (except for =llm-fake=), can also take default parameters that will be used if they are not specified in the prompt. These are the same parameters as appear in the prompt, but prefixed with =default-chat-=. So, for example, if you find that you like Ollama to be less creative than the default, you can create your provider like:

#+begin_src emacs-lisp
(make-llm-ollama :embedding-model "mistral:latest"
                 :chat-model "mistral:latest"
                 :default-chat-temperature 0.1)
#+end_src

For embedding users: if you store the embeddings, you must set the embedding model explicitly. The llm package has no way to tell whether you are storing embeddings, and if the default model changes, you may find yourself with stored embeddings that are incompatible with newly generated ones.

** Open AI
You can set up with ~make-llm-openai~, with the following parameters:
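For example, a provider with the embedding model pinned explicitly might look like this. The model name here is illustrative; check OpenAI's documentation for current model names:

#+begin_src emacs-lisp
;; Sketch: pin the embedding model so stored embeddings stay compatible
;; even if the package's default model changes.
(make-llm-openai :key my-openai-key
                 :embedding-model "text-embedding-3-small")
#+end_src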

** Vertex
You can set up with ~make-llm-vertex~, with the following parameters:
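A minimal setup might look like the following sketch; the project name is hypothetical:

#+begin_src emacs-lisp
;; Sketch: a Vertex provider tied to a (hypothetical) GCloud project.
(make-llm-vertex :project "my-gcloud-project")
#+end_src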

In addition to the provider (you may want more than one, for example to charge against different projects), there are customizable variables:

** Claude

- =:key=: The API key you get from [[https://console.anthropic.com/settings/keys][Claude's settings page]]. This is required.
- =:chat-model=: One of the [[https://docs.anthropic.com/claude/docs/models-overview][Claude models]]. Defaults to "claude-3-opus-20240229", the most powerful model.

** Ollama
[[https://ollama.ai/][Ollama]] is a way to run large language models locally. There are [[https://ollama.ai/library][many different models]] you can use with it. You set it up with the following parameters:

There is a deprecated llama.cpp provider, but it is no longer needed: llama.cpp is Open AI-compatible, so the Open AI Compatible provider should work with it.

** Fake
This is a client that makes no calls; it is just there for testing and debugging. It is mostly of use to programmatic clients of the llm package, but end users can also use it to understand what will be sent to the LLMs. It has the following parameters:
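For instance, a quick sanity check with the fake provider might look like this. This is a minimal sketch; see the parameter docstrings for configuring canned responses and output logging:

#+begin_src emacs-lisp
;; Sketch: the fake provider makes no network calls, so it is safe to
;; use in tests and for inspecting what would be sent to an LLM.
(llm-chat (make-llm-fake) (llm-make-chat-prompt "Hello"))
#+end_src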

To build upon the example from before:

#+begin_src emacs-lisp
(use-package llm-refactoring
  :init
  (require 'llm-openai)
  (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)
        llm-warn-on-nonfree nil))
#+end_src

For all callbacks, the callback will be executed in the buffer the function was first called from. If that buffer has been killed, it will be executed in a temporary buffer instead.

** Main functions
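For example, an asynchronous chat call supplies both a response callback and an error callback. This is a sketch; ~my-provider~ is a hypothetical provider created elsewhere:

#+begin_src emacs-lisp
;; Sketch: async chat; the response callback gets the response text,
;; the error callback gets an error type symbol and a message.
(llm-chat-async my-provider
                (llm-make-chat-prompt "What is an LLM?")
                (lambda (response) (message "Response: %s" response))
                (lambda (type msg) (message "Error %s: %s" type msg)))
#+end_src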

#+begin_src emacs-lisp
(defvar-local llm-chat-streaming-prompt nil)

(defun start-or-continue-conversation (text)
  "Called when the user has input TEXT as the next input."
  (if llm-chat-streaming-prompt
      (llm-chat-prompt-append-response llm-chat-streaming-prompt text)
    (setq llm-chat-streaming-prompt (llm-make-chat-prompt text)))
  (llm-chat-streaming-to-point provider llm-chat-streaming-prompt
                               (current-buffer) (point-max) (lambda ())))
#+end_src

*Caution about ~llm-chat-prompt-interactions~*: the interactions in a prompt may be modified by the conversation, or by the conversion of the context and examples to what the LLM understands. Different providers require different things from the interactions: some can handle system prompts and some cannot; some require alternating user and assistant interactions, while others can handle anything. It's important that clients keep to behaviors that work on all providers. Do not attempt to read or manipulate ~llm-chat-prompt-interactions~ after initially setting it up, because you are likely to make changes that only work for some providers. Similarly, don't create a prompt directly with ~make-llm-chat-prompt~, because it is easy to create something that wouldn't work for all providers.

** Function calling
Note: function calling functionality is currently alpha quality. If you want to use function calling, please watch the =llm= [[https://github.com/ahyatt/llm/discussions][discussions]] for any announcements about changes.

Function calling is a way to give the LLM a list of functions it can call, and have it call the functions for you. The standard interaction has the following steps:

  1. The client sends the LLM a prompt with functions it can call.
  2. The LLM may return which functions to execute, and with what arguments, or text as normal.
  3. If the LLM has decided to call one or more functions, those functions should be called, and their results sent back to the LLM.
  4. The LLM will return with a text response based on the initial prompt and the results of the function calling.
  5. The client can now continue the conversation.
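Step 1 might be set up like the following sketch. The function name, its implementation, and ~my-provider~ are all illustrative; the struct fields follow the docstrings of =llm-function-call= and =llm-function-arg=:

#+begin_src emacs-lisp
;; Sketch: give the LLM one callable function (illustrative names).
(llm-chat my-provider
          (llm-make-chat-prompt
           "What is the capital of France?"
           :functions
           (list (make-llm-function-call
                  :function (lambda (country) (my-lookup-capital country))
                  :name "get_capital"
                  :description "Get the capital city of a country."
                  :args (list (make-llm-function-arg
                               :name "country"
                               :description "The country name."
                               :type 'string
                               :required t))))))
#+end_src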

This basic structure is useful because it can guarantee well-structured output (if the LLM does decide to call the function). Not every LLM can handle function calling, and those that do not will ignore the functions entirely. The function =llm-capabilities= will return a list containing =function-calls= if the LLM supports function calls. Right now only Gemini, Vertex, Claude, and Open AI support function calling; Ollama should get function calling soon. However, even among LLMs that handle function calling, there is a fair bit of difference in capabilities. Right now, it is possible to write function calls that succeed in Open AI but cause errors in Gemini, because Gemini does not appear to handle functions that have types containing other types. So for now, client programs are advised to keep function arguments to simple types.
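A capability check along these lines keeps clients portable. This is a sketch; ~my-provider~ is a hypothetical provider:

#+begin_src emacs-lisp
;; Sketch: only offer function calling when the provider supports it.
(if (member 'function-calls (llm-capabilities my-provider))
    (message "Function calling supported")
  (message "Falling back to plain chat"))
#+end_src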

The way to call functions is to attach a list of functions to the =llm-function-call= slot in the prompt. This is a list of =llm-function-call= structs, which takes a function, a name, a description, and a list of =llm-function-arg= structs. The docstrings give an explanation of the format.

The various chat APIs will execute the functions defined in =llm-function-call= with the arguments supplied by the LLM. Instead of returning (or passing to a callback) a string, an alist of function names and return values will be returned.

The client must then send this back to the LLM, to get a textual response from the LLM based on the results of the function call. These have already been added to the prompt, so the client only has to call the LLM again. Gemini and Vertex require this extra call to the LLM, but Open AI does not.
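A sketch of that round trip, assuming ~prompt~ already contains the function definitions and ~my-provider~ is a provider that requires the extra call:

#+begin_src emacs-lisp
;; Sketch: the first call returns an alist of (function-name . result);
;; the results are already recorded in the prompt, so calling chat
;; again on the same prompt yields the textual follow-up.
(let ((results (llm-chat my-provider prompt)))
  (message "Function results: %S" results)
  (message "Final response: %s" (llm-chat my-provider prompt)))
#+end_src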

Be aware that there is no guarantee that the function will be called correctly. While the LLMs mostly get this right, they are trained on JavaScript functions, so imitating JavaScript naming is recommended: "write_email" is a better name for a function than "write-email".

Examples can be found in =llm-tester=. There is also a function to generate function calls from existing elisp functions in =utilities/elisp-to-function-call.el=.

** Advanced prompt creation
The =llm-prompt= module provides helper functions to create prompts that can incorporate data from your application. In particular, this should be very useful for applications that need a lot of context.

A prompt defined with =llm-prompt= is a template, with placeholders that the module will fill in. Here's an example of a prompt definition, from the [[https://github.com/ahyatt/ekg][ekg]] package:

#+begin_src emacs-lisp

(llm-defprompt ekg-llm-fill-prompt "The user has written a note, and would like you to append to it, to make it more useful. This is important: only output your additions, and do not repeat anything in the user's note. Write as a third party adding information to a note, so do not use the first person.

First, I'll give you information about the note, then similar other notes that user has written, in JSON. Finally, I'll give you instructions. The user's note will be your input, all the rest, including this, is just context for it. The notes given are to be used as background material, which can be referenced in your answer.

The user's note uses tags: {{tags}}. The notes with the same tags, listed here in reverse date order: {{tag-notes:10}}

These are similar notes in general, which may have duplicates from the ones above: {{similar-notes:1}}

This ends the section on useful notes as a background for the note in question.

Your instructions on what content to add to the note:

{{instructions}} ")

#+end_src

When this is filled, it is done in the context of a provider, which has a known context size (via ~llm-chat-token-limit~). Care is taken not to overfill the context, which is checked as it is filled via ~llm-count-tokens~. We usually do not want to fill the whole context, but instead leave room for the chat and subsequent terms. The variable ~llm-prompt-default-max-pct~ controls how much of the context window we want to fill. The way we estimate the number of tokens used is quick but inaccurate, so limiting to less than the maximum context size guards against a miscount leading to an error from sending the LLM too many tokens.
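For example, to be more conservative about context use, you might lower the fill percentage. The value here is illustrative:

#+begin_src emacs-lisp
;; Sketch: fill at most half of the provider's context window.
(setq llm-prompt-default-max-pct 50)
#+end_src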

Variables are enclosed in double curly braces, like this: ={{instructions}}=. They can just be the variable, or they can also specify a number of tickets, like so: ={{tag-notes:10}}=. Tickets should be thought of like lottery tickets, where the prize is a single round of context filling for the variable. So the variable =tag-notes= gets 10 tickets in the drawing. Any variable with unspecified tickets (unless it is just a single variable, which will be explained below) will get a number of tickets equal to the total number of specified tickets. So if you have two variables, one with 1 ticket and one with 10 tickets, the latter will be filled 10 times more often than the former. If you have two variables, one with 1 ticket and one unspecified, the unspecified one will also get 1 ticket, so each will have an even chance of being filled. If no variable has tickets specified, each will get an equal chance. If you have just one variable, it could have any number of tickets and the result would be the same, since it would win every round. This algorithm is the contribution of David Petrou.

The above is true of variables that are to be filled with a sequence of possible values. A lot of LLM context filling is like this. In the above example, ={{similar-notes}}= is a retrieval based on a similarity score. It will continue to fill items from most similar to least similar, which is going to return almost everything the ekg app stores. We want to retrieve only as needed. Because of this, the =llm-prompt= module takes in /generators/ to supply each variable. However, a plain list is also acceptable, as is a single value. Any single value will not enter into the ticket system, but rather be prefilled before any tickets are used.

So, to illustrate with this example, here's how the prompt will be filled:

  1. First, ={{tags}}= and ={{instructions}}= are filled. This happens before we check the context size, so the module assumes these will be small and not blow up the context.
  2. Check the context size we want to use (~llm-prompt-default-max-pct~ multiplied by ~llm-chat-token-limit~) and exit if it is exceeded.
  3. Run a lottery with all tickets and choose one of the remaining variables to fill.
  4. If the variable won't make the text too large, fill it with one entry retrieved from the supplied generator; otherwise ignore it.
  5. Go to step 2.

The prompt can be filled in two ways: one uses a predefined prompt template (~llm-defprompt~ and ~llm-prompt-fill~), the other uses a prompt template that is passed in (~llm-prompt-fill-text~).

#+begin_src emacs-lisp
(llm-defprompt my-prompt
  "My name is {{name}} and I'm here to say {{messages}}")

(llm-prompt-fill 'my-prompt my-llm-provider
                 :name "Pat"
                 :messages #'my-message-retriever)

(iter-defun my-message-retriever ()
  "Return the messages I like to say."
  (my-message-reset-messages)
  (while (my-has-next-message)
    (iter-yield (my-get-next-message))))
#+end_src

Alternatively, you can just fill it directly:

#+begin_src emacs-lisp
(llm-prompt-fill-text "Hi, I'm {{name}} and I'm here to say {{messages}}"
                      :name "John"
                      :messages #'my-message-retriever)
#+end_src

As you can see in the examples, the variable values are passed in with matching keys.