
Low-level experiments for agent/tool interaction with locally-hosted LLMs #15

Closed uogbuji closed 1 year ago

uogbuji commented 1 year ago

Returning to the topic of agents & tools now that we're favoring bare metal over wrappers such as langchain.

The document LLM Powered Autonomous Agents is a handy SoTA survey, and especially interesting is the listed template from AutoGPT.

There are more and more projects targeting this need, such as the Gorilla model, along with observations that coding models are especially good for this; see for example this post from the developer of the Nuggt agent project, who found great results with WizardCoder-15B.

Tools for strictly influencing LLM generation also seem key here, including:

Tools for parameterized function calling (see the sketch after this list)

See also:

Supersedes #8
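
As a concrete anchor for what parameterized function calling means here, below is a minimal sketch in the OpenAI function-calling style: a JSON-schema parameter spec for a tool, plus a dispatcher for the model's structured reply. The tool name and schema are hypothetical, purely for illustration.

# Describe a tool with a JSON-schema parameter spec (hypothetical example)
import json

WEATHER_TOOL = {
    'name': 'get_weather',
    'description': 'Get current weather for a city',
    'parameters': {
        'type': 'object',
        'properties': {
            'city': {'type': 'string'},
            'units': {'type': 'string', 'enum': ['C', 'F']},
        },
        'required': ['city'],
    },
}

def dispatch(call_json, registry):
    # call_json: structured model output, e.g.
    # '{"name": "get_weather", "arguments": {"city": "Boulder"}}'
    call = json.loads(call_json)
    func = registry[call['name']]
    return func(**call.get('arguments', {}))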

uogbuji commented 1 year ago

Coming across PromptLayer gave me an idea: They already have hooks into langchain for their tracking, etc. tasks. We could adapt their hooks to add e.g. exhaustive logging of the agent interactions via LC, and then run one of the LC examples with GPT-4 and use that to borrow the interaction patterns. It might be enough to just do something like PL's OpenAI superclass.
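
To illustrate the superclass/wrapper idea (independent of PromptLayer's actual API; all names here are hypothetical), the core is just wrapping whatever completion callable the agent uses so every interaction gets logged exhaustively:

# Wrap an OpenAI-style completion callable so every agent interaction
# is logged verbatim. Sketch only; names are illustrative.
import json, logging, time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('agent-trace')

def with_logging(complete_fn):
    def wrapped(prompt, **kwargs):
        start = time.time()
        response = complete_fn(prompt, **kwargs)
        logger.info(json.dumps({
            'prompt': prompt,
            'response': str(response),
            'elapsed_s': round(time.time() - start, 3),
            'params': {k: str(v) for k, v in kwargs.items()},
        }))
        return response
    return wrapped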

uogbuji commented 1 year ago

Relevant that grammar-based sampling looks about to drop in llama.cpp. "adds an API that takes a serialized context-free grammar to guide and constrain sampling. Also adds a sample Backus-Naur form (BNF)-like syntax in main for specifying a grammar for generations."

Could allow us to control the LLM response with a DSL for tool selection & invocation.
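
To make that concrete, a hypothetical grammar in the BNF-like notation described above could constrain generation to exactly one tool invocation (tool names and syntax details are illustrative, extrapolated from the quoted description):

# Constrain output to a single tool call (hypothetical grammar)
root     ::= toolname "(" arg ")"
toolname ::= "search" | "calculator" | "weather"
arg      ::= [a-zA-Z0-9 ,.]*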

uogbuji commented 1 year ago

On a bit of a philosophical tangent, the development of Google/DeepMind's RT-2 is more poignant than the universal "we're creating WALL-E!" ledes suggest. They call it a vision-language-action model, which basically puts language at the center of long-standing obstacles in robotics around adaptable, higher-level function.

uogbuji commented 1 year ago

A new paper describes DFA-based logit guidance and claims it's both better and faster than the MS Guidance approach. As I mentioned above, llama.cpp already has BNF-based sampling, which would be even more expressive than a DFA (context-free grammars subsume the regular languages a DFA can accept). I'll try to look into that this weekend. Interesting Reddit thread: https://www.reddit.com/r/LocalLLaMA/comments/15rb6a4/the_normal_blog_eliminating_hallucinations_fast/
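
For reference, the core of DFA-guided decoding is just a per-step logit mask; a minimal sketch (token IDs and the toy DFA are illustrative):

import math

def dfa_constrained_logits(logits, dfa_state, transitions):
    # transitions: {state: {token_id: next_state}}; tokens with no
    # transition from the current state are masked to -inf
    allowed = transitions.get(dfa_state, {})
    return [logit if tok in allowed else -math.inf
            for tok, logit in enumerate(logits)]

# Toy DFA over a 4-token vocab: from state 0, only tokens 1 and 2 are legal
print(dfa_constrained_logits([0.5, 1.2, -0.3, 0.9], 0, {0: {1: 0, 2: 1}}))
# -> [-inf, 1.2, -0.3, -inf]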

uogbuji commented 1 year ago

Similar to the RT-2 findings mentioned above, this technique, Vision-Language-Action Models (VLAMs), puts an LLM at the heart of a self-driving vehicle, which ends up making strides on many of the latter's stubborn problems: "LINGO-1: Exploring Natural Language for Autonomous Driving"

uogbuji commented 1 year ago

Notes for in-memory loading of Gorilla (via TheBloke's GGUF quant).

Download the model (Mac-biased instructions), assuming you're already in a venv:

mkdir -p ~/.local/share/models
cd ~/.local/share/models
# Get the model downloader
curl -O https://raw.githubusercontent.com/uogbuji/OgbujiPT/main/util/download-model.py
# Make sure you have prereqs
pip install requests tqdm
# Download Gorilla
python download-model.py --output . --select="Gorilla-7B.Q4_K_M.gguf" TheBloke/gorilla-7B-GGUF

Build ctransformers with Metal support for Mac:

CT_METAL=1 pip install "ctransformers>=0.2.24" --no-binary ctransformers

Code to load Gorilla:

from ctransformers import AutoModelForCausalLM
from ogbujipt.llm_wrapper import ctransformer

MY_MODELS = '/Users/uche/.local/share/models'  # Salt to taste

# Load the GGUF quant, offloading layers to the GPU (Metal)
model = AutoModelForCausalLM.from_pretrained(
    f'{MY_MODELS}/TheBloke_gorilla-7B-GGUF',
    model_file='Gorilla-7B.Q4_K_M.gguf',
    model_type='llama',
    gpu_layers=50)

# Wrap in OgbujiPT's ctransformers adapter and send a test prompt
oapi = ctransformer(model=model)
print(oapi('I\'d like to translate from English to French.'))

I think it will need some finesse to actually prompt it properly.
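
For example, something like the following (the USER/ASSISTANT template here is a hypothetical placeholder; the actual expected format should come from the Gorilla model card):

prompt = 'I\'d like to translate from English to French.'
# Hypothetical template; confirm the real format on the model card
wrapped = f'###USER: {prompt}\n###ASSISTANT:'
print(oapi(wrapped))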