A few notes. I've seen reports that even OpenAI's gpt-3.5-turbo-16k
struggles with agents (not picking the right tool in many cases, giving short, unhelpful responses, etc.), at least as implemented by default in Langchain, and that people needed GPT-4 (presumably with its ~8K token context) to use agents reliably. Daunting, but again it's worth seeing if we can figure something out for a local LLM.
We should keep in mind that for very long chains, managing memory/history via vectors/embeddings might be needed, e.g. with something like Zep.
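(Not Zep's actual API, but here's a rough sketch of the vector-recall idea, reusing the same HuggingFaceEmbeddings wrapper as in the script below. The remember/recall helpers and the sample turns are made up for illustration only.)

import numpy as np
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed = HuggingFaceEmbeddings()

# Naive in-memory store of past chat turns; a real setup would use Zep,
# a vector DB, or llama_index's own storage
history, history_vecs = [], []

def remember(turn):
    history.append(turn)
    history_vecs.append(np.array(embed.embed_query(turn)))

def recall(query, k=3):
    # Cosine similarity of the query against every stored turn; return top k
    q = np.array(embed.embed_query(query))
    sims = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q)) for v in history_vecs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    return [history[i] for i in top]

remember('User asked about the Copenhagen arts scene')
remember('Agent mentioned the Louisiana Museum and Ny Carlsberg Glyptotek')
print(recall('Which museums came up earlier?', k=1))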
Also, between doc inconsistencies and Langchain's general weirdness/unpythonicness/over-abstraction/too-much-onion-layering, I'm trending towards a combo of Langchain & llama_index. Here's where my experiment is so far, agent_llama_index_wikipedia.py. Definitely nowhere near yet, but some useful nibbles.
import time
from inspect import currentframe, getframeinfo
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain import OpenAI
from langchain.agents import initialize_agent
# from langchain.chat_models import ChatOpenAI
# Seems hard-coded to OpenAI live
# from langchain.embeddings import OpenAIEmbeddings
from llama_index import LangchainEmbedding, LLMPredictor
from llama_index.tools.ondemand_loader_tool import OnDemandLoaderTool
from llama_index.readers.wikipedia import WikipediaReader
# from typing import List
# from pydantic import BaseModel
from llama_index import set_global_service_context, ServiceContext
from ogbujipt.config import openai_live, openai_emulation # noqa
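# Point the OpenAI client plumbing at a self-hosted, OpenAI-compatible server
# (here a llama.cpp server on the LAN) rather than the live OpenAI API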
openai_emulation(host='http://192.168.0.73', port='8000')
def print_checkpoint(frame, start_time=None):
    if start_time is not None:
        # We're not just initializing
        print(
            f'Line {getframeinfo(frame).lineno}—' +
            f'Checkpoint after {time.perf_counter() - start_time:0.2f} ' +
            'seconds')
    new_start_time = time.perf_counter()
    return new_start_time
llm = OpenAI(temperature=0.5)
# llm = ChatOpenAI(temperature=0.5)
start_time = time.perf_counter()
# Download & use the default HF embedding model
# Should be (confirm) sentence-transformers all-mpnet-base-v2
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
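# Bundle the LLM and the local embedding model into a llama_index ServiceContext
# and make it the global default for subsequent indexing & queries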
service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=llm),
    embed_model=embed_model)
set_global_service_context(service_context)
reader = WikipediaReader()
wiki_tool = OnDemandLoaderTool.from_defaults(
    reader,
    name="Wikipedia Tool",
    description="A tool for loading and querying articles from Wikipedia",
)
start_time = print_checkpoint(currentframe(), start_time)
# Run the tool on its own, which downloads the data from Wikipedia,
# initializes a simple vector store index (override this with index_cls),
# and indexes the page. The vector indexing is via HuggingFaceEmbeddings
# Ends up requiring > 2048 tokens of context (thanks to the wiki page size).
# Had to set n_ctx to 4096 in the llama.cpp server instance
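# For reference (assuming llama-cpp-python's OpenAI-compatible server; the flag
# differs for other servers, & the model path is just a placeholder):
#   python3 -m llama_cpp.server --model /path/to/model.bin --n_ctx 4096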
result = wiki_tool(
    ["Copenhagen"],
    query_str="What's the arts and culture scene in Copenhagen?")
# tool(["Calabar"], query_str="What's the arts and culture scene in Calabar?")
print('Raw tool response:\n', result)
start_time = print_checkpoint(currentframe(), start_time)
# Now set up tool for use via LangChain agent
lc_tool = wiki_tool.to_langchain_structured_tool(verbose=True)
agent = initialize_agent(
    [lc_tool],
    llm=llm,
    agent="structured-chat-zero-shot-react-description",
    verbose=True
)
start_time = print_checkpoint(currentframe(), start_time)
# Run the LC agent, which should now be able to use wikipedia in context
query = 'Write a brief story based on the arts & culture scene in Copenhagen.'
result = agent.run(query)
# agent.run(tool_input={
#     "pages": ["Copenhagen"],
#     "query_str": "What's the arts and culture scene in Copenhagen?"}
# )
start_time = print_checkpoint(currentframe(), start_time)
print(result)
Superseded by #15
Using tools (e.g. Wikipedia lookup, search engines, arithmetic, etc.) via LLM-based agents is one of the main attractions of packages such as Langchain & llama_index. People usually use this with the most powerful OpenAI ChatGPT model they can afford, because it requires the LLM-as-agent to make a series of logical deductions and then request matching actions. @gekkoid1 and I have seen enough in quick experiments to think it's not impossible to handle at least some restricted subset of agent tasks via self-hosted LLMs. It will be very hard, and will need very careful design of the agent interaction (prompts, memory, etc.), but if we can demonstrate it, it would be extremely useful.
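To make "restricted subset of agent tasks" a bit more concrete, here's a rough sketch of about the smallest useful case: one arithmetic tool plus a plain zero-shot ReAct agent, with the LLM routed to a self-hosted server via openai_emulation as in the script above. The safe_eval helper and the host/port are placeholders, so treat it as an illustration of the shape rather than a working recipe.

from langchain import OpenAI
from langchain.agents import Tool, initialize_agent
from ogbujipt.config import openai_emulation

# Same trick as above: route the OpenAI client to a self-hosted server
openai_emulation(host='http://192.168.0.73', port='8000')

def safe_eval(expr):
    # Deliberately tiny "calculator": digits & basic operators only
    allowed = set('0123456789+-*/(). ')
    if not set(expr) <= allowed:
        return 'Refusing to evaluate that expression'
    return str(eval(expr))  # charset restricted above

calc_tool = Tool(
    name='Calculator',
    func=safe_eval,
    description='Evaluates simple arithmetic expressions, e.g. "12 * (3 + 4)"',
)

llm = OpenAI(temperature=0)
agent = initialize_agent(
    [calc_tool],
    llm=llm,
    agent='zero-shot-react-description',
    verbose=True
)

print(agent.run('What is 12 * (3 + 4), minus 9?'))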