eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

LMQL as an LLM/GPT-based agent #96

Open · doxav opened this issue 1 year ago

doxav commented 1 year ago

An LMQL query with proper scripting (inside and outside the query) could simulate an LLM/GPT-based (semi-)autonomous agent (e.g. Auto-GPT, BabyAGI). What could not be covered by LMQL?

LMQL can handle interactions with the user, memory, some external tools, and advanced planning (e.g. Tree of Thoughts is being implemented). The goal decomposition, planning, thoughts, reasoning, and criticism at the core of a GPT agent like Auto-GPT could be implemented by scripting in the query. LMQL could even call BMTools or Gorilla to greatly extend access to third-party tools/APIs. LMQL could certainly create sub-agents, like Auto-GPT does, and asynchronous tasks.

lbeurerkellner commented 1 year ago

Yes definitely. We have already planned some interesting features to support this further, like native support for function calling and tool augmentation.

It would be awesome to collect any suggestions and/or feature requests surrounding agents here. We are very open to new ideas in this space.

doxav commented 1 year ago

I think top features should be:

  1. Autonomous execution management (choose between manual confirmation and N autonomous iterations; allow manual early stopping of autonomous iterations without exiting)
  2. OpenAI API budget counter
  3. Vector storage and search over previous prompts/answers
  4. Goals definition and update => does not require new development, but we should agree on the best way to configure it
  5. Self-feedback => does not require new development, but we should agree on the best way to configure it
  6. Running and managing generated scripts (Python code)
  7. Multiple-agent management: clone an agent with the same state and role, communication between agents, list agents, allow an agent to kill another agent
  8. Function calling with a model able to call it in context

What would the new tool-augmentation features you mentioned be? Tool learning (learning how to use a tool? when to use it?).

I am a PhD student too, but only half-time. I am hesitating between LMQL and Auto-GPT as a development base for my experiments on solving complex industrial problems with AI agents and a human in the loop. LMQL would allow for more readable and replicable experiments. If you estimate that developing the above features would not take too much time, I would be happy to share the work, if there is a way to take a crash course in LMQL.

lbeurerkellner commented 1 year ago

I would be very open and happy about any form of collaboration. It might be worth discussing scope and/or project philosophy. In general, given the list of things you are describing, I can envision adding a number of language features but would probably also move a couple of things to the library level. We are slowly introducing an LMQL standard library, where such functionality could live. However, if you want to build something very specific/opinionated (which is totally valid), I would rather aim for a separate agent framework. In both cases, I would be happy to collaborate/support things, especially from the LMQL side of things.

Some more specific thoughts below:

  1. Autonomous execution management (choose between manual confirmation and N autonomous iterations; allow manual early stopping of autonomous iterations without exiting)

I would argue that this can be implemented by a query with a while loop and conditional scripting that calls an external function to await user confirmation, in case you want explicit consent to do something. Are you thinking of (language-level) features beyond this?
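
For illustration only, a minimal plain-Python sketch of such a loop could look as follows; `agent_step`, `execute`, and `confirm` are hypothetical placeholders (in practice, `agent_step` would be an LMQL query call), not LMQL APIs:

```python
import asyncio

async def agent_step(step: int) -> str:
    # Placeholder for an LMQL query that proposes the next action.
    return f"proposed action #{step}"

def execute(action: str) -> None:
    # Placeholder for actually running a tool call or external command.
    print(f"executing: {action}")

def confirm(action: str) -> bool:
    # External function awaiting explicit user consent for an action.
    return input(f"Execute '{action}'? [y/N] ").strip().lower() == "y"

async def run_agent(max_iterations: int = 10, autonomous: bool = False) -> None:
    for step in range(max_iterations):
        action = await agent_step(step)
        # Manual confirmation mode: allows early stopping without exiting the program.
        if not autonomous and not confirm(action):
            break
        execute(action)

if __name__ == "__main__":
    asyncio.run(run_agent(max_iterations=3, autonomous=True))
```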

  2. OpenAI API budget counter

This is definitely on the roadmap and something we want to implement as part of #92.
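
Purely as a rough illustration of what such a counter could track (this is not the planned implementation, and the per-token price below is a made-up placeholder):

```python
class BudgetExceeded(Exception):
    """Raised once the configured API budget has been spent."""

class BudgetCounter:
    def __init__(self, budget_usd: float, usd_per_1k_tokens: float = 0.002):
        # usd_per_1k_tokens is a placeholder value; real pricing depends on the model.
        self.budget_usd = budget_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self.spent_usd = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Accumulate the estimated cost of one API call and enforce the budget.
        self.spent_usd += (prompt_tokens + completion_tokens) / 1000 * self.usd_per_1k_tokens
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.4f} of ${self.budget_usd:.2f}")

# Usage: call record(...) with the token counts reported by each model response.
counter = BudgetCounter(budget_usd=1.00)
counter.record(prompt_tokens=512, completion_tokens=128)
```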

  3. Vector storage and search over previous prompts/answers

I think this is ultimately an external/library task. There are plenty of vector database solutions, which can easily be called via their Python libraries. Curious to hear if you imagine language integrations beyond this?
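
For instance, a toy in-memory version could look like the sketch below; the `embed` function is a stand-in for any embedding model or vector-database client that a query could call via regular Python code:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy embedding: a pseudo-random vector derived from the text's hash.
    # In practice, replace this with an embedding model or a vector-DB client call.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

class MemoryStore:
    """Stores past prompts/answers and retrieves the most similar entries."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, np.ndarray]] = []

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        def cosine(v: np.ndarray) -> float:
            return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        ranked = sorted(self.entries, key=lambda e: cosine(e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.add("The capital of France is Paris.")
store.add("LMQL supports constrained decoding.")
print(store.search("What is the capital of France?", k=1))
```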

  4. Goals definition and update => does not require new development, but we should agree on the best way to configure it
  5. Self-feedback => does not require new development, but we should agree on the best way to configure it

I think both are essentially a form of graph-based prompting (e.g. where multiple hypotheses/tasks are defined, updated and scored in parallel). It would be awesome to support these kinds of "reasoning algorithms" more explicitly. Given @LachlanGray's ToT implementation, I think we can already observe a few interesting patterns that might help inform a more general model.

  6. Running and managing generated scripts (Python code)

It would be interesting to explore model-synthesised tools more. I think one interesting way of supporting this would be an isolated execution environment for generated Python functions.
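
A very rough sketch of that idea in plain Python (explicitly not a real security boundary; a proper sandbox would use containers or a restricted interpreter):

```python
import multiprocessing

def _run(source: str, queue: multiprocessing.Queue) -> None:
    # Execute generated code with an intentionally minimal global namespace.
    # NOTE: this is NOT a real security boundary.
    namespace: dict = {"__builtins__": {"print": print, "range": range, "len": len, "sum": sum}}
    try:
        exec(source, namespace)
        queue.put(("ok", namespace.get("result")))
    except Exception as e:
        queue.put(("error", repr(e)))

def run_generated_code(source: str, timeout: float = 5.0):
    # Run the generated code in a separate process with a wall-clock timeout.
    queue: multiprocessing.Queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_run, args=(source, queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        return ("error", "timeout")
    return queue.get()

if __name__ == "__main__":
    print(run_generated_code("result = sum(range(10))"))  # -> ('ok', 45)
```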

  7. Multiple-agent management: clone an agent with the same state and role, communication between agents, list agents, allow an agent to kill another agent

I was thinking for a while that having queries be generator functions, i.e. functions that can be interrupted and continued at will, would be awesome. State cloning is something we already heavily rely on for branching decoders and it might be worth exposing this in this context. It depends on the agent abstraction, however. In the end you may want to

  8. Function calling with a model able to call it in context

We have the current implementation on https://next.lmql.ai (i.e. the next branch). Maybe check out the multi-tool use example. This focuses on function calling and a simple version of tool discovery.
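
Independent of that example, the general pattern behind this kind of function calling is a small registry that maps a model-chosen tool name to a Python callable. A minimal sketch (the tool names and the `name: argument` output format here are made up for illustration):

```python
from typing import Callable

def calculator(expression: str) -> str:
    # A toy tool; eval is only acceptable here because the input is a trusted demo string.
    return str(eval(expression, {"__builtins__": {}}))

def wiki_lookup(topic: str) -> str:
    return f"(stub) summary of {topic}"

TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "wiki_lookup": wiki_lookup,
}

def dispatch(tool_call: str) -> str:
    # Expects model output of the form "calculator: 2 * (3 + 4)".
    name, _, argument = tool_call.partition(":")
    tool = TOOLS.get(name.strip())
    if tool is None:
        return f"unknown tool: {name.strip()}"
    return tool(argument.strip())

print(dispatch("calculator: 2 * (3 + 4)"))  # -> 14
```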

So overall, I think there definitely is value in building on top of LMQL for you. However, project-wise we are definitely aiming for fairly broad, universal abstractions. This means development can be slower and should be done with care. If instead you want to experiment, iterate quickly, and break things if necessary, then building on AutoGPT may be a better start for you. This does not preclude LMQL use, however; e.g. there are AutoGPT derivatives that use LMQL.

Either way, please feel free to reach out on Discord or via e-mail. Happy to chat some more there.

doxav commented 1 year ago

Great, thanks for your comprehensive feedback and for expressing your willingness to collaborate. I would be happy to discuss once my thoughts are clearer. Here are my thoughts on the points raised:

1. Autonomous Execution Management: Yes, the implementation using a while loop and conditional scripting seems to cover the basic requirement. In a long-running execution with parallel executions of different thoughts and actions, parallel and independent progress tracking and local early termination should be accessible to this script. I just wonder whether backtracking during state exploration would benefit from access to a lower level than the LMQL output only. If not, this backtracking need could also be helped by solutions to point 3. In the case of multiple humans in the loop with parallel paths, waiting for input should not block other running queries.

2. API budget counter: Great!

3. Vector search mechanism: Vector search over past answers/generations with similar tokenization/prompts/parameters could be used to avoid calling the LM when not necessary (a caching mechanism), for transfer learning with in-context examples, or for retrieving past context in multi-turn conversations. Language-level integration may improve the caching and pre/post-processing mechanisms, but this is only intuition.

4. Goals Definition/Update & Self-feedback: Your perspective on graph-based prompting is quite intriguing. How do you see it? In @LachlanGray's ToT, as far as I understand, there are no backtracking capabilities or updates. In the original ToT paper, backtracking is assumed but not clearly explained. However, they plan to investigate further with MCTS. They might draw inspiration from "Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning".

5. Running and managing generated code (e.g. Python): An isolated environment for executing Python functions is a great idea for system safety. I am just wondering about its limitations in terms of performance, communication, and control.

6. Multiple-agent management: I love your idea of using generator functions for queries, and the concept of state cloning sounds very promising. This kind of agent management would offer flexibility and could make for a more dynamic and robust system. Was something missing at the end of your answer?

7. Function calling in context and tool learning: The current implementation on https://next.lmql.ai certainly serves the purpose. I need to play with it to understand its limits in the long run, how it can update its understanding of tools and their usage, and how it can search for and discover new tools. In some cases, I would like human experts to be able to be mapped as specialized expert tools.

From this exchange, LMQL and its associated library seem like a robust approach.

Which notebooks/resources do you recommend for properly learning LMQL, both for the basics and towards these goals?

doxav commented 1 year ago

@lbeurerkellner could you please share your thoughts?

lbeurerkellner commented 1 year ago

> 1. Autonomous Execution Management: Yes, the implementation using a while loop and conditional scripting seems to cover the basic requirement. In a long-running execution with parallel executions of different thoughts and actions, parallel and independent progress tracking and local early termination should be accessible to this script. I just wonder whether backtracking during state exploration would benefit from access to a lower level than the LMQL output only. If not, this backtracking need could also be helped by solutions to point 3. In the case of multiple humans in the loop with parallel paths, waiting for input should not block other running queries.

Branching execution and limited forms of backtracking like beam search are currently only implemented on a token level. This is used to enable branching decoding algorithms. However, many of the required features, like branching program execution and state cloning, would also generalize to the case where you do not branch on individual tokens but rather on the level of "thoughts" or actions. Currently, we eliminate branches during decoding based on the model likelihood of the generated sequences only. In the context of agents or output-dependent decoding schemes like self-consistency, it indeed makes sense to expose this to the program.

One possible feature that relates to this, which we have been discussing internally for a while now, is the idea of a scoring clause, i.e. a custom program expression/statement that continuously computes a score for the current execution branch. Depending on the decoding algorithms in use, this would allow us to only continue execution with e.g. the top-n branches (beam(n=)) or keep around a set of parallel branches all the way through (sample(n=)). In this context, thinking about more advanced decoding algorithms may also make sense.

In general, we have found the separation of decoding, scoring, and program execution to work out quite well. Extending this with more backtracking controls should be possible in a couple of ways, but it surely needs some experimentation to find a good implementation that makes sense.
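
For intuition only (this is not proposed LMQL syntax), the kind of pruning such a scoring clause would enable might look roughly like this at the library level:

```python
from dataclasses import dataclass

@dataclass
class Branch:
    trace: list[str]   # the "thoughts"/actions taken so far in this execution branch
    score: float       # continuously updated by a user-defined scoring expression

def prune_top_n(branches: list[Branch], n: int) -> list[Branch]:
    # beam-like behaviour: only continue execution with the n best-scoring branches
    return sorted(branches, key=lambda b: b.score, reverse=True)[:n]

branches = [
    Branch(trace=["plan A"], score=0.8),
    Branch(trace=["plan B"], score=0.3),
    Branch(trace=["plan C"], score=0.6),
]
print([b.trace for b in prune_top_n(branches, n=2)])  # [['plan A'], ['plan C']]
```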

> 3. Vector search mechanism: Vector search over past answers/generations with similar tokenization/prompts/parameters could be used to avoid calling the LM when not necessary (a caching mechanism), for transfer learning with in-context examples, or for retrieving past context in multi-turn conversations. Language-level integration may improve the caching and pre/post-processing mechanisms, but this is only intuition.

In theory there should be some optimization opportunities here, yes. We have also toyed with the idea of enabling access to embeddings as a special type of query, which would also fit nicely here.
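
One example of such an optimization could be a similarity-threshold cache in front of the model call, sketched below with a toy embedding function (a hypothetical mechanism, not an existing LMQL feature):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy embedding (see the earlier sketch); replace with a real embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

class AnswerCache:
    """Returns a cached answer for prompts similar enough to a previous one."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.items: list[tuple[np.ndarray, str]] = []

    def lookup(self, prompt: str) -> str | None:
        q = embed(prompt)
        for vec, answer in self.items:
            sim = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
            if sim >= self.threshold:
                return answer   # reuse the stored answer, skipping the LM call
        return None

    def store(self, prompt: str, answer: str) -> None:
        self.items.append((embed(prompt), answer))
```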

> 4. Goals Definition/Update & Self-feedback: Your perspective on graph-based prompting is quite intriguing. How do you see it? In @LachlanGray's ToT, as far as I understand, there are no backtracking capabilities or updates. In the original ToT paper, backtracking is assumed but not clearly explained. However, they plan to investigate further with MCTS. They might draw inspiration from "Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning".

Yes, there are several recent and less recent works that could make really nice decoders in LMQL. What I was alluding to with graph-based prompting could probably be phrased as a form of MC exploration, although I am not sure you will end up with a lot of probabilistic information when you just ask the LLM to self-critique. I recently also connected with researchers who work on (S)MC-based LLM samplers, and we talked about integrating their work into LMQL. I think it will need some adjustments in the decoder library, but it would be super awesome to provide better infrastructure for more PPL-based approaches on the decoding end. I think we could provide a lot of backend and optimization functionality for many ideas popping up in this space right now.

> 6. Multiple-agent management: I love your idea of using generator functions for queries, and the concept of state cloning sounds very promising. This kind of agent management would offer flexibility and could make for a more dynamic and robust system. Was something missing at the end of your answer?

Right, I did not finish the draft there. Queries as generators make a lot of sense if you assume that you want to keep the entire interaction history as part of the prompt. From what I have seen, however, some agent implementations prompt the model from scratch on each step (more like chaining and less like a continuous conversational approach). We have seen this with the AutoGPT folks, and they were asking about rewriting functionality (e.g. summarize and compress the context at a certain point during execution). So this may also be an interesting direction to explore, with respect to language features.
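
To make both ideas concrete in plain Python (independent of LMQL's actual internals; `summarize` is a placeholder for what would itself be an LLM/LMQL call), a query-as-generator agent might keep its history explicit and compress it once it grows too large:

```python
from dataclasses import dataclass, field

def summarize(text: str) -> str:
    # Placeholder: in practice this would be an LLM/LMQL call that compresses the text.
    return text[:120] + " ... [summary]"

@dataclass
class AgentState:
    role: str
    history: list[str] = field(default_factory=list)

def compress(history: list[str], max_chars: int = 2000, keep_recent: int = 3) -> list[str]:
    # Replace the oldest turns with a summary; keep the most recent turns verbatim.
    if sum(len(t) for t in history) <= max_chars:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize("\n".join(older))] + recent

def agent(state: AgentState):
    # Query-as-generator sketch: yields each proposed action, pauses,
    # and is resumed with feedback via .send(); the history stays bounded.
    step = 0
    while True:
        action = f"[{state.role}] action {step} (context turns: {len(state.history)})"
        feedback = yield action
        state.history = compress(state.history + [feedback or action])
        step += 1

state = AgentState(role="researcher")
gen = agent(state)
print(next(gen))
print(gen.send("looks good"))
```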

> 7. Function calling in context and tool learning: The current implementation on https://next.lmql.ai certainly serves the purpose. I need to play with it to understand its limits in the long run, how it can update its understanding of tools and their usage, and how it can search for and discover new tools. In some cases, I would like human experts to be able to be mapped as specialized expert tools.

> From this exchange, LMQL and its associated library seem like a robust approach.

> Which notebooks/resources do you recommend for properly learning LMQL, both for the basics and towards these goals?

Re learning LMQL, I can recommend reading our docs at https://docs.lmql.ai as a starting point. We are also usually quite available on Discord to help with things. Otherwise, I would suggest implementing some toy projects and reaching out when things break.

I also want to provide more of an internals overview for getting started with language development, but that's not ready yet. The playground can be useful to see what's going on internally, though.