Holmeswww / AgentKit

An intuitive LLM prompting framework for multifunctional agents that works by explicitly constructing a complex "thought process" from simple natural language prompts.
Creative Commons Attribution 4.0 International

Are there any example apps? #5

Closed nyck33 closed 2 months ago

nyck33 commented 2 months ago

I'm just wondering if you had any apps that showcase the abilities.

Holmeswww commented 2 months ago

Hi,

Please check our paper https://arxiv.org/abs/2404.11483 for a showcase of its abilities.

We attach the full prompts in the appendix, but please note that we have not yet made the compose-prompt and post-processing functions public. However, the full prompts should give a clear sense of what's possible and how to prompt the LLMs.

rhyswynn commented 2 months ago

> We attach the full prompts in the appendix, but please note that we have not yet made the compose-prompt and post-processing functions public. However, the full prompts should give a clear sense of what's possible and how to prompt the LLMs.

Hi there. Do you mean those functions aren't included in this public code, or just haven't been documented?

And why not yet? Are they not polished enough for public release, or is there some other reason for a partial release? Thank you!

Holmeswww commented 2 months ago

Hi,

We are delaying publication of some of the code until the paper is accepted, to preserve a degree of novelty in case it gets rejected from the conference.

All the tools required to implement our paper are in the current codebase; we simply did not include some custom functions for the game we played.

Some important details: as you can see from the prompts in the paper, many of them ask for specific JSON formats, so we used JsonAfterQuery to build a set of post-processing instances for those prompts. In addition, we used ComposePromptDB to implement the planner, storing and retrieving a plan via JsonAfterQuery and ComposePromptDB every turn.
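
Roughly, the pattern those two classes implement looks like this (a plain-Python sketch of the idea only; the actual JsonAfterQuery and ComposePromptDB classes in AgentKit have their own interfaces):

```python
import json
import re

db = {}  # per-agent storage, playing the role AgentKit's database plays for ComposePromptDB

def json_after_query(llm_answer: str) -> dict:
    # In the spirit of JsonAfterQuery: pull the JSON object the prompt
    # asked for out of the raw LLM answer and validate it.
    match = re.search(r"\{.*\}", llm_answer, re.DOTALL)
    if match is None:
        raise ValueError("LLM answer contains no JSON object")
    return json.loads(match.group(0))

def store_plan(llm_answer: str) -> None:
    # After-query step: persist this turn's plan for the next turn.
    db["plan"] = json_after_query(llm_answer)["plan"]

def compose_planner_prompt(template: str) -> str:
    # In the spirit of ComposePromptDB: retrieve the stored plan and
    # substitute it into the node's prompt template.
    return template.replace("$plan$", json.dumps(db.get("plan", [])))
```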

rhyswynn commented 2 months ago

I really do hope you will publish more of these details; it's a lot to absorb from the links you provided. WebShop is an interesting example, but it seems to be missing a lot of details, like how to specify and invoke a web page. Thank you for sharing what you have!

Holmeswww commented 2 months ago

Hi,

Operating on the web (parsing and interacting with HTML) is extremely challenging. We think the challenges of parsing HTML are orthogonal to what AgentKit is for.

Our implementation does not interact directly with the web. Instead, it interacts with the WebShop environment, which offers parsed outputs from the web. (See the environment paper here https://arxiv.org/pdf/2207.01206.pdf)

For the record, here's what a web interaction looks like in WebShop: [screenshot of a WebShop interaction]
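
In case the screenshot doesn't render here: the agent sees parsed page text and emits discrete actions, roughly like this (illustrative values only; see the WebShop repo for the real interface):

```python
# Illustrative only: a WebShop-style parsed observation and discrete action.
observation = (
    "Instruction: buy red running shoes under $50\n"
    "[button] Search [button_]"
)
action = "search[red running shoes]"  # WebShop actions look like search[...] / click[...]
```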

Please let us know if these are the details you are looking for.

rhyswynn commented 2 months ago

Thank you very much for the information; I'm sorry for asking naïve questions!

I am thinking of multiple use cases for this framework that involve querying an API or web page for information to be used in a decision tree, exactly as described here. Isn't OpenAI capable of parsing the text of a web page for meaningful information? There are plenty of libraries for parsing HTML; this could just be a helper function that does the query and returns the parsed text.
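
Something like this minimal helper is what I have in mind (a sketch using requests and BeautifulSoup; error handling and rate limiting omitted):

```python
import requests
from bs4 import BeautifulSoup

def fetch_page_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its visible text, stripped of markup."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    for tag in soup(["script", "style"]):  # drop non-visible content
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)
```

The returned text could then be fed into a node's prompt for the LLM to extract the meaningful information.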

My use cases would probably work best with specialized helper functions that would only work per node, but might be called multiple times with different parameters. Maybe this belongs in the discussion area now?

Holmeswww commented 2 months ago

Hi,

Yeah, I think parsing HTML is mostly a solved problem on its own. Although it remains an issue in current LLM agent research, making it reliable is mostly just engineering work.

For your use cases, the current version of AgentKit should support all the per-node functions you describe: you can specify the LLM query, compose, and post-processing functions independently for each node.
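
In other words, the pattern is roughly this (a plain-Python sketch of the per-node idea, not AgentKit's exact API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class NodeConfig:
    # Each node bundles its own query function, compose step, and
    # post-processing; none of them are shared globals.
    prompt_template: str
    query_llm: Callable[[str], str]       # which LLM to call for this node
    compose: Callable[[str], str]         # builds the final prompt
    post_process: Callable[[str], dict]   # e.g. a JsonAfterQuery-style parser

def run_node(cfg: NodeConfig) -> dict:
    prompt = cfg.compose(cfg.prompt_template)
    answer = cfg.query_llm(prompt)
    return cfg.post_process(answer)
```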

rhyswynn commented 2 months ago

I added Azure OpenAI support and have that working OK. I'm still reviewing the code and how everything fits together so that I can customize it. I am confident I'll get it worked out, but Python is a newer language for me, so it takes me a little longer to understand someone else's code and how the module functions work together.

I'm also interested in a hybrid local/vendor LLM approach to keep token costs lower on some of the more basic queries, or in using specialized models for some nodes. I'm really excited about the possibilities!
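
Something like this is what I'm picturing for the hybrid setup (a sketch using the openai Python client; the local endpoint URL assumes an OpenAI-compatible server such as Ollama, and the model names are placeholders):

```python
from openai import OpenAI

# Vendor API for demanding nodes; a local OpenAI-compatible server
# (e.g. Ollama; the URL below is an assumption) for the basic ones.
vendor = OpenAI()  # reads OPENAI_API_KEY from the environment
local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

def query_llm(prompt: str, hard: bool = False) -> str:
    # Route each query to the vendor model or the cheap local model.
    client, model = (vendor, "gpt-4-turbo") if hard else (local, "llama3")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Each node could then be wired to query_llm with hard=True or False
# depending on how demanding its subtask is.
```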