brainlid / langchain

Elixir implementation of an AI-focused, LangChain-style framework.
https://hexdocs.pm/langchain/

Make Elixir Function optional for LangChain.Function? #143

Open avergin opened 4 months ago

avergin commented 4 months ago

For workflows where we just need structured JSON output from the model via tool use (e.g. data extraction), we may not need to execute any code on the client or send any messages back to the model. For such cases, does it make sense to make the function (Elixir Function) attribute optional for LangChain.Function?

(From Anthropic API Docs)

### How tool use works
Integrate external tools with Claude in these steps:

1. Provide Claude with tools and a user prompt
- Define tools with names, descriptions, and input schemas in your API request.
- Include a user prompt that might require these tools, e.g., “What’s the weather in San Francisco?”

2. Claude decides to use a tool
- Claude assesses if any tools can help with the user’s query.
- If yes, Claude constructs a properly formatted tool use request.
- The API response has a stop_reason of tool_use, signaling Claude’s intent.

3. Extract tool input, run code, and return results
- On your end, extract the tool name and input from Claude’s request.
- Execute the actual tool code client-side.
- Continue the conversation with a new user message containing a tool_result content block.

4. Claude uses tool result to formulate a response
- Claude analyzes the tool results to craft its final response to the original user prompt.

**Note: Steps 3 and 4 are optional. For some workflows, Claude’s tool use request (step 2) might be all you need, without sending results back to Claude.**
brainlid commented 4 months ago

@avergin That's a valid use-case. I plan to create a demo for this, which may surface any remaining issues. But here's the idea:
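Untested sketch of how that could look with the current Function API, using a hypothetical extract_contact tool whose handler is a no-op, so the ToolCall arguments themselves are the output you keep:

```elixir
alias LangChain.Function

# Hypothetical tool used purely for structured extraction. The schema is
# what matters; the handler does nothing with the arguments client-side.
extract_contact =
  Function.new!(%{
    name: "extract_contact",
    description: "Record the contact details found in the text.",
    parameters_schema: %{
      type: "object",
      properties: %{
        name: %{type: "string", description: "The person's full name"},
        email: %{type: "string", description: "The person's email address"}
      },
      required: ["name", "email"]
    },
    # A no-op default like this is roughly what making `function` optional
    # would amount to; the real result lives in the ToolCall arguments.
    function: fn _args, _context -> {:ok, "noop"} end
  })
```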

Also, data extraction can be done without functions. I'm publishing an article on Monday morning showing this.

Does any of that help?

brainlid commented 4 months ago

@avergin I published the article about using AI to create image ALT tag text and image captions. It does data extraction using JSON without functions.

https://fly.io/phoenix-files/using-ai-to-boost-accessibility-and-seo/

avergin commented 4 months ago

Thanks for the article! JsonProcessor seems like a valuable addition for handling JSON outputs.

The idea of validating the function results sounds nice. In this case, will it be similar to the instructor library?

Sometimes, models are not able to generate valid JSON even in function calls. So does it make sense to have an option to enable JsonProcessor for function calls as well? Or will the validation process you mentioned cover this more thoroughly?

brainlid commented 4 months ago

@avergin Yes, it's similar to instructor in this way.

The idea is that a custom processor (it's really just a function) can validate against a changeset. Or the tool call can validate against a changeset. I believe the ToolCall will fail and return an error to the LLM if the returned data is not valid JSON. If it's not doing that, it should.
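For the tool-call side, a rough sketch (hypothetical MyApp.Contact changeset module, and assuming the handler can return {:error, reason} to send the failure back to the LLM):

```elixir
alias LangChain.Function

Function.new!(%{
  name: "save_contact",
  description: "Save the extracted contact details.",
  parameters_schema: %{
    type: "object",
    properties: %{
      name: %{type: "string"},
      email: %{type: "string"}
    },
    required: ["name", "email"]
  },
  # Validate the LLM-supplied arguments with a changeset. On failure, the
  # error text is returned to the LLM so it can try again.
  function: fn args, _context ->
    case MyApp.Contact.changeset(args) do
      %Ecto.Changeset{valid?: true} = changeset ->
        {:ok, inspect(Ecto.Changeset.apply_changes(changeset))}

      changeset ->
        {:error, inspect(changeset.errors)}
    end
  end
})
```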

brainlid commented 4 months ago

A possible option here would be to provide a default Elixir function that does nothing. Setting the function would override the default. :thinking:

mjrusso commented 3 months ago

My favourite Python library for interfacing with LLMs is @jackmpcollins's excellent magentic.

You can do something like this:

```python
from magentic import AssistantMessage, SystemMessage, UserMessage, OpenaiChatModel
from magentic.chat import Chat

from pydantic import BaseModel, Field

class CustomerNumber(BaseModel):
    """The user's Acme Corporation customer number."""

    # Ideally, we'd provide other validation here.
    customer_number: str = Field(description="The user's customer number")

chat = Chat(
    messages=[
        SystemMessage(
            """You are an assistant for Acme Corporation. You will provide customer support,
            but only once the user provides their Acme customer number.
            Solicit the customer number from the user (if one has not already been provided)."""
        )
    ],
    output_types=[str, CustomerNumber],
    model=OpenaiChatModel("gpt-4o"),
)
```

The part that's particularly relevant to this discussion is the output_types=[str, CustomerNumber] line, which instructs the model that either a string or a CustomerNumber is an acceptable output.

For example:

```python
>>> chat = chat.add_user_message("What is the capital of Ontario?").submit()
>>> chat.messages[-1]
AssistantMessage('I can assist you with queries related to Acme Corporation. May I have your Acme customer number to proceed?')
>>> chat = chat.add_user_message("My customer number is 1234567891111111").submit()
>>> chat.messages[-1]
AssistantMessage(CustomerNumber(customer_number='1234567891111111'))
```

I've found this pattern to be extremely helpful for building complex (many-step) chat-driven agent workflows, because you can directly dump the string response back to the user if you don't get the structured output you were expecting back from the LLM.

The reason I mention this is that I took a quick look at the new message_processors functionality, and it doesn't look like the API directly/easily supports multiple acceptable return values. (Unless I'm reading the code wrong, all processors attempt to run in sequence?)

brainlid commented 3 months ago

Hi @mjrusso! Welcome!

Yes, that is a nice feature. There are two main approaches to do that in this library and, of course, there are other options as well.

We call this "extracting structured data" in general.

> (Unless I'm reading the code wrong, all processors attempt to run in sequence?)

Yes, that's the idea. A JsonProcessor converts an assistant's text response to JSON and returns an error to the LLM if the text fails to parse.

The next processor could be a custom one for processing the data you expect. It is my intention to also create an EctoProcessor where you provide an Ecto changeset (Elixir's database structure interface, but it can be used without going to a database) to process the now valid JSON.

If a required value is not present or it violates some other rule, the error is returned to the LLM for it to try again.
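In chain terms, the sequencing could look something like this (untested; validate_extraction/2 is a stand-in for a custom or future Ecto-backed processor):

```elixir
alias LangChain.Chains.LLMChain
alias LangChain.MessageProcessors.JsonProcessor

# JsonProcessor runs first (text -> parsed JSON, or an error returned to
# the LLM). The hypothetical validate_extraction/2 then checks the parsed
# data and can likewise halt with an error for the LLM to correct.
chain
|> LLMChain.message_processors([JsonProcessor.new!(), &validate_extraction/2])
|> LLMChain.run(mode: :until_success)
```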

Another option is to use a Tool/Function where a schema defines the structure you require. This can also be processed through an Ecto changeset to ensure your requirements are met.
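For the changeset itself, plain Ecto with an embedded schema is enough; no database involved. Something like this hypothetical module:

```elixir
defmodule MyApp.Contact do
  use Ecto.Schema
  import Ecto.Changeset

  # An embedded schema needs no database table; the changeset enforces the
  # structure and rules we require from the LLM's output.
  @primary_key false
  embedded_schema do
    field :name, :string
    field :email, :string
  end

  def changeset(params) do
    %__MODULE__{}
    |> cast(params, [:name, :email])
    |> validate_required([:name, :email])
    |> validate_format(:email, ~r/@/)
  end
end
```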

Finally, if extracting structured data is your primary need/goal, then you should be aware of https://github.com/thmsmlr/instructor_ex as another option.

mjrusso commented 3 months ago

Thanks Mark for the detailed answer (and the wonderful library :)

I had a suspicion that an EctoProcessor was coming, and am very much looking forward to the addition. I've played around with hybrid LangChain plus Instructor usage (example Livebook: https://gist.github.com/mjrusso/c74803ed7ed49d42f9aefe77b6a62c52), and it totally works, but there are a ton of advantages to baking this into LangChain directly.

What I don't think I've generally seen are examples of structured data extraction where multiple return types are considered acceptable. (The example I shared in my previous comment is not great, especially because the LLM natively returns a string. But imagine something like this: output_types=[CustomerNumber, PhoneNumber, CaseNumber, IncidentNumber, StatusQuery, LatestInvoiceRequest]. As far as I know you can't express this natively with Instructor: you'd have to build some other Ecto schema that wraps all these types, which may or may not be easy, but more importantly may be more difficult than necessary for the LLM to reason about. And I think the same general issue applies to the new message processor functionality.)

Of course, under-the-hood this is all just sugar over tool calls, so (exactly as you mentioned) expressing each return type as separate tools is a totally viable implementation approach.

But -- does it make sense to support multiple chains of message processors (instead of limiting to a single processor chain)? This would cleanly support the multiple-acceptable-return-types use case without the need to manually define tools. Perhaps it's not worth the additional complexity if this isn't a common need (as one data point, though, as I've started digging into building multi-step chat-driven agents, I've found it pretty essential).

brainlid commented 3 months ago

The idea with the message processors is that some simple models are only good at giving a single response. You can't reliably have a follow-up chat with them and, most importantly, they are terrible at function/tool calls. That's the use case I was trying to support here: I want the LLM to do this one task, and it can't do functions.

Personally, I'd approach your example using a more capable model like ChatGPT, Anthropic, etc., where it has a single function it can call, like "account_identifier", that takes an object with any one of those things as separate keys. Instruct the model to prompt the user for the data and call the function. Conceptually, it might look like this: account_identifier(%{case_number: "1234"}).

The function would execute the lookup and determine whether it is a valid CaseNumber, CustomerNumber, etc. If not, return an error and let the LLM press on, asking for the right data.
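Conceptually, with a hypothetical MyApp.Accounts.lookup/1 doing the actual resolution:

```elixir
alias LangChain.Function

Function.new!(%{
  name: "account_identifier",
  description: "Identify the customer's account from whichever identifier they provided.",
  parameters_schema: %{
    type: "object",
    properties: %{
      customer_number: %{type: "string"},
      case_number: %{type: "string"},
      incident_number: %{type: "string"}
    }
  },
  # The lookup determines whether the supplied value matches a real
  # CaseNumber, CustomerNumber, etc. On failure, the LLM is told to keep
  # asking the user for valid identification.
  function: fn args, _context ->
    case MyApp.Accounts.lookup(args) do
      {:ok, account} -> {:ok, "Identified account #{account.id}"}
      {:error, reason} -> {:error, "No match (#{reason}). Ask the user for a valid identifier."}
    end
  end
})
```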

Depending on the chat interface, I'd probably stop there with that chain, having obtained the account identification information I needed, and start a new chain and system message that loads the most relevant account information and defines new functions applicable to its task.

My point is, I don't think it's important to express datatypes specifically that way.

mjrusso commented 3 months ago

Arghhh, I assumed that the message processor chain was implemented under the covers with a tool call 🤦

I see the value and think the approach you've taken here is the right one; it definitely makes sense to have a facility that works without tools. (Separately, I'd argue that there is value in adding some entirely unrelated API sugar on top of tools for coercing return types. I'll plan to put together an actual non-handwavy proposal once I start using this for anything serious. Thanks and sorry for the digression!)

brainlid commented 3 months ago

Thanks @mjrusso! I look forward to seeing what you come up with!