avergin opened this issue 4 months ago
@avergin That's a valid use-case. I plan to create a demo for this, which may surface any remaining issues. But here's the idea:
LLMChain.run(chain, mode: :until_success)
and it should stop after getting a successful response but NOT return it to the LLM (rough sketch below). Also, data extraction can be done without functions. I'm publishing an article on Monday morning showing this.
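Here's roughly how I picture wiring that up; treat the module and function names here (JsonProcessor, message_processors/2, the run result) as a sketch rather than the final API:

alias LangChain.Chains.LLMChain
alias LangChain.ChatModels.ChatOpenAI
alias LangChain.Message
alias LangChain.MessageProcessors.JsonProcessor

# build a chain that expects a JSON-only reply
chain =
  %{llm: ChatOpenAI.new!(%{model: "gpt-4o"})}
  |> LLMChain.new!()
  |> LLMChain.add_message(Message.new_system!("Respond ONLY with a valid JSON object."))
  |> LLMChain.add_message(Message.new_user!("Describe this image for an ALT tag as JSON."))
  |> LLMChain.message_processors([JsonProcessor.new!()])

# if the processor rejects the response (e.g. invalid JSON), the error is sent
# back to the LLM and the chain keeps going until a successful response arrives
LLMChain.run(chain, mode: :until_success)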
Does any of that help?
@avergin I published the article about using AI to create image ALT tag text and image captions. It's doing data extraction using JSON without functions.
https://fly.io/phoenix-files/using-ai-to-boost-accessibility-and-seo/
Thanks for the article! JsonProcessor seems like a valuable addition for handling JSON outputs.
The idea of validating the function results sounds nice. In this case, will it be similar to the instructor library?
Sometimes, models are not able to generate valid JSON even in function calls. So does it make sense to have the option of enabling JsonProcessor for function calls as well? Or will the validation process you mentioned cover this in a more elaborate way?
@avergin Yes, it's similar to instructor in this way.
The idea is that a custom processor (it's really just a function) can validate against a changeset. OR, the tool call can validate against a changeset. I believe the ToolCall will fail and return an error to the LLM if the returned data is not valid JSON. If it's not doing that, then it should.
A possible option here would be to provide a default Elixir function that does nothing. Setting the function would override the default. :thinking:
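Conceptually, something like this (purely illustrative; the argument and return shapes are assumptions, and the real default would live inside the library):

# hypothetical no-op default for the `function` attribute: the structured
# arguments are still captured/validated, but nothing is executed and a
# canned result is returned so the chain can continue
default_function = fn _arguments, _context -> {:ok, "ok"} end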
My favourite Python library for interfacing with LLMs is @jackmpcollins's excellent magentic.
You can do something like this:
from magentic import AssistantMessage, SystemMessage, UserMessage, OpenaiChatModel
from magentic.chat import Chat
from pydantic import BaseModel, Field

class CustomerNumber(BaseModel):
    """The user's Acme Corporation customer number."""

    # Ideally, we'd provide other validation here.
    customer_number: str = Field(description="The user's customer number")

chat = Chat(
    messages=[
        SystemMessage(
            """You are an assistant for Acme Corporation. You will provide customer support,
            but only once the user provides their Acme customer number.
            Solicit the customer number from the user (if one has not already been provided)."""
        )
    ],
    output_types=[str, CustomerNumber],
    model=OpenaiChatModel("gpt-4o"),
)
The part that's particularly relevant to this discussion is the output_types=[str, CustomerNumber] line, which instructs the model that either a string or a CustomerNumber is an acceptable output.
For example:
>>> chat = chat.add_user_message("What is the capital of Ontario?").submit()
>>> chat.messages[-1]
AssistantMessage('I can assist you with queries related to Acme Corporation. May I have your Acme customer number to proceed?')
>>> chat = chat.add_user_message("My customer number is 1234567891111111").submit()
>>> chat.messages[-1]
AssistantMessage(CustomerNumber(customer_number='1234567891111111'))
I've found this pattern to be extremely helpful for building complex (many-step) chat-driven agent workflows, because you can directly dump the string response back to the user if you don't get the structured output you were expecting back from the LLM.
The reason I mention this is that I took a quick look at the new message_processors functionality, but it doesn't look like the API directly/easily supports multiple acceptable return values. (Unless I'm reading the code wrong, all processors attempt to run in sequence?)
Hi @mjrusso! Welcome!
Yes, that is a nice feature. There are two main approaches to doing that in this library, and of course there are other options as well.
We call this "extracting structured data" in general.
(Unless I'm reading the code wrong, all processors attempt to run in sequence?)
Yes, that's the idea. A JSON Processor converts an assistant's text response to JSON and returns an error to the LLM if the response can't be parsed.
The next processor could be a custom one for processing the data you expect. It is my intention to also create an EctoProcessor where you provide an Ecto changeset (Elixir's database structure interface, but it can be used without going to a database) to process the now valid JSON.
If a required value is not present or it violates some other rule, the error is returned to the LLM for it to try again.
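As a rough sketch of what that custom processor could look like today, assuming the (chain, message) -> {:cont, message} | {:halt, message} contract and the processed_content field that JsonProcessor uses (both details may differ), with made-up field names for the ALT-text case:

alias LangChain.Message

# validate the already-decoded JSON with a schemaless Ecto changeset
validate_alt_text = fn _chain, message ->
  types = %{alt_text: :string, caption: :string}

  changeset =
    {%{}, types}
    |> Ecto.Changeset.cast(message.processed_content, Map.keys(types))
    |> Ecto.Changeset.validate_required([:alt_text])
    |> Ecto.Changeset.validate_length(:alt_text, max: 125)

  if changeset.valid? do
    {:cont, %{message | processed_content: Ecto.Changeset.apply_changes(changeset)}}
  else
    errors = inspect(Ecto.Changeset.traverse_errors(changeset, fn {msg, _opts} -> msg end))
    # returned to the LLM so it can correct itself and try again
    {:halt, Message.new_user!("ERROR: " <> errors)}
  end
end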
Another option is to use a Tool/Function where a schema defines the structure you require. This can also be processed through an Ecto changeset to ensure your requirements are met.
Finally, if extracting structured data is your primary need/goal, then you should be aware of https://github.com/thmsmlr/instructor_ex as another option.
Thanks Mark for the detailed answer (and the wonderful library :)
I had a suspicion that an EctoProcessor was coming, and am very much looking forward to the addition. I've played around with hybrid LangChain plus Instructor usage (example Livebook: https://gist.github.com/mjrusso/c74803ed7ed49d42f9aefe77b6a62c52), and it totally works, but there's a ton of advantages to baking into LangChain directly.
What I don't think I've generally seen are examples of structured data extraction where multiple return types are considered acceptable. (The example I shared in my previous comment is not great, especially because the LLM natively returns a string, but imagine something like output_types=[CustomerNumber, PhoneNumber, CaseNumber, IncidentNumber, StatusQuery, LatestInvoiceRequest].) As far as I know you can't express this natively with Instructor (you'd have to build some other Ecto schema that wraps all these types -- roughly the sketch below -- which may or may not be easy, but more importantly may be more difficult than necessary for the LLM to reason about). And I think the same general issue applies to the new message processor functionality.
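By "wrapper" I mean something roughly like this, where CustomerNumber, PhoneNumber, etc. are hypothetical embedded schemas defined elsewhere and exactly one of them is expected to be filled in:

defmodule SupportRequest do
  # hypothetical wrapper: one optional embed per acceptable type; expressing
  # "exactly one of these" is awkward both for Ecto and for the LLM
  use Ecto.Schema

  embedded_schema do
    embeds_one :customer_number, CustomerNumber
    embeds_one :phone_number, PhoneNumber
    embeds_one :case_number, CaseNumber
    embeds_one :status_query, StatusQuery
  end
end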
Of course, under-the-hood this is all just sugar over tool calls, so (exactly as you mentioned) expressing each return type as separate tools is a totally viable implementation approach.
But -- does it make sense to support multiple chains of message processors (instead of limiting to a single processor chain)? This would cleanly support the multiple acceptable return types use case without the need to manually define tools. Perhaps not worth the additional complexity if this isn't a common need (as one data point, though, as I've started digging in to building multi-step chat-driven agents, I've found it pretty essential).
The idea with the message processors is that some simple models are only good at giving a single response. You can't reliably have a follow-up chat with them. Most importantly, they are terrible at function/tool calls. That's the case I was trying to support here: I want the LLM to do this one task, and it can't do functions.
Personally, I'd approach your example using a more capable model like ChatGPT, Anthropic, etc., where it has a single function it can call, like "account_identifier", that takes an object with any one of those things as separate keys. Instruct the model to prompt the user for the data and call the function. Conceptually, it might look like this: account_identifier(%{case_number: "1234"}).
The function would execute the lookup and determine whether that is a valid CaseNumber, CustomerNumber, etc., or not. If not, return an error and let the LLM press on asking for the right data.
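Roughly sketched (the schema shape, the return values, and the Accounts.lookup/1 helper are all illustrative, not the library's final word):

alias LangChain.Function

Function.new!(%{
  name: "account_identifier",
  description: "Identify the customer's account from whichever identifier they provide.",
  parameters_schema: %{
    type: "object",
    properties: %{
      customer_number: %{type: "string"},
      case_number: %{type: "string"},
      phone_number: %{type: "string"}
    },
    required: []
  },
  function: fn arguments, _context ->
    # Accounts.lookup/1 is a stand-in for your own validation/lookup logic
    case Accounts.lookup(arguments) do
      {:ok, account} ->
        {:ok, "Account identified: #{account.id}"}

      {:error, reason} ->
        # the error goes back to the LLM so it keeps asking for the right data
        {:error, "No account matched (#{reason}). Ask the user for a valid identifier."}
    end
  end
})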
Depending on the chat interface, I'd probably stop there with that chain, having obtained the account identification information I needed, and start a new chain and system message that loads up the most relevant account information and defines new functions applicable to its task.
My point is, I don't think it's important to express datatypes specifically that way.
Arghhh, I assumed that the message processor chain was implemented under-the-covers with a tool call 🤦
I see the value and think the approach you've taken here is the right one; it definitely makes sense to have a facility that works without tools. (Separately, I'd argue that there is value in adding some entirely unrelated API sugar on top of tools for coercing return types. I'll plan to put together an actual non-handwavy proposal once I start using this for anything serious. Thanks and sorry for the digression!)
Thanks @mjrusso! I look forward to seeing what you come up with!
For workflows where we just need structured JSON output from the models (e.g. data extraction) using tools, we may not need to execute any code in the client or send any messages back to the models. For such cases, does it make sense to make the function (Elixir Function) attribute optional for LangChain.Function? (From the Anthropic API docs)