brainlid / langchain

Elixir implementation of a LangChain style framework.
https://hexdocs.pm/langchain/

OpenAI Function Call Support #23

Closed: catethos closed this issue 7 months ago

catethos commented 8 months ago

Hi, I am coming from Python and really like what I have seen so far in Elixir land. I am thinking about porting one of my LLM applications to Elixir, and one of the problems is that I rely heavily on OpenAI's function call feature. OpenAI's function calling (https://openai.com/blog/function-calling-and-other-api-updates) is a good way to get structured data out of an LLM and to use external tools. However, to use it we need to write a function parameter specification such as

{
    'name': 'extract_student_info',
    'description': 'Get the student information from the body of the input text',
    'parameters': {
        'type': 'object',
        'properties': {
            'name': {
                'type': 'string',
                'description': 'Name of the person'
            },
            'major': {
                'type': 'string',
                'description': 'Major subject.'
            }
        }
    }
}

The Python library Instructor (https://github.com/jxnl/instructor) provides a good way to write such specifications using another Python library, Pydantic (https://docs.pydantic.dev/latest/), which is essentially a schema validation library.

I'm wondering whether Elixir provides a way to write such a specification using something more convenient than a plain map. If so, would this library be a good place to implement such a feature?

amokan commented 8 months ago

Elixir provides a number of ways to do what you're asking. Structs are the first that come to mind.

If you want mapping/validation, Ecto will also cover the same areas that Pydantic does.

Cardosaum commented 8 months ago

Might be an example of using related features in Elixir.

catethos commented 8 months ago

I don't know how to add a description to an Ecto schema field, which is essential to the parameters schema sent to OpenAI. I think it might need some macro knowledge ...

brainlid commented 8 months ago

Hi @catethos! I know what you're talking about with JSONSchema validation libraries to help build parameters. When working on Function, I was looking for something as well because the JS/TS version of LangChain uses something similar.

At the time, I didn't find a matching library to do that and I didn't want to let that keep me from moving forward so I went ahead without it.

Using plain maps works well, but yes, it puts more responsibility on the developer.

I think it can be done using another struct type. Are you willing to be involved as someone who knows what types of data are needed?

Here's the way I currently see it. Using your extract_student_info example:

The way it is today only uses LangChain.Function to define the outer function. The parameters_schema is a bare map that gets passed through to OpenAI.

Function.new!(%{
  name: "extract_student_info",
  description: "Get the student information from the body of the input text",
  parameters_schema: %{
    type: "object",
    properties: %{
      name: %{
        type: "string",
        description: "Name of the person"
      },
      major: %{
        type: "string",
        description: "Major subject."
      }
    },
    required: []
  },
  function: &execute_extract_student_info/2
})

This turns into a Function struct with the name and description validated and cast to struct fields.

The parameters_schema is the one we could add support for. For basic types like "string" and "integer" it's easy. Do you think a type of "object" with nested structures is needed? That's the problem I was dodging. That and the "required" field.

I think it could look something like this:

Function.new!(%{
  name: "extract_student_info",
  description: "Get the student information from the body of the input text",
  parameters: [
    FunctionParameter.new!(%{
      name: "name",
      type: :string,
      description: "Name of Person",
      required: true  # <- optionally declare it as required. Defaults to `false`
    }),
    FunctionParameter.new!(%{
      name: "major",
      type: :string,
      description: "Major subject"
    })
  ],
  function: &execute_extract_student_info/2
})

I don't think we need a JSONSchema supporting library and I don't think it needs macros either.
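As a rough, dependency-free sketch of how a parameter list like the one above might be turned into the JSON Schema map that OpenAI expects: the `ParamSketch` module and its functions below are hypothetical names made up for illustration. They are not part of the LangChain library, and this only covers flat, basic types.

```elixir
defmodule ParamSketch do
  # Hypothetical struct for illustration only; not a LangChain module.
  @enforce_keys [:name, :type]
  defstruct [:name, :type, :description, required: false]

  # Build a param from a map. Raises if :name or :type is missing
  # or if an unknown key is given.
  def new!(attrs) when is_map(attrs), do: struct!(__MODULE__, attrs)

  # Convert a list of params into a JSON Schema "object" map.
  # The "required" list is derived from each param's `required` flag.
  def to_parameters_schema(params) when is_list(params) do
    %{
      "type" => "object",
      "properties" => Map.new(params, &{&1.name, property(&1)}),
      "required" => for(%__MODULE__{required: true, name: name} <- params, do: name)
    }
  end

  # A param without a description only emits its type.
  defp property(%__MODULE__{type: type, description: nil}),
    do: %{"type" => Atom.to_string(type)}

  defp property(%__MODULE__{type: type, description: description}),
    do: %{"type" => Atom.to_string(type), "description" => description}
end

params = [
  ParamSketch.new!(%{name: "name", type: :string, description: "Name of Person", required: true}),
  ParamSketch.new!(%{name: "major", type: :string, description: "Major subject"})
]

schema = ParamSketch.to_parameters_schema(params)
```

Because each parameter carries its own `required` flag, the developer never has to keep a separate "required" list in sync with the property names by hand.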

catethos commented 8 months ago

Hi @brainlid, I am happy to help in any way :) I do think the 'object' type is necessary. In my use case, extracting education information from a resume, the education field itself is an object.

brainlid commented 8 months ago

> Hi @brainlid, I am happy to help in any way :) I do think the 'object' type is necessary. In my use case, extracting education information from a resume, the education field itself is an object.

Oh, I see. Good example. Thanks! I'll think about how we might do that.

catethos commented 8 months ago

I was thinking about an Ecto schema, if only it allowed me to add a description for each field.

brainlid commented 8 months ago

@catethos Oh, I think I know what you mean. Are you thinking something like this?

embedded_schema do
  field :name, :string, description: "The full name of the student", required: false
  field :age, :integer, description: "The age of the student", required: false
end

If so, that is not possible with Ecto.

Looking at the JS version of LangChain and their data extraction examples, they use Zod for defining the JSON schema.

That usage looks like this:

const zodSchema = z.object({
  "person-name": z.string().optional(),
  "person-age": z.number().optional(),
  "person-hair_color": z.string().optional(),
  "dog-name": z.string().optional(),
  "dog-breed": z.string().optional(),
});

However, I don't think descriptions can be set like that there either.

I don't know of a better way to do it than the approach I outlined above.

Or am I way off base and you're saying something else?

brainlid commented 7 months ago

Created LangChain.FunctionParam to express JSONSchema-friendly data structures. Supports basic types, arrays, enums, objects, arrays of objects and nested objects.

Still allows for full control over the JSONSchema by providing an override parameters_schema object to fully self-describe it.
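To illustrate the general idea of rendering nested objects, arrays and enums into JSON Schema, here is a minimal recursive sketch in plain Elixir. Everything in it is an assumption made for illustration: `NestedParamSketch` and its fields (`item_type`, `enum`, `properties`) are hypothetical and are not the actual LangChain.FunctionParam implementation or API, and the `school`, `degree` and `skills` fields are invented to mirror the resume use case mentioned earlier in the thread.

```elixir
defmodule NestedParamSketch do
  # Hypothetical struct for illustration only; field names are assumptions.
  @enforce_keys [:name, :type]
  defstruct [:name, :type, :item_type, :description, enum: [], properties: [], required: false]

  def new!(attrs) when is_map(attrs), do: struct!(__MODULE__, attrs)

  # A list of params becomes a JSON Schema object with its own "required" list.
  def to_parameters_schema(params) when is_list(params) do
    %{
      "type" => "object",
      "properties" => Map.new(params, &{&1.name, to_schema(&1)}),
      "required" => for(%__MODULE__{required: true, name: name} <- params, do: name)
    }
  end

  # Nested object: recurse into its child params.
  defp to_schema(%__MODULE__{type: :object} = p),
    do: p.properties |> to_parameters_schema() |> put_description(p)

  # Array of objects: each item is itself an object schema.
  defp to_schema(%__MODULE__{type: :array, item_type: :object} = p),
    do: put_description(%{"type" => "array", "items" => to_parameters_schema(p.properties)}, p)

  # Array with no declared item type.
  defp to_schema(%__MODULE__{type: :array, item_type: nil} = p),
    do: put_description(%{"type" => "array"}, p)

  # Array of a basic type such as :string or :integer.
  defp to_schema(%__MODULE__{type: :array, item_type: item_type} = p),
    do: put_description(%{"type" => "array", "items" => %{"type" => Atom.to_string(item_type)}}, p)

  # Basic type, optionally restricted to an enum.
  defp to_schema(%__MODULE__{type: type} = p),
    do: %{"type" => Atom.to_string(type)} |> put_enum(p) |> put_description(p)

  defp put_enum(schema, %__MODULE__{enum: []}), do: schema
  defp put_enum(schema, %__MODULE__{enum: values}), do: Map.put(schema, "enum", values)

  defp put_description(schema, %__MODULE__{description: nil}), do: schema
  defp put_description(schema, %__MODULE__{description: text}), do: Map.put(schema, "description", text)
end

education =
  NestedParamSketch.new!(%{
    name: "education",
    type: :object,
    description: "Education details from the resume",
    properties: [
      NestedParamSketch.new!(%{name: "school", type: :string, required: true}),
      NestedParamSketch.new!(%{name: "degree", type: :string, enum: ["bachelor", "master", "phd"]})
    ]
  })

skills = NestedParamSketch.new!(%{name: "skills", type: :array, item_type: :string})

schema = NestedParamSketch.to_parameters_schema([education, skills])
```

The nested object reuses the same top-level conversion, so each level gets its own "required" list without any extra bookkeeping.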