dotnet / extensions

This repository contains a suite of libraries that provide facilities commonly needed when creating production-ready applications.
MIT License

Arrays and primitive types are not supported in structured output (native or otherwise) #5521

Open kzu opened 3 days ago

kzu commented 3 days ago

If you need an array (or list) of typed values from completion rather than a single object, things fail whether you're using native json schema support or not. These two tests fail, for example:

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputArray()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<Person[]>("""
            Who are described in the following sentence?
            Jimbo Smith is a 35-year-old software developer from Cardiff, Wales.
            Josh Simpson is a 25-year-old software developer from Newport, Wales.
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.Equal(2, response.Result.Length);
        Assert.Contains(response.Result, x => x.FullName == "Jimbo Smith");
        Assert.Contains(response.Result, x => x.FullName == "Josh Simpson");
    }

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputPrimitive()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<int>("""
            As of today (october 2024), Jimbo Smith is a 35-year-old software developer from Cardiff, Wales.
            Which year was he born in?
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.Equal(1989, response.Result);
    }

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputEnum()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<Architecture>("""
            I'm using a Macbook Pro with an M2 chip. What architecture am I using?
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.Equal(Architecture.Arm64, response.Result);
    }

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputString()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<string>("""
            The software developer, Jimbo Smith, is a 35-year-old software developer from Cardiff, Wales.
            What's his full name?
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.Equal("Jimbo Smith", response.Result);
    }

    [ConditionalFact]
    public virtual async Task CompleteAsync_StructuredOutputBool()
    {
        SkipIfNotEnabled();

        var response = await _chatClient.CompleteAsync<bool>("""
            The software developer, Jimbo Smith, is a 35-year-old software developer from Cardiff, Wales.
            Is he a medical doctor?
            """,
            useNativeJsonSchema: _chatClient.Metadata.ProviderName == "openai");

        Assert.False(response.Result);
    }

Except for the first test, the others don't even compile, due to a constraint requiring T to be a class. There is really no need for that constraint, and it seriously restricts the usefulness of the typed API, forcing the creation of intermediate types just to hold those values (i.e. a superfluous Payload<T>).
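For illustration, the kind of intermediate wrapper the current constraint forces might look like this (a minimal sketch; `Payload<T>` and its `Value` property are hypothetical names, not part of the library):

```csharp
// Hypothetical wrapper type that exists only to satisfy the T : class constraint.
public class Payload<T>
{
    public T Value { get; set; } = default!;
}

// The caller only wants an int, but must request Payload<int> and unwrap it.
var response = await _chatClient.CompleteAsync<Payload<int>>("""
    As of today (October 2024), Jimbo Smith is a 35-year-old software developer.
    Which year was he born in?
    """);
int year = response.Result.Value;
```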

The native JSON schema in OpenAI does require the schema type to be an object (not an array or anything else). But this could be solved by having the API itself provide the wrapping object automatically, so that users fall more easily into the proverbial pit of success.
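Concretely, if a caller asked for `int`, the library could wrap it in a schema along these lines before sending it to the model (a sketch; the `data` property name is illustrative):

```json
{
  "type": "object",
  "properties": {
    "data": { "type": "integer" }
  },
  "required": ["data"],
  "additionalProperties": false
}
```

On the way back, the library would deserialize the wrapper and hand only the `data` value to the caller.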

stephentoub commented 3 days ago

cc: @SteveSandersonMS

SteveSandersonMS commented 2 days ago

> The native JSON schema in OpenAI does require the schema type to be an object (not an array or anything else). But this could be solved by having the API itself provide the wrapping object automatically, so that users fall more easily into the proverbial pit of success.

We could potentially do that, but it has other risks:

  1. The more this library takes responsibility for controlling the behavior, the more it becomes responsible for cases where it doesn't work or behaves inconsistently across LLMs.
  2. It's harder for developers to work out what's going on when looking at telemetry or debugging, since they don't just see their own object shapes, but rather something else provided by the library.

Based on this we've taken the decision (so far at least - it could change) to limit how much magic this library does around structured output.

As such, it's up to the developer to augment this behavior further if they want: for scalar outputs, they should provide their own wrapping object, and if the non-native prompt isn't sufficient, they should add their own further prompting.
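For example, a developer-provided wrapper for the birth-year scenario above might look like this (a sketch; the record and property names are the developer's own, not library types):

```csharp
// Developer-defined wrapper so the top-level schema is an object,
// which is what OpenAI's native JSON schema mode requires.
public record BirthYearResult(int Year);

var response = await _chatClient.CompleteAsync<BirthYearResult>("""
    As of today (October 2024), Jimbo Smith is a 35-year-old software developer from Cardiff, Wales.
    Which year was he born in?
    """);
int year = response.Result.Year;
```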

It's a tricky area and we'll only know if the design is right once people start using this at scale in hundreds of different app scenarios. If we go on to collect evidence from further feedback that a particular change is warranted, we'll certainly be open to making that change. Hope that seems reasonable!

kzu commented 2 days ago

Hm. This is one of the areas that starts to feel like old EF, where the LINQ you wrote was actually a VERY leaky abstraction that didn't hold in a ton of scenarios and required knowledge of what worked and what didn't. Here you see a T, but a large number of instantiations just won't work. That means lots of frustration and unnecessary wrappers for everyone. I view it as VERY low-hanging fruit, and something I'm almost sure future OpenAI LLMs will solve natively too, since it just looks like an oversight (or a bug).

I don't see how automatically wrapping (AND passing that wrapped schema to the LLM, either natively or via your existing prompt) can produce vastly different results from the existing mechanism. My PR shows that you basically pass the same schema (wrapped at the top level) in both cases, so the delta in behavior would be the same as it could be today.

Sure, this can end up in a package with a single extension method, say CompleteAsyncForAny<T>, so it's not as if it's a showstopper...
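Such an extension method might be sketched roughly as follows (hypothetical code, assuming the existing `CompleteAsync<T>` overload; `Wrapped<T>` and `CompleteAsyncForAny<T>` are illustrative names, not an actual API):

```csharp
public static class StructuredOutputAnyExtensions
{
    // Private wrapper so the schema sent to the model is always a top-level object.
    private sealed class Wrapped<T>
    {
        public T Data { get; set; } = default!;
    }

    // Accepts any T (arrays, primitives, enums, strings) by wrapping it in an
    // object, then unwraps the model's result before returning it.
    public static async Task<T> CompleteAsyncForAny<T>(
        this IChatClient chatClient,
        string prompt,
        bool useNativeJsonSchema = false,
        CancellationToken cancellationToken = default)
    {
        var response = await chatClient.CompleteAsync<Wrapped<T>>(
            prompt,
            useNativeJsonSchema: useNativeJsonSchema,
            cancellationToken: cancellationToken);
        return response.Result.Data;
    }
}
```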