Mirascope / mirascope

LLM abstractions that aren't obstructions
http://mirascope.com/
MIT License
760 stars, 51 forks

Add support for nested structured in `GeminiTool` #221

Closed willbakst closed 4 months ago

willbakst commented 6 months ago

Is your feature request related to a problem? Please describe.

https://github.com/Mirascope/mirascope/blob/269c3339b7e5113686a2435a8b85709abbdcf485/mirascope/gemini/tools.py#L75

Where does this restriction originate from? Is this just something not yet implemented, or is it inherent to gemini?

Originally posted by @barapa in https://github.com/Mirascope/mirascope/discussions/219

Describe the solution you'd like

Add support to GeminiTool for properly structuring nested definitions to match the OpenAPI 3.0.3 Parameter Object that Gemini supports.

Parameter Object: https://spec.openapis.org/oas/v3.0.3#parameter-object
Schema Object: https://spec.openapis.org/oas/v3.0.3#schema-object
Reference Object: https://spec.openapis.org/oas/v3.0.3#reference-object

willbakst commented 6 months ago

@barapa do you have any interest in taking this on?

barapa commented 6 months ago

It appears that it doesn't really follow the OpenAPI 3.0.3 spec, but rather a "subset" of it supported by their FunctionDefinition proto. The hard part is the Schema proto, which I'm reproducing below.

A few challenges I have encountered so far, when trying to convert the pydantic model's model_json_schema to conform to this Schema proto:

WIP PR here: https://github.com/Mirascope/mirascope/pull/222

class Schema(proto.Message):
    r"""The ``Schema`` object allows the definition of input and output data
    types. These types can be objects, but also primitives and arrays.
    Represents a select subset of an `OpenAPI 3.0 schema
    object <https://spec.openapis.org/oas/v3.0.3#schema>`__.

    .. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

    Attributes:
        type_ (google.ai.generativelanguage_v1beta.types.Type):
            Required. Data type.
        format_ (str):
            Optional. The format of the data. This is
            used only for primitive datatypes. Supported
            formats:

             for NUMBER type: float, double
             for INTEGER type: int32, int64
        description (str):
            Optional. A brief description of the
            parameter. This could contain examples of use.
            Parameter description may be formatted as
            Markdown.
        nullable (bool):
            Optional. Indicates if the value may be null.
        enum (MutableSequence[str]):
            Optional. Possible values of the element of Type.STRING with
            enum format. For example we can define an Enum Direction as
            : {type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH",
            "WEST"]}
        items (google.ai.generativelanguage_v1beta.types.Schema):
            Optional. Schema of the elements of
            Type.ARRAY.

            This field is a member of `oneof`_ ``_items``.
        properties (MutableMapping[str, google.ai.generativelanguage_v1beta.types.Schema]):
            Optional. Properties of Type.OBJECT.
        required (MutableSequence[str]):
            Optional. Required properties of Type.OBJECT.
    """

    type_: "Type" = proto.Field(
        proto.ENUM,
        number=1,
        enum="Type",
    )
    format_: str = proto.Field(
        proto.STRING,
        number=2,
    )
    description: str = proto.Field(
        proto.STRING,
        number=3,
    )
    nullable: bool = proto.Field(
        proto.BOOL,
        number=4,
    )
    enum: MutableSequence[str] = proto.RepeatedField(
        proto.STRING,
        number=5,
    )
    items: "Schema" = proto.Field(
        proto.MESSAGE,
        number=6,
        optional=True,
        message="Schema",
    )
    properties: MutableMapping[str, "Schema"] = proto.MapField(
        proto.STRING,
        proto.MESSAGE,
        number=7,
        message="Schema",
    )
    required: MutableSequence[str] = proto.RepeatedField(
        proto.STRING,
        number=8,
    )
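
One of the conversion steps discussed here, inlining `$defs`/`$ref` so that nested definitions fit the self-contained `items`/`properties` fields of the proto above, might be sketched roughly as follows. This is only an illustration under assumptions: `inline_refs` and `SUPPORTED_KEYS` are hypothetical names, the input dict is hand-written to resemble a nested JSON schema rather than produced by pydantic, and the result is a plain dict, not a real `Schema` proto. Keys with no counterpart in the proto (such as `title`) are simply dropped, and recursive definitions are not handled.

```python
from typing import Any, Optional

# Keys that have a counterpart in the Schema proto quoted above.
SUPPORTED_KEYS = {
    "type", "format", "description", "nullable",
    "enum", "items", "properties", "required",
}


def inline_refs(
    schema: dict[str, Any], defs: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Return a new dict with every `$ref` replaced by its definition.

    Hypothetical helper: keeps only SUPPORTED_KEYS and does not guard
    against recursive definitions.
    """
    if defs is None:
        defs = schema.get("$defs", {})
    if "$ref" in schema:
        # "#/$defs/Author" -> "Author"
        name = schema["$ref"].rsplit("/", 1)[-1]
        resolved = inline_refs(defs[name], defs)
        # Keep a description set on the referencing property, if any.
        if "description" in schema:
            resolved["description"] = schema["description"]
        return resolved
    out: dict[str, Any] = {}
    for key, value in schema.items():
        if key not in SUPPORTED_KEYS:
            continue
        if key == "properties":
            out[key] = {n: inline_refs(s, defs) for n, s in value.items()}
        elif key == "items":
            out[key] = inline_refs(value, defs)
        else:
            out[key] = value
    return out


# Hand-written example of a nested schema (a Book with a nested Author).
book_schema = {
    "$defs": {
        "Author": {
            "type": "object",
            "title": "Author",
            "properties": {"name": {"type": "string", "title": "Name"}},
            "required": ["name"],
        }
    },
    "type": "object",
    "title": "Book",
    "properties": {
        "title": {"type": "string", "title": "Title"},
        "author": {"$ref": "#/$defs/Author"},
    },
    "required": ["title", "author"],
}

flat = inline_refs(book_schema)
```

After this step `flat["properties"]["author"]` holds the full Author object schema inline, with no `$defs` or `$ref` left. Composition keywords such as `anyOf` would still need separate handling.
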

willbakst commented 6 months ago

Oh wow super annoying that it doesn't support the spec fully :(

Took a brief look at the PR, looking good! Left some minor comments/questions taking WIP into account :)

willbakst commented 6 months ago

Noticed your comment in the PR (https://github.com/Mirascope/mirascope/pull/222#issue-2286686838)

I'm totally fine with putting this on pause given the difference from the OpenAPI spec if you think it's not worth the time/effort. Otherwise we'll likely still want to raise value errors if we find something that isn't supported (e.g. instead of removing AnyOf we should just throw an error so the user knows it isn't supported, rather than silently changing things).

Thoughts?
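
To make the "raise instead of silently removing" idea concrete, here is a minimal sketch. The names `check_supported` and `UNSUPPORTED_KEYWORDS` and the error wording are hypothetical, not existing Mirascope code, and the input is a hand-written JSON schema dict.

```python
from typing import Any

# JSON Schema composition keywords with no field in Gemini's Schema proto.
UNSUPPORTED_KEYWORDS = ("anyOf", "allOf", "oneOf")


def check_supported(schema: dict[str, Any], path: str = "$") -> None:
    """Raise ValueError at the first unsupported keyword found in `schema`."""
    for keyword in UNSUPPORTED_KEYWORDS:
        if keyword in schema:
            raise ValueError(
                f"{path}: '{keyword}' is not supported by Gemini tools"
            )
    # Recurse into object properties and array items.
    for name, sub_schema in schema.get("properties", {}).items():
        check_supported(sub_schema, f"{path}.{name}")
    if isinstance(schema.get("items"), dict):
        check_supported(schema["items"], f"{path}[]")


# Hand-written schema with an optional field expressed through anyOf.
optional_field_schema = {
    "type": "object",
    "properties": {
        "nickname": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    },
}

try:
    check_supported(optional_field_schema)
except ValueError as error:
    message = str(error)
```

Including the path in the message tells the user which field triggered the error, which a silent rewrite would hide.
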

barapa commented 6 months ago

I don't think there is anything inherently necessary about AnyOf and AllOf. In both cases, I think you could rewrite them to fit their Schema without losing the semantics. AnyOf appears to just mean they are all nullable. AllOf just means they are all required. However, the conversion isn't trivial.

But, I'm not convinced that with Gemini it wouldn't be more effective to simply prompt the model in JSON mode, providing the spec in the prompt.

They do have a mechanism for converting a function that has a dataclass as a parameter into their required object. Take a look at https://github.com/google-gemini/generative-ai-python/blob/e09e7f242abcabe1bda28168be58a751ccdc5c03/tests/test_content.py#L393.

But it doesn't work with pydantic objects.
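
For the simplest case of the AnyOf rewrite mentioned above, an optional value expressed as `anyOf: [X, {"type": "null"}]`, the conversion to the proto's `nullable` field could look roughly like this. It is a sketch under the assumption that only this two-branch pattern needs handling; `anyof_to_nullable` is a hypothetical name, and any other `anyOf` shape raises instead of being guessed at.

```python
from typing import Any


def anyof_to_nullable(schema: dict[str, Any]) -> dict[str, Any]:
    """Rewrite `anyOf: [X, {"type": "null"}]` as X with `nullable: True`.

    Only the two-branch "optional value" case is handled; any other
    anyOf raises ValueError.
    """
    if "anyOf" not in schema:
        return dict(schema)
    branches = schema["anyOf"]
    non_null = [b for b in branches if b.get("type") != "null"]
    if len(branches) != 2 or len(non_null) != 1:
        raise ValueError("only anyOf of one type plus null can be rewritten")
    # Keep sibling keys such as description, then merge in the real type.
    rewritten = {k: v for k, v in schema.items() if k != "anyOf"}
    rewritten.update(non_null[0])
    rewritten["nullable"] = True
    return rewritten


nickname = anyof_to_nullable(
    {
        "anyOf": [{"type": "string"}, {"type": "null"}],
        "description": "Optional nickname",
    }
)
```

Unions of several real types have no obvious equivalent in the proto, which is part of why the general conversion isn't trivial.
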

barapa commented 6 months ago

I have done some testing (with the Vertex AI version of the Gemini API) and found that setting it to JSON mode and providing the full JSON schema in the system prompt works consistently.

@willbakst - do you have any thoughts on how we could create an extractor that doesn't make use of Tools?
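
A tool-free extractor along these lines could be sketched as two small steps: embed the schema in the system prompt, then parse the JSON reply. Everything here is an assumption for illustration: `build_system_prompt` and `parse_reply` are hypothetical helpers, the prompt wording is made up, the reply string is a hand-written stand-in rather than real model output, and the actual Gemini call is deliberately left out.

```python
import json
from typing import Any


def build_system_prompt(schema: dict[str, Any]) -> str:
    """Embed the full JSON schema in the system prompt."""
    return (
        "Respond only with a JSON object that conforms to this JSON schema:\n"
        + json.dumps(schema, indent=2)
    )


def parse_reply(reply_text: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parse the model's JSON reply and check top-level required keys."""
    data = json.loads(reply_text)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data


# Hand-written nested schema; no flattening is needed on this path.
schema = {
    "$defs": {
        "Author": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        }
    },
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"$ref": "#/$defs/Author"},
    },
    "required": ["title", "author"],
}

system_prompt = build_system_prompt(schema)

# Hand-written stand-in for a model reply, not real Gemini output.
reply_text = '{"title": "Dune", "author": {"name": "Frank Herbert"}}'
book = parse_reply(reply_text, schema)
```

In practice the parsed dict would then be validated against the original pydantic model, which also checks the nested fields that this sketch ignores.
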

willbakst commented 6 months ago

OK, I think we should go down the JSON path then. In this case, we should handle it as we do for other model providers, through a json_mode equivalent, and leave the tool-calling functionality (with the ValueError) the same, except for updating the error message to mention using json_mode when using nested structures.
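
As a rough sketch of that split, the tool path could keep rejecting nested schemas while pointing users at JSON mode. The function names, the `json_mode` flag, and the error wording below are hypothetical stand-ins rather than the actual Mirascope implementation, and "nested" is approximated as "uses `$defs` or `$ref`".

```python
from typing import Any


def has_nested_structure(schema: dict[str, Any]) -> bool:
    """Return True if the schema uses `$defs` or `$ref` anywhere."""
    if "$defs" in schema or "$ref" in schema:
        return True
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    return any(has_nested_structure(child) for child in children)


def validate_for_gemini(schema: dict[str, Any], json_mode: bool) -> None:
    """Reject nested schemas on the tool path; allow them in JSON mode."""
    if json_mode:
        return
    if has_nested_structure(schema):
        raise ValueError(
            "Nested structures are not supported in Gemini tools; "
            "use json_mode to extract nested schemas."
        )


nested = {
    "type": "object",
    "properties": {"author": {"$ref": "#/$defs/Author"}},
}

# JSON mode accepts the nested schema; the tool path raises.
validate_for_gemini(nested, json_mode=True)
try:
    validate_for_gemini(nested, json_mode=False)
except ValueError as error:
    tool_error = str(error)
```
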

For reference, Anthropic doesn't have an official json mode, so we do something there similar to what you'll need to do here:

https://github.com/Mirascope/mirascope/blob/81bfc4784a44719ff81a985e4e7c9f49ce318d23/mirascope/anthropic/calls.py#L243-L264

I'm also noticing that when we switched from XML to the new beta tools with Anthropic, we seem to have broken JSON mode for standard tool use, which I will be looking into separately now (it still works for streaming tools, though).

https://github.com/Mirascope/mirascope/blob/81bfc4784a44719ff81a985e4e7c9f49ce318d23/mirascope/anthropic/tool_streams.py#L44

A good reference for taking the JSON mode output for extraction in the meantime would be how we handle it for OpenAI:

https://github.com/Mirascope/mirascope/blob/81bfc4784a44719ff81a985e4e7c9f49ce318d23/mirascope/openai/types.py#L160-L173

willbakst commented 4 months ago

v1 has full support for JSON mode with Gemini, which enables using nested schemas for extraction. Their tools still don't allow for nested schemas, but that's an issue with Gemini and not us.