dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

Improved typing when using Pydantic models #1107

Open robinvandernoord opened 4 weeks ago

robinvandernoord commented 4 weeks ago

What behavior of the library made you think about the improvement?

generator = outlines.generate.json(model, MyModel)
answer = generator(prompt)

answer is now type-hinted as str | int | ..., but we know it should be MyModel. Annotating it explicitly, as in answer: MyModel = generator(prompt), does not help: the type checker then reports that the types do not match.

How would you like it to behave?

SequenceGenerator could be generic over the Pydantic model passed to outlines.generate.json, so that __call__ is annotated as returning that model. This would probably be harder to do for the other outlines.generate functions, but for Pydantic models it would be very nice.

robinvandernoord commented 4 weeks ago

I created a hacky example that roughly does what I mean:

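import typing

import outlines
from outlines.generate.api import SequenceGenerator  # import path assumed

T = typing.TypeVar("T")

class GenericSequenceGenerator(SequenceGenerator, typing.Generic[T]):
    def __call__(self, *args, **kwargs) -> T:
        # Same runtime behavior as the parent; only the return type is narrowed.
        return typing.cast(
            T,
            super().__call__(*args, **kwargs)
        )

def outlines_generate_json(model, klass: typing.Type[T]) -> GenericSequenceGenerator[T]:
    # cast() is a no-op at runtime; it only tells the type checker what to expect.
    return typing.cast(
        GenericSequenceGenerator[T],
        outlines.generate.json(model, klass)
    )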

If this conflicts with other behavior of generate.json, perhaps it would be wise to split the pydantic generation into something like generate.pydantic?
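
An alternative to splitting the function might be typing.overload, so that one entry point returns a typed generator when given a class and a plain dict generator when given something else, such as a JSON Schema string. The sketch below is self-contained and does not use outlines at all: TypedGenerator, generate_json and Person are hypothetical names, the model argument is dropped, the "generation" is a hard-coded JSON string, and a dataclass stands in for a Pydantic model.

```python
import json
import typing
from dataclasses import dataclass

T = typing.TypeVar("T")


class TypedGenerator(typing.Generic[T]):
    """Hypothetical stand-in for SequenceGenerator, generic over its result type."""

    def __init__(self, parse: typing.Callable[[str], T]) -> None:
        self._parse = parse

    def __call__(self, prompt: str) -> T:
        # A real generator would sample from the model here; this stub returns
        # a fixed JSON string so the example runs without any model.
        raw = '{"name": "Ada", "age": 36}'
        return self._parse(raw)


@typing.overload
def generate_json(schema: typing.Type[T]) -> TypedGenerator[T]: ...
@typing.overload
def generate_json(schema: str) -> TypedGenerator[dict]: ...
def generate_json(schema):
    if isinstance(schema, str):
        # Schema given as a string: there is no class to build, return a dict.
        return TypedGenerator(json.loads)
    # Schema given as a class: build an instance from the parsed fields.
    return TypedGenerator(lambda raw: schema(**json.loads(raw)))


@dataclass
class Person:  # stands in for a Pydantic model
    name: str
    age: int


person = generate_json(Person)("Describe Ada")  # checker infers Person
data = generate_json('{"type": "object"}')("Describe Ada")  # checker infers dict
```

With this shape, person.name type-checks without a cast at the call site, and the string-schema path keeps its looser return type.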