adriangb / xpresso

A composable Python ASGI web framework
https://xpresso-api.dev/
MIT License
176 stars 4 forks source link

Make response scraping from return values pluggable #107

Open adriangb opened 1 year ago

adriangb commented 1 year ago

https://github.com/adriangb/xpresso/pull/106#discussion_r968491622

While it is currently possible to use a combination of the reponse_factory parameter to Operation and explicitly setting the response schema via the responses parameter to Operation to support non-Pydantic responses, it is super boilerplatey and cumbersome.

adriangb commented 1 year ago

I think there's several use cases/features we'll want to support:

  1. Using the return (func(...) -> T) type annotation to infer the return value. I think this has to be super simple. Maybe something like not supporting Unions or supporting only the first type in a union that does not inherit from Response or that has some sort of marker. I don't think we want to implement some sort of logic for saying "this is the response for a 200 status code" via type annotations. That is better to pass in as a parameter (the responses parameter to Operation).
  2. Customizing serialization of a specific object type while keeping the default serialization in place. This is provided via the response_encoder argument to Operation. It is meant for example to register a custom Enum for serialization. This is obj -> obj serialization with json.dumps() later getting called on the output object.
  3. Taking complete control of the obj -> bytes serialization. This is what something like msgspec would want to do.

A couple ways I can see of doing this.

A global registry

Some module level dict that allows plugins/extensions to register how to serialize their types and how to generate OpenAPI specs for them (we'd need to come up with some sort of Protocol, what the keys are in the dict, etc.). I don't like having global registers like this but it would allow the plugin/extension to "just work" from the user's perspective without extra boilerplate. We could read/load from that registry during app startup so that there is no import-time race conditions (import order).

As far as I can tell though msgspec does not provide any functionality to recursively go from a msgspec.Struct -> dict (other than serializing to json and then loading that back). Pydantic provides BaseModel.dict().

# Xpresso code
class XpressoResponseSchema:
    def generate_openapi(self, model: type) -> ?:
       ...

class XpressoResponseJsonifier:
    def jsonify(self, obj: object) -> object:
       ...

RESPONSES_BY_TYPE = {}  # have Pydantic as the default here or something?

class JsonEncoder:  # already exists
    def encode(self, obj: object, ...) -> object:
        if type(object) in RESPONSES_BY_TYPE:
            return RESPONSES_BY_TYPE[type(object)]. jsonify(obj)
        ...

# Extension code
class MsgSpecResponse:
    def generate_openapi(self, model: type) -> ?:
       ...
    def jsonify(self, obj: object) -> object:
       ...

RESPONSES_BY_TYPE[Struct] = MsgSpecResponse

I think one con here is that we would not be able to support something like msgspec taking complete control of serialization (i.e. going from obj -> bytes). If the response type is something like list[dict[str, Struct]] there's no way we can introspect the type annotation and say "ah there is a Struct somewhere in there, we should have msgspec handle this". Especially since someone could have tuple[Struct, BaseModel] or something. We would always have to rely on a two-step serialization: maybe_non_json_dumpable_object -> json_dumpable_object and json_dumpable_object -> json. In theory this could result in a performance hit but if someone really cares about performance they should just return a Response instance directly instead of relying on implicit conversion. And realistically I expect users to return a Struct or list[Struct], not some deep structure with Struct embedded 10 levels deep, so the performance impact shouldn't be too big.

Some sort of marker on return types.

We'd come up with something like:

# Xpresso code
class XpressoResponseSchema:
    def generate_openapi(self, model: type) -> ?:
       ...

class XpressoResponseJsonifier:
    def jsonify(self, obj: object) -> object:
       ...

class XpressoResponseEncoder:
    def encode(self, obj: object) -> bytes:
        ...

# Extension code
class MsgSpecResponseSchema:
    def generate_openapi(self, model: type) -> ?:
       ...

class MsgSpecResponseEncoder:
    def encode(self, obj: object) -> bytes:
        ...

T = TypeVar("T")
MsgSpecResponse = Annotated[T, MsgSpecResponseSchema(), MsgSpecResponseEncoder()]

# User code
class MyStruct(msgspec.Struct): pass

def endpoint() -> MsgSpecResponse[list[MyStruct]]:
    return [MyStruct()]

The con here compared to the first approach is a lot more user boilerplate. But this would allow msgspec to take full control of serialization (because we can specify it just for that one endpoint/response).

@jcrist would love your thoughts on this.