Tinche / uapi

https://uapi.threeofwands.com
Apache License 2.0
86 stars 4 forks source link

Unstructuring `IntegerSchema` broken #62

Open salotz opened 1 week ago

salotz commented 1 week ago

Root cause I believe is: https://github.com/python-attrs/cattrs/issues/321

Repro:

import json
from uapi.openapi import SchemaBuilder, IntegerSchema
from uapi.attrschema import build_attrs_schema, Schema
from uapi.openapi import converter as SCHEMA_CONVERTER

import attrs

@attrs.define
class Thing:
    foo: int

schema = build_attrs_schema(Thing, SchemaBuilder())

# FAILS
assert SCHEMA_CONVERTER.unstructure(schema.properties['foo'], IntegerSchema)['type'] == "integer"
salotz commented 1 week ago

https://github.com/Tinche/uapi/blob/main/src/uapi/openapi.py#L63

I understand why Literal kind of makes sense here, since you only want a singular value of the enum, I am beginning to think its not really the intended semantics.

IIUC Literal is meant to be allow for type safety but allow for writing arbitrary strings elsewhere in your code for readability.

E.g. instead of mapping[DEFINED_STRING_KEY] you could directly use mapping['the-actual-key'] and as long as you had the type declaration mapping: Mapping[Literal["the-actual-key"], Any] everything is good.

The "proper" name then would be something like Constant.

In any case since it is an attrs attribute I think it would make more sense here to have a validator that only allows a specific value.

Tinche commented 1 week ago

I can fix the current behavior in cattrs (and will), but let me ask you why is this a problem?

If you have an openapi schema, you generally want to dump it to json so it can be served. This will work properly since the json module will coerce that enum value into the string "integer".

Just unstructuring the data (but not dumping it to json) is a niche use case. I'd say if you want to manipulate the schema in python, it's much better to manipulate it before unstructuring.

salotz commented 1 week ago

That explains the intended usage of that :) Partially I just wasn't sure what to expect and since string types got serialized into strings I thought maybe the integers etc. should be as well. So minimally its just inconsistent.

Further than that its just a matter of modularity, reusability,avoiding unnecessarily serialization roundtrips, and generally the same ethos of cattrs of not having arbitrary decisions made for you. I'll admit I don't have a specific "this blocks X" use case, but I'd object to it being a niche use case. Perhaps from the perspective what uapi needs it for it is.

This is probably a discussion better held in the context of this functionality as its own package/module though. Happy to expand more on what I'm doing with schemas and how it fits in. Briefly it includes code generation and general configuration validation, not just serving OpenAPI schemas.

Tinche commented 1 week ago

I agree with the consistency argument - we should fix cattrs.Converter to handle this consistently. I will fix this for the next version.

However, in my view the preconf converters are and should be free to customize behaviors like this in the name of performance and compatibility. For example, in some cases the new msgspec converter will avoid unstructuring attrs classes if it detects msgspec will unstrcture them identically, because msgspec can do it faster. Similarly the orjson converter will leave datetimes to orjson itself by default.

The ability to do this is pretty cool to me. Maybe we should document this somewhere explicitly though.

salotz commented 1 week ago

I very much agree. It would be great to have optimized marshalling that leverage types.

I guess my only issue then was that in this module there was only the one "converter" that was only intended for marshalling and I was looking for one that did (un)structuring. So again not an issue for this code base really, and more for the still hypothetical one that does only schema interop.

A use case that popped up:

In the jsonschema library I need to provide an unstructured version of the schema to the library. If I am generating this schema from classes then I cannot do that with the current converter.

Tinche commented 5 days ago

Alright, that's an interesting use case. We might make the converter an optional parameter in the future?

In the mean time, you can probably dump the schema straight to JSON and load it back up with a json library - that will probably get you the format you need?

salotz commented 4 days ago

In the mean time, you can probably dump the schema straight to JSON and load it back up with a json library - that will probably get you the format you need?

Yes and not that bad since I only do it once.