getsentry / json-schema-diff

Diff changes between JSON schemas
https://docs.rs/json-schema-diff/
15 stars 2 forks source link

split multiple types into anyOf subschema #20

Closed 6293 closed 1 year ago

6293 commented 1 year ago

Closes https://github.com/getsentry/json-schema-diff/issues/16

untitaker commented 1 year ago

This is a completely unrealistic schema (I hope), but I'm just curious, do you think it is possible to handle this case and produce no diff?

        let lhs = json! {{
            "anyOf": [
                {"properties": {"foo": {"type": "integer"}}},
                {"properties": {"foo": {"type": "string"}}}
            ]
        }};
        let rhs = json! {{
            "properties": {
                "foo": {
                    "type": ["integer", "string"]
                }
            }
        }};
untitaker commented 1 year ago

Thank you!

6293 commented 1 year ago

This is a completely unrealistic schema (I hope), but I'm just curious, do you think it is possible to handle this case and produce no diff?

We will need another internal conversion, though we should not convert them into rhs-ish style like the following:

{"properties": {"foo": {"anyOf": [{"type": "integer"}, {"type": "string"}]}}}

Because converting lhs into this style is not always possible. Consider more complex example:

"anyOf": [
    {"properties": {"foo": {"type": "integer"}, "bar": {"type": "integer"}}},
    {"properties": {"foo": {"type": "integer"}, "bar": {"type": "string"}}},
    {"properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}}},
]

In this case, foo can be string only if bar is integer. Thus we are not able to write like

{"properties": {"foo": {"anyOf": [???]}, "bar": {"anyOf": [???]}}}
6293 commented 1 year ago

Better conversion would be from rhs to lhs. e.g.

"anyOf": [{
    "properties": {
        "foo": {
            "type": ["integer", "string"]
        },
        "bar": {
            "type": ["integer", "string"]
        }
    }
}]

to

"anyOf": [
    {"properties": {"foo": {"type": "integer"}, "bar": {"type": "integer"}}},
    {"properties": {"foo": {"type": "integer"}, "bar": {"type": "string"}}},
    {"properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}}},
    {"properties": {"foo": {"type": "string"}, "bar": {"type": "string"}}},
]
untitaker commented 1 year ago

a fun problem to think about :) i am not sure if it is worth solving. if we go with the conversion from rhs to lhs as per your last comment, we should have some sort of limit in place to prevent combinatorial explosion and OOM. some good performance tests would be in https://github.com/getsentry/sentry-kafka-schemas/