IBM / jsonsubschema

Tool for checking whether a JSON schema is a subschema of another JSON schema.
Apache License 2.0

isSubschema returns false on added, but not required properties? #14

Closed JCachada closed 9 months ago

JCachada commented 1 year ago

Hello!

I'm seeing some odd behavior and I'm wondering if I'm missing something obvious.

Let's say I have a JSONSchema object like such:

{
    "properties": {
        "error_message": {
            "description": "The error message that will be thrown",
            "title": "Error Message",
            "type": "string"
        }
    },
    "required": ["error_message"],
    "title": "Throw Formatted Error Input",
    "type": "object"
}

And I add a property that is not required, like such:

{
    "properties": {
        "error_message": {
            "description": "The error message that will be thrown",
            "title": "Error Message",
            "type": "string"
        },
        "not_required_field": {
            "description": "A new not required field",
            "title": "Not Required Field",
            "type": "string"
        }
    },
    "required": ["error_message"],
    "title": "Throw Formatted Error Input",
    "type": "object"
}

My expectation with regard to breaking-change compatibility is that this would not be a breaking change: isSubschema(s1, s2) should return True, since any body that is valid for the first schema is still valid for the second. Making such bodies invalid would require adding not_required_field to the required array of the second schema; without that, the first schema should be a valid subschema of the second.
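For context on the semantics: isSubschema(s1, s2) holds exactly when every instance valid under s1 is also valid under s2. A finite pool of test instances can refute that relation but never prove it. The sketch below (a hand-rolled check covering only the type/properties/required keywords used here, not the library's algorithm) shows why spot-checking well-behaved bodies supports the expectation without settling the question:

```python
def valid(instance, schema):
    """Tiny validator for the 'type'/'properties'/'required' subset used here."""
    if schema.get("type") == "string":
        return isinstance(instance, str)
    if schema.get("type") == "object":
        if not isinstance(instance, dict):
            return False
        if any(k not in instance for k in schema.get("required", [])):
            return False
        return all(valid(instance[k], sub)
                   for k, sub in schema.get("properties", {}).items()
                   if k in instance)
    return True  # keywords we don't model impose no constraint

def subschema_over_pool(s, t, pool):
    """Refutation-only check: True just means 'no counterexample in pool'."""
    return all(valid(j, t) for j in pool if valid(j, s))

s1 = {"type": "object", "required": ["error_message"],
      "properties": {"error_message": {"type": "string"}}}
s2 = {"type": "object", "required": ["error_message"],
      "properties": {"error_message": {"type": "string"},
                     "not_required_field": {"type": "string"}}}

# Bodies that only use the documented fields never distinguish s1 from s2,
# so a finite pool of them finds no violation of s1 <: s2.
pool = [{"error_message": "boom"},
        {"error_message": "boom", "not_required_field": "extra"}]
print(subschema_over_pool(s1, s2, pool))  # True: no counterexample in this pool
```

The library, by contrast, decides the relation over all possible instances, not just the ones a caller happens to test.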

However, comparing these two schemas makes isSubschema return False.

I have looked through the attached paper and seen:

We observe that JSON schema types null and string are the two most prevalent schema types present in the dataset. Both types are fully supported in the subtype checking performed by jsonsubschema, as indicated by the color code in Figure 8. The keywords properties and required for specifying constraints on a JSON object show up next in the order of the number of use cases. jsonsubschema fully supports properties, while the required keyword is supported whenever it is not used in union schemas or negated schemas. In general, disjunction of schemas happens rarely (366 occurrences among millions of occurrences of other keywords), while negated schemas are not used at all in our dataset.

Seeing as this is not a union or a negated schema, I would think that this would be a supported use case.

Am I missing something obvious? I did try stepping through the library in my debugger, but with no great success.

I appreciate the help (and the work on the lib)! Best

hirzel commented 1 year ago

Hi, thanks for your question! In your example, s2 is a subschema of s1, because every JSON document that is valid for s2 is also valid for s1. However, s1 is not a subschema of s2. Here is a counter-example:

import jsonschema  # pip install jsonschema

# s1 and s2 are the two schemas from the original post
j0 = {
    "error_message": "aarrr",
    "not_required_field": 42,
}

def valid(instance, schema):
    try:
        jsonschema.validate(instance, schema)
    except jsonschema.ValidationError as e:
        return f"False: {e}"
    return True

print(f"valid(j0, s1) {valid(j0, s1)}")
print(f"valid(j0, s2) {valid(j0, s2)}")

That code will print the following:

valid(j0, s1) True
valid(j0, s2) False: 42 is not of type 'string'

Failed validating 'type' in schema['properties']['not_required_field']:
    {'description': 'A new not required field',
     'title': 'Not Required Field',
     'type': 'string'}

On instance['not_required_field']:
    42

In other words, j0 is valid for s1 but not for s2, because s1 ignores the integer field, whereas s2 requires it to be a string.
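The direction of the relation can also be made concrete without installing anything; a minimal hand-rolled check of just the type, properties, and required keywords (a sketch, not the jsonschema library's validator) reproduces the asymmetry:

```python
# Hand-rolled validator for the "type"/"properties"/"required" subset
# used by s1 and s2 (a sketch, not the jsonschema library).
def valid(instance, schema):
    if schema.get("type") == "string":
        return isinstance(instance, str)
    if schema.get("type") == "object":
        if not isinstance(instance, dict):
            return False
        if any(k not in instance for k in schema.get("required", [])):
            return False
        return all(valid(instance[k], sub)
                   for k, sub in schema.get("properties", {}).items()
                   if k in instance)
    return True  # keywords we don't model impose no constraint

s1 = {"type": "object", "required": ["error_message"],
      "properties": {"error_message": {"type": "string"}}}
s2 = {"type": "object", "required": ["error_message"],
      "properties": {"error_message": {"type": "string"},
                     "not_required_field": {"type": "string"}}}

# j0 is a witness that s1 is not a subschema of s2.
j0 = {"error_message": "aarrr", "not_required_field": 42}
print(valid(j0, s1), valid(j0, s2))  # True False

# In the other direction no such witness exists: s2 only *adds* a
# constraint, so anything valid for s2 is also valid for s1.
j1 = {"error_message": "aarrr", "not_required_field": "ok"}
print(valid(j1, s2), valid(j1, s1))  # True True
```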

JCachada commented 1 year ago

I see - that makes sense, in that it matches the logical constraints defined 👍 I will probably need to fork this and add a more "relaxed" mode: for verifying breaking changes (in a CLI, for instance), I'm not interested in whether fields ignored by s1 would break under s2. For my use case it's a fair assumption that clients on s1 are not passing unsupported parameters (here, not_required_field), and I only want to check whether the bodies they do pass (which conform to s1 and are not ignored) will break. But that's more a consideration of my own practicalities than criticism of the lib 😄

An interesting variation, though, is that the error is still returned if I don't define a type for the field in s2, or if I list all possible JSON types, like so:

"type":["number","string","boolean","object","array", "null"]

In this case, whatever type that field has under s1, it should be valid input for s2. That is not what happens, if I'm understanding correctly.
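That reading appears right: a property whose type keyword lists every JSON type is unconstrained, since any decoded JSON value matches one of the listed names. A quick pure-Python illustration (json_type_of is a hypothetical helper, not part of the library; note that JSON Schema's separate integer type is subsumed by number here):

```python
def json_type_of(value):
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

ALL_TYPES = ["number", "string", "boolean", "object", "array", "null"]

# Every sample value matches the all-types union, so the keyword
# imposes no constraint and should never cause a validation failure.
samples = [42, 3.14, "aarrr", True, None, [1, 2], {"k": "v"}]
print(all(json_type_of(v) in ALL_TYPES for v in samples))  # True
```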

andrewhabib commented 11 months ago

Hi @JCachada and @hirzel,

Thank you both for your question(s) and answer!

For the second comment, @JCachada, you are right. What you see is wrong behavior. I checked it and I confirm what you observed. I have already opened a pull request, #16, that addresses this problem. I am confident it will be merged soon and we will get it permanently fixed.

Regards,

shinnar commented 9 months ago

PR #16 has been merged and is included in the newest release (https://github.com/IBM/jsonsubschema/releases/tag/v0.0.7), which has also been published to PyPI (https://pypi.org/project/jsonsubschema/0.0.7/).