davishmcclurg / json_schemer

JSON Schema validator. Supports drafts 4, 6, 7, 2019-09, 2020-12, OpenAPI 3.0, and OpenAPI 3.1.
MIT License
408 stars 64 forks source link

Invalid error output with OpenApi 3.1 schema #157

Closed ekzobrain closed 9 months ago

ekzobrain commented 1 year ago

Try this file. It has just one error at /servers/0/description , but validation produces a bunch of error even inside other root keywords. Tested against unreleased 2.1 branch. When changing description type from number to string at /servers/0/description - document passes validation.

oas-order.json

davishmcclurg commented 1 year ago

I think the problem here is unevaluatedProperties, which is used in OpenAPI meta/document schemas. It's a bit of a mess. See:

I don't think it's a new issue in the 2.1.0 branch, so I probably won't hold that up.

I need to read more about it, but I think issue is whether unevaluatedProperties can consider sibling keyword evaluations if those keywords failed. I tried out using all sibling keyword evaluations and it resulted in better errors:

irb(main):001:0> JSONSchemer.openapi(JSON.parse(File.read('/Users/dharsha/Downloads/oas-order.json'))).validate(output_format: 'basic', resolve_enumerators: true)
=>
{"valid"=>false,
 "keywordLocation"=>"",
 "absoluteKeywordLocation"=>"json-schemer://openapi31/schema-base#",
 "instanceLocation"=>"",
 "error"=>"value at root does not match schema",
 "errors"=>
  [{"valid"=>false,
    "keywordLocation"=>"/allOf/0/then/$ref/properties/servers/items/$ref/properties/description/type",
    "absoluteKeywordLocation"=>"https://spec.openapis.org/oas/3.1/schema/2022-10-07#/$defs/server/properties/description/type",
    "instanceLocation"=>"/servers/0/description",
    "error"=>"value at `/servers/0/description` is not a string"}]
exterm commented 11 months ago

I'm experiencing problems with unevaluatedProperties too. I verified that the attached fix works for me.

davishmcclurg commented 11 months ago

I'm experiencing promblems with unevaluatedProperties too. I verified that the attached fix works for me.

Thanks for trying it out, @exterm. Just to be clear, the problem you are experiencing is with the extra unevaluatedProperties errors?

exterm commented 11 months ago

I get extraneous

value at `/type` does not match schema

errors when something else is broken. My setup is a bit complex... let me explain.

I'm defining a custom metaschema. I've been able to simplify the metaschema to a much shorter one that still shows the same behavior:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://app.example.com/schemas/properties/v1/metaschema",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/applicator": true,
        "https://json-schema.org/draft/2020-12/vocab/validation": true
    },
    "$dynamicAnchor": "meta",

    "title": "Core and Validation specifications meta-schema",
    "type": "object",
    "properties": {
      "properties": {
          "type": "object",
          "additionalProperties": {
              "$dynamicRef": "#meta"
          }
      },
      "type": { "$ref": "#/$defs/simpleTypes" }
    },
    "required": ["type"],
    "$defs": {
      "simpleTypes": {
        "enum": [
            "array",
            "boolean",
            "integer",
            "null",
            "number",
            "object",
            "string"
        ]
      }
  },
    "unevaluatedProperties": false
}

If I use the above metaschema to validate this schema:

{
        "type": "object",
        "properties": {
          "foo": {
            "type": "object",
            "foobar": 1,
          },
        },
      }

I get the expected errors

        "value at `/properties/foo/foobar` does not match schema",
        "value at `/properties` does not match schema",

but also

        "value at `/type` does not match schema",

Which goes away with your patch.

davishmcclurg commented 11 months ago

@exterm thanks for the extra info!

I think this is the simplest example that shows the issue:

schemer = JSONSchemer.schema({
  'properties' => {
    'x' => {
      'type' => 'integer'
    }
  },
  'unevaluatedProperties' => false
})
schemer.validate({ 'x' => 'invalid' }).map { _1.fetch_values('schema_pointer', 'error') }
# =>
# [["/properties/x", "value at `/x` is not an integer"],
#  ["/unevaluatedProperties", "value at `/x` does not match schema"]]

And the same behavior with a valid property:

schemer = JSONSchemer.schema({
  'properties' => {
    'x' => {
      'type' => 'integer'
    },
    'y' => {
      'type' => 'string'
    }
  },
  'unevaluatedProperties' => false
})
schemer.validate({ 'x' => 'invalid', 'y' => 'valid' }).map { _1.fetch_values('schema_pointer', 'error') }
# =>
# [["/properties/x", "value at `/x` is not an integer"],
#  ["/unevaluatedProperties", "value at `/x` does not match schema"],
#  ["/unevaluatedProperties", "value at `/y` does not match schema"]]

The issue is unevaluatedProperties only considers a property to be "evaluated" if it validated successfully. Since properties/x (in the example above) failed to validate, properties itself also failed, which means unevaluatedProperties doesn't consider any of its children "evaluated".

There's some debate about this interpretation, but from my understanding of the spec, it seems correct. Unfortunately, it produces lots of unhelpful errors, so I'm thinking it may make sense to treat failing properties as evaluated. I need to double check that won't affect the overall validation result, though.

exterm commented 11 months ago

I need to double check that won't affect the overall validation result, though.

Reading through the whole discussion, it seems like the consensus reached in the end is that both interpretations are valid and don't change the validation result, as persisted in the ADR.

Also, at least one other implementation seems to have moved to the new interpretation (JsonSchema.Net).

I particularly found this post by Ben Hutton helpful in understanding the problem; it seems like the spec is a bit confused about itself here. I appreciate they found a conclusion to this though and ended up explicitly allowing both interpretations.

davishmcclurg commented 11 months ago

Reading through the whole discussion, it seems like the consensus reached in the end is that both interpretations are valid and don't change the validation result, as persisted in the ADR.

👍 agreed. Thanks for linking to the ADR—I hadn't seen that. It not changing the validation result made sense to me when I thought about it like this:

The overall validation result shouldn't be affected, since the only additional keywords that it considers are failed ones (meaning the entire schema fails regardless of the unevaluated keys/items).

I opened a pull request with the change (including unevaluatedItems as well): https://github.com/davishmcclurg/json_schemer/pull/164

@exterm do you mind checking it out again to make sure it behaves like you expect?

I included a test to describe some of the inconsistent behavior noted in the discussion. Validation result doesn't change, just which keys are considered "unevaluated".

exterm commented 11 months ago

The fix in that PR seems different from the one that you had previously proposed.

Here's a more concise example of the problem that I am seeing:

    test "it" do
      schema = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",

        "type": "object",
        "properties": {
          "foo": { "type": "string" },
          "bar": { "type": "string" },
        },
        "unevaluatedProperties": false,
      }

      data = {
        "foo": "hello world",
        "bar": {},
      }

      errors = JSONSchemer.schema(schema).validate(data).to_a

      expected_errors = [
        [{ "type" => "string" }, "value at `/bar` is not a string"],
      ]

      assert_equal expected_errors, errors.map { |e| [e["schema"], e["error"]] }
    end
Minitest::Assertion: --- expected
+++ actual
@@ -1,3 +1,4 @@
 #<Set:
  {[{"type"=>"string"}, "value at `/bar` is not a string"],
+  [false, "value at `/foo` does not match schema"],
+  [false, "value at `/bar` does not match schema"]}>

(I actually don't care about "value at `/bar` does not match schema", but I'd like it to stop complaining about /foo)

This is not fixed on https://github.com/davishmcclurg/json_schemer/pull/164.

It is however fixed in https://github.com/davishmcclurg/json_schemer/commit/3bbefa3247ac6be896b23261ea4553b2697d1b32 and https://github.com/davishmcclurg/json_schemer/commit/719a7da75b91ea4f50a3a6b90e523691e6b215c5 .

davishmcclurg commented 11 months ago

This is not fixed on #164.

Hm, it seems like it's working when I try it out your example:

repos/json_schemer unevaluated-properties % bin/console --simple-prompt
?> schema = {
?>   "$schema": "https://json-schema.org/draft/2020-12/schema",
?>
?>   "type": "object",
?>   "properties": {
?>     "foo": { "type": "string" },
?>     "bar": { "type": "string" },
?>   },
?>   "unevaluatedProperties": false,
>> }
=> {:$schema=>"https://json-schema.org/draft/2020-12/schema", :type=>"object", :properties=>{:foo=>{:type=>"string"}, :bar=>{:type=>"string"}}, :unevaluatedProperties=>false}
?> data = {
?>   "foo": "hello world",
?>   "bar": {},
>> }
=> {:foo=>"hello world", :bar=>{}}
>> errors = JSONSchemer.schema(schema).validate(data).to_a
=>
[{"data"=>{},
...
>> errors.map { |e| [e["schema"], e["error"]] }
=> [[{"type"=>"string"}, "value at `/bar` is not a string"]]

Maybe I'm missing something, though. What's your process for trying out the branch?

exterm commented 11 months ago

OK, I'm thoroughly confused now. I couldn't reproduce the problem in that simple example today either. However, the problem still existed in my real world code. I started from the real world code again and simplified as much as I could. It seems that this might be related to $ref somehow. I then checked out your branch and added my example as an additional test case below the ones you added in the branch. This is the test case I added:

  def test_inconsistent_errors_with_refs
    schema = {
      "$schema" => "https://json-schema.org/draft/2020-12/schema",
      "allOf" => [
        { "$ref" => "#/$defs/rules" },
      ],
      "$defs" => {
        "rules" => {
          "type" => "object",
          "properties" => {
            "foo" => { "type" => "string" },
            "bar" => { "type" => "string" },
          },
        },
      },
      "unevaluatedProperties" => false,
    }

    data = {
      "foo" => "hello world",
      "bar" => 1,
    }

    errors = JSONSchemer.schema(schema).validate(data).to_a

    expected_errors = [
      [{ "type" => "string" }, "value at `/bar` is not a string"],
    ]

    assert_equal expected_errors, errors.map { |e| [e["schema"], e["error"]] }
  end

Which still gives me an error saying "value at `/foo` does not match schema". The error disappears when I remove "unevaluatedProperties": false. However, this doesn't seem to be fixed with any of your earlier commits either. I was previously testing within my Rails app and something seems to have interfered. Maybe this is expected behavior somehow and I'm just misinterpreting the spec?

davishmcclurg commented 11 months ago

It seems that this might be related to $ref somehow.

I think you're running into one of the issues they described with this interpretation of unevaluatedProperties. See examples: https://github.com/json-schema-org/json-schema-spec/issues/1172#issuecomment-1053637678 and https://github.com/json-schema-org/json-schema-spec/issues/1172#issuecomment-1053670445

The gist is introducing an intermediate schema (eg, $ref or allOf) changes which keys are considered "evaluated".

The fact that you ran into this problem so quickly and that it was confusing makes me question whether this is a good approach.

exterm commented 11 months ago

Oh yes, you're right - I completely missed the "subschemas don't forward annotations" part when I read through the discussion the first time. Thank you for the links.

The fact that you ran into this problem so quickly and that it was confusing makes me question whether this is a good approach.

Yes, fully agreed.

It seems this is a place where the spec isn't very mature. Comments in that discussion about revising the whole annotation mechanism make that very obvious.

Not sure what to do for now. I know what behavior I'd like to see, but it doesn't seem to be possible to derive it from the spec, and I'm not even sure it'd be compatible with the spec.

On the other hand, it seems that error reporting isn't very clearly specced at all and there is no test suite for it, so as long as true / false validation results are preserved, there is some room for interpretation?

To fully resolve this, and only report "does not match schema" errors for properties that actually can't be successfully evaluated, it seems that subschemas need to forward their annotations? I wouldn't be surprised if that broke a ton of other functionality that assume the previous behavior.

davishmcclurg commented 9 months ago

I went ahead with the imperfect fix. It should reduce extraneous errors for common cases (ie, no intermediate schema) and that seems helpful. I'll revisit if it's confusing for people in practice.

It'll be released shortly in 2.2.0.