brandur / json_schema

A JSON Schema V4 and Hyperschema V4 parser and validator.
MIT License
230 stars 45 forks source link

Upgrade from v 0.19 -> v 0.20 results in infinite loop for `lookup_schema` #112

Closed jessieay closed 5 years ago

jessieay commented 5 years ago

Hi! We recently upgraded our version of json_schema from v 0.19 -> v 0.20 and a related prmd command no longer works with the very helpful error message of "stack level too deep"

I was able to pinpoint the error to this method call: https://github.com/interagent/prmd/blob/d4908cba5153a76cced9979e6ec17bd217b96865/lib/prmd/commands/verify.rb#L68

Which is referencing this json_schema method: https://github.com/brandur/json_schema/blob/9c4656774a7c7d22a6f466932b34fd47d67c88dc/lib/json_schema/document_store.rb#L25

I dropped in a debugger to follow the method calls and it gets to here

json_schema/reference_expander.rb @ line 332 JsonSchema::ReferenceExpander#traverse_schema:

    329: def traverse_schema(schema)
    330:   add_reference(schema)
    331:
 => 332:   schema_children(schema) do |subschema|
    333:     if subschema.reference && !subschema.expanded?
    334:       dereference(subschema, [])
    335:     end
    336:
    337:     if !subschema.reference
    338:       traverse_schema(subschema)
    339:     end
    340:   end
    341:
    342:   # after finishing a schema traversal, find all clones and re-hydrate them
    343:   if schema.original?
    344:     schema.clones.each do |clone_schema|
    345:       parent = clone_schema.parent
    346:       clone_schema.copy_from(schema)
    347:       clone_schema.parent = parent
    348:     end
    349:   end
    350: end

and then it goes back to lookup_schema

I noticed that the schema_children method was refactored between versions 0.19 and 0.20 so my guess is that the source of the issue is somewhere in there but haven't been able to pinpoint just yet.

Let me know if I can provide any more info.

kytrinyx commented 5 years ago

Hi, yeah this part of JSON schema is really complex!

Would you happen to be able to provide an example of a schema that fails? (E.g. a small subset of your schema that gets the error when you load it?)

I can take a look at this if there's a sample schema to debug with, but without it I can't even imagine where to begin. Here's an example of another issue where we chased down a recursive problem: https://github.com/brandur/json_schema/issues/100

jessieay commented 5 years ago

Thanks for getting back to us on this!

Did some more digging, and one thing we learned is that this error does not occur in 0.20.3 so the culprit is somewhere in this diff: https://github.com/brandur/json_schema/compare/v0.20.3..v0.20.4

We were able to reproduce with the following schemas:

require 'json_schema'
store = JsonSchema::DocumentStore.new

[ 
  "schema.json", # via https://github.com/interagent/prmd/blob/master/schemas/schema.json
  "hyper-schema.json" # via https://github.com/interagent/prmd/blob/master/schemas/hyper-schema.json
].each do |file|
  data = JSON.parse(File.read(file))
  schema = JsonSchema::Parser.new.parse!(data)
  schema.expand_references!(store: store)
  store.add_schema(schema)
end
kytrinyx commented 5 years ago

@jessieay Thanks, narrowing this down to two files is a tremendous help!

I'm about to get on a plane, but will dig into this as soon as I can (probably within the next 4-5 days).

jessieay commented 5 years ago

Thanks @kytrinyx ! I will let you know if I am able to narrow it down any further.

For now we are happy that we have everything working with the version locked at 0.20.3 so there is no rush on our side but I look forward to hearing what you find when you are able to look into this.

jessieay commented 5 years ago

Hi there! Just checking in on this to see if there are any updates.

Like I said before, our teams have all downgraded our json_schema version to 0.20.3 so we are in a good state but we would like to upgrade when we can.

kytrinyx commented 5 years ago

@jessieay So sorry! Life piled on a bunch of things. I will try to get this done this week!

kytrinyx commented 5 years ago

I've narrowed the problem down to the following:

Schema 1:

{
    "id": "http://json-schema.org/draft-04/schema#",
    "$schema": "http://json-schema.org/draft-04/schema#",
    "description": "Core schema meta-schema",
    "definitions": {
        "schemaArray": {
            "type": "array",
            "minItems": 1,
            "items": { "$ref": "#" }
        }
    }
}

Schema 2:

{
    "$schema": "http://json-schema.org/draft-04/hyper-schema#",
    "id": "http://json-schema.org/draft-04/hyper-schema#",
    "title": "JSON Hyper-Schema",
    "allOf": [
        {
            "$ref": "http://json-schema.org/draft-04/schema#"
        }
    ]
}

This uses the initialization code from @jessieay above:

require_relative 'lib/json_schema'
store = JsonSchema::DocumentStore.new

[
  "schema.json", # schema 1
  "hyper-schema.json" # schema 2
].each do |file|
  data = JSON.parse(File.read(file))
  schema = JsonSchema::Parser.new.parse!(data)
  schema.expand_references!(store: store)
  store.add_schema(schema)
end
kytrinyx commented 5 years ago

Ok, interestingly, I've not yet managed to create a failing test in the reference_expander. Here's what I wrote (I expect it to blow up with a stack level too deep before hitting the assertion).

I need to put this away and come back to it with fresh eyes in case I'm missing something ridiculously obvious.

  it "expands a schema with a reference to an external schema in a oneOf array" do
    sample1 = {
      "$schema" => "http://json-schema.org/draft-04/schema#",
      "id" => "http://json-schema.org/draft-04/schema#",
      "definitions" => {
          "schemaArray" => {
              "type" => "array",
              "minItems" => 1,
              "items" => { "$ref" => "#" }
          }
      }
    }
    schema1 = JsonSchema::Parser.new.parse!(sample1)
    schema1.uri = "http://json-schema.org/draft-04/schema#"

    sample2 = {
      "$schema" => "http://json-schema.org/draft-04/hyper-schema#",
      "id" => "http://json-schema.org/draft-04/hyper-schema#",
      "allOf" => [
          {
              "$ref": "http://json-schema.org/draft-04/schema#"
          }
      ]
    }
    schema2 = JsonSchema::Parser.new.parse!(sample2)
    schema2.uri = "http://json-schema.org/draft-04/hyper-schema#"

    store = JsonSchema::DocumentStore.new
    store.add_schema(schema1)
    store.add_schema(schema2)

    expander = JsonSchema::ReferenceExpander.new
    expander.expand!(schema1, store: store)
    expander.expand!(schema2, store: store)

    assert_equal :something, :something_else
  end
jessieay commented 5 years ago

@kytrinyx Thanks for digging into this!

I copy-pasted your test into my fork and confirmed that it passed for me too.

Looking at the sample code I gave above, the only difference between your test and the failure mode appeared to be that your test was setting uri on both schemas before passing them to add_schema and expand!.

So, I commented out schema1.uri = and schema2.uri =.

Once I did that, I was able to reproduce the stack level too deep error when running ruby -Ilib -Itest test/json_schema/reference_expander_test.rb

My failing spec is here: https://github.com/brandur/json_schema/compare/master...jessieay:master

kytrinyx commented 5 years ago

@jessieay great find, thanks for digging into that. I'll take another look at this tomorrow or the next day.

kytrinyx commented 5 years ago

@jessieay Get this. The biggest problem in my test:

"$ref": "http://json-schema.org/draft-04/schema#"

instead of

"$ref" => "http://json-schema.org/draft-04/schema#"

It doesn't handle symbol keys (as indeed, it shouldn't). So when I removed the lines that were setting the URI it was still not failing.

jessieay commented 5 years ago

My version of the test uses hash rocket for ref (not symbol) and fails!

Edit: github doesn't seem to know how to link to a specific line in a fork diff, but check out my fixture data at https://github.com/brandur/json_schema/compare/master...jessieay:master

2nd Edit: oh wait, on a 3rd read I now wonder if what you were saying was that you were getting a false positive because of the symbol key in your fixture...

Let me know if there is anything else I can do to help! My day is not too busy.

kytrinyx commented 5 years ago

@jessieay Yepp, sorry, that's what my jumbled (but victorious) comment was trying to convey :-)

I'm at the point where I think I might have a clue to what's going on. I think we hit this section:

https://github.com/brandur/json_schema/blob/9c4656774a7c7d22a6f466932b34fd47d67c88dc/lib/json_schema/reference_expander.rb#L128-L133

and the schema is already expanded, but we're dereferencing again anyway.

Here's the improved test, btw. It defines the fixtures within the test (so the scaffold doesn't have one-off fixtures for just that test), and also removes any data (title, description) that is unnecessary, and only expands schema 2 (it should recursively expand schema 1, which it is trying to do, but then keeps going around in circles).

it "expands a schema with a reference to an external schema in a oneOf array" do
  sample1 = {
    "$schema" => "http://json-schema.org/draft-04/schema#",
    "id" => "http://json-schema.org/draft-04/schema#",
    "definitions" => {
      "schemaArray" => {
        "type" => "array",
        "minItems" => 1,
        "items" => { "$ref" => "#" }
      }
    }
  }
  schema1 = JsonSchema::Parser.new.parse!(sample1)

  sample2 = {
    "$schema" => "http://json-schema.org/draft-04/hyper-schema#",
    "id" => "http://json-schema.org/draft-04/hyper-schema#",
    "allOf" => [
      {
        "$ref" => "http://json-schema.org/draft-04/schema#"
      }
    ]
  }
  schema2 = JsonSchema::Parser.new.parse!(sample2)

  store = JsonSchema::DocumentStore.new
  expander = JsonSchema::ReferenceExpander.new

  store.add_schema(schema1)
  store.add_schema(schema2)

  expander.expand!(schema2, store: store)

  assert schema1.expanded?
  assert schema2.expanded?
end
kytrinyx commented 5 years ago

I think that should do it :-)

jessieay commented 5 years ago

awesome! thanks for the fix, I really appreciate it.

brandur commented 5 years ago

Amazing debugging work here @kytrinyx / @jessieay! Thanks so much.

Just pushed Katrina's fix in 0.20.5.

jessieay commented 5 years ago

Thanks for the release!

Sadly, when I updated the gem locally to 0.20.5 and ran my original rake task again, it resulted in the same error.

Not sadly (happily!), I am now more familiar with how this library works so I was able to write a test that demonstrates the new edge case that I am experiencing. https://github.com/brandur/json_schema/compare/master...jessieay:jy-more-infinite-loops

# test/json_schema/reference_expander_test.rb

  it "is not broken" do
    sample1 = {
      "id" => "http://json-schema.org/draft-04/schema#",
      "$schema" => "http://json-schema.org/draft-04/schema#",
      "properties" => {
        "additionalItems" => {
          "anyOf" => [ { "$ref" => "#" } ]
        }
      }
    }
    schema1 = JsonSchema::Parser.new.parse!(sample1)
    sample2 = {
      "$schema" => "http://json-schema.org/draft-04/hyper-schema#",
      "id" => "http://json-schema.org/draft-04/hyper-schema#",
      "allOf" => [
        { "$ref" => "http://json-schema.org/draft-04/schema#" }
      ]
    }
    schema2 = JsonSchema::Parser.new.parse!(sample2)

    store = JsonSchema::DocumentStore.new
    expander = JsonSchema::ReferenceExpander.new

    store.add_schema(schema1)
    store.add_schema(schema2)

    expander.expand!(schema2, store: store)

    assert schema1.expanded?
    assert schema2.expanded?
  end

I will stare at it a bit more to see if I can figure out where in dereference this is breaking down, but wanted to let y'all know that this is happening in case the solution jumps out at you immediately when reading the spec.

UPDATE:

just learned via trial and error that if I change the value of ref inside "allOf" to "#" this spec passes. So that specific line must be the issue.

kytrinyx commented 5 years ago

@jessieay Well that's exciting :-) The failing test is a HUGE win, thank you! Would you open a new issue with this new failing test? It's "the same but not the same" so it's worth tracking in a fresh place. I'm speaking at a conference tomorrow, so I can't look at this until that's over.

jessieay commented 5 years ago

@kytrinyx sure thing! done at https://github.com/brandur/json_schema/issues/115