brandur / json_schema

A JSON Schema V4 and Hyperschema V4 parser and validator.
MIT License
230 stars 45 forks source link

early return for "anyOf" validator #93

Closed breunigs closed 6 years ago

breunigs commented 6 years ago

There is no need to go through all candidates as soon as we find the first suitable one for the anyOf validator. Using a similar schema as in #92 and with the patches from #92 I get the following times:

0:55.24 0:56.67 0:57.17 # baseline
0:57.92 0:56.92 0:56.93 # with patch, but valid "anyOf" item is last
0:18.62 0:18.55 0:18.31 # with patch, but valid "anyOf" item is first (skips 5 invalid validations)

The patch is obviously circumstantial, but it doesn't incur any functional changes.

brandur commented 6 years ago

Oof, so actually this non-optimized code is there for a reason. Once you have a sufficiently complex schema, debugging a failure on an anyOf/allOf/oneOf can become quite frustrating because although you know that a schema didn't match, you don't know why or which.

Populating suberrors here is a technique that's used for additional introspection on failures. After say your anyOf fails, you can then go through each subschema and understand why it didn't match by looking at its own failures. I was saved by this feature even as late as last week, where we had an OpenAPI 3 spec validation failure involving an anyOf that would have been next to impossible to debug any other way.

brandur commented 6 years ago

We could possibly add a configuration option for this (which is off by default) where speed is preferable over debugging visibility.

breunigs commented 6 years ago

I understand your general idea, and a “fail fast” switch would certainly lead to further speed improvements. I was a bit wary of doing that because it has the potential of “two code paths”, each with different bugs. I feel that the extra maintainance effort would not be worth the speedup.

Regarding this patch, I assume you mean it would reduce debugging comfort because a pry placed somewhere in the code might not trigger? If you’re talking about the end user (or gem user) then I must be missing something - To my understanding the errors are discarded and never printed if it finds at least one valid entry. Did I miss something in this regard...?

brandur commented 6 years ago

I understand your general idea, and a “fail fast” switch would certainly lead to further speed improvements. I was a bit wary of doing that because it has the potential of “two code paths”, each with different bugs. I feel that the extra maintainance effort would not be worth the speedup.

Yeah, that's certainly worth considering.

Regarding this patch, I assume you mean it would reduce debugging comfort because a pry placed somewhere in the code might not trigger? If you’re talking about the end user (or gem user) then I must be missing something - To my understanding the errors are discarded and never printed if it finds at least one valid entry. Did I miss something in this regard...?

My apologies, I think I misread this on the first run through. I went over it again, and I think you're right. As implemented, it should still produce a good set of errors for an any_of that couldn't successfully validate, but will just be faster for one that could.

I'm going to pull this in. Thanks!

brandur commented 6 years ago

Released as 0.17.2.