use flat_map for a small performance gain

brandur / json_schema

A JSON Schema V4 and Hyperschema V4 parser and validator.

MIT License


use flat_map for a small performance gain #91

Closed breunigs closed 6 years ago

breunigs commented 6 years ago

When validating a 15 MB JSON file against a not-too-complex schema, flat_map saves roughly 3% of the processing time.

Total time (seconds) for 40 validation runs of the same file. Loading the schema and parsing the file were not part of the test run:

      user     system      total        real
475.880000   0.070000 475.950000 (476.523519)  current
461.270000   0.060000 461.330000 (461.930364)  flat_map
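A harness along these lines produces timings with the same user/system/total/real columns. This is only a sketch using Ruby's standard `Benchmark` module: the document and the two lambdas are hypothetical stand-ins, not the validator code that was actually measured, and the setup is kept outside the timed blocks as described above.

```ruby
require "benchmark"
require "json"

# Setup happens outside the timed blocks, so parsing is not measured.
# The document is a made-up stand-in for the real 15 MB file.
document = JSON.parse(
  JSON.generate("items" => Array.new(10_000) { |i| { "id" => i, "name" => "n#{i}" } })
)

# Hypothetical stand-ins for the two implementations being compared.
with_map_flatten = ->(doc) { doc["items"].map { |item| item.keys }.flatten }
with_flat_map    = ->(doc) { doc["items"].flat_map { |item| item.keys } }

RUNS = 40

# Benchmark.bm prints one row per report and returns the measurements.
results = Benchmark.bm(10) do |bm|
  bm.report("current")  { RUNS.times { with_map_flatten.call(document) } }
  bm.report("flat_map") { RUNS.times { with_flat_map.call(document) } }
end
```

Differences of a few percent are easily lost in noise on a toy workload like this one, so a real comparison needs the actual schema and file.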

breunigs commented 6 years ago

Ah, tested only on Ruby 2.4.

brandur commented 6 years ago

Thanks @breunigs! Even without a performance gain, this still seems like a cleaner/better way to implement this.
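The diff itself is not shown in this thread, so the following is only a generic sketch of the pattern, assuming the original code mapped and then flattened. The `groups` data is hypothetical.

```ruby
# Hypothetical data: each parent yields an array of child values.
groups = [
  { "children" => [1, 2] },
  { "children" => [] },
  { "children" => [3] }
]

# Two-step version: builds an intermediate array of arrays, then flattens it.
two_step = groups.map { |group| group["children"] }.flatten(1)

# Single call: concatenates each block result as it goes.
one_step = groups.flat_map { |group| group["children"] }

p two_step  # => [1, 2, 3]
p one_step  # => [1, 2, 3]
```

Note that `flat_map` flattens one level only, whereas `flatten` without an argument flattens recursively, so the two forms are interchangeable only when the block results are not nested more deeply.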

brandur commented 6 years ago

(I suspect that optimizing this library will be a very deep rabbit hole BTW. It's really not written with speed in mind at all.)

brandur commented 6 years ago

Released as 0.17.1.

breunigs commented 6 years ago

Don't sell yourself short. For the same workload json_schema takes ~11 seconds, whereas json-schema takes ~3m05s. I'm also content once it's "fast enough for our task", so I won't overdo it. The bigger issue is that due to the recursive approach flamegraphs don't show any obvious culprits to investigate. If time allows, I'll dig a bit deeper, but no promises.

brandur commented 6 years ago

> For the same workload json_schema takes ~11 seconds, whereas json-schema takes ~3m05s. I'm also content once it's "fast enough for our task", so I won't overdo it.

That's pretty cool. I'm honestly quite surprised that json_schema is faster, let alone by such a stark margin! Thanks.

> The bigger issue is that due to the recursive approach flamegraphs don't show any obvious culprits to investigate. If time allows, I'll dig a bit deeper, but no promises.

I've always suspected that the reference expansion code could be made much more efficient, but the good news is that it only needs to run once, during the initial schema expansion, and things should be quick from there.
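The expand-once, validate-many idea can be illustrated with a toy model. Everything below is a hypothetical sketch: the `expand` and `valid?` helpers and the schema layout are invented for illustration and are not json_schema's actual data model or API.

```ruby
# Toy schema containing a local "$ref" pointer.
SCHEMA = {
  "definitions" => { "id" => { "type" => "integer" } },
  "properties"  => { "id" => { "$ref" => "#/definitions/id" } }
}.freeze

# Recursively replace {"$ref" => "#/a/b"} nodes with the node they point to.
# Circular references are not handled in this sketch.
def expand(node, root)
  case node
  when Hash
    if node.key?("$ref")
      path = node["$ref"].delete_prefix("#/").split("/")
      expand(root.dig(*path), root)
    else
      node.transform_values { |value| expand(value, root) }
    end
  when Array
    node.map { |value| expand(value, root) }
  else
    node
  end
end

# Pay the expansion cost once, up front...
EXPANDED = expand(SCHEMA, SCHEMA)

# ...then reuse the expanded schema for every document.
def valid?(schema, doc)
  schema["properties"].all? do |name, rule|
    !doc.key?(name) || (rule["type"] == "integer" && doc[name].is_a?(Integer))
  end
end

docs = [{ "id" => 1 }, { "id" => "x" }]
p docs.map { |doc| valid?(EXPANDED, doc) }  # => [true, false]
```

With this split, a slow expansion step adds a fixed startup cost, while the per-document work never touches `$ref` resolution.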

Sounds good though. Let me know if you end up making any interesting improvements.