avian2 / jsonmerge

Merge a series of JSON documents.
MIT License
214 stars 25 forks source link

jsonschema 2.5.0 breaks Merger.get_schema() #20

Closed avian2 closed 7 years ago

avian2 commented 8 years ago

jsonschema 2.5.0 introduced a LRU cache in the reference resolver:

https://github.com/Julian/jsonschema/pull/203

This breaks get_schema() in some subtle ways, since we're modifying the schema while we're walking through it and dereferencing things. RefResolver.resolving() now sometimes returns outdated parts of the structure, which causes problems.

A proper solution would be to build up a new schema in parallel while walking over the old one (which should be read-only), similar to how we do it for walking over instances. This is not a trivial change. Specifically, it's hard to properly retain cross-references in the new schema.

Related comments in the code (several parts regarding handling of references currently work more-or-less by luck):

https://github.com/avian2/jsonmerge/blob/master/jsonmerge/strategies.py#L186 https://github.com/avian2/jsonmerge/blob/master/tests/test_jsonmerge.py#L1248

RayPlante commented 7 years ago

Hey Tomaz (@avian2). I was wondering if you could say something more about "returns outdated parts of the structure", perhaps by giving an example if its not too complicated.

Upgrading to use jsonmerge 1.4.0 broke my application with the introduction of the LocalRefResolver class. I was using Merger.cache_schema() to preload schemas I would need to avoid downloads; however, it seems the use of is_remote_ref() in 1.4.0 prevents the option of resolving a schema from cache. Consequently, my $refs are not getting resolved.

1.3 still works fine, so I'm still using that, but I'm trying to think about the best way to get around this with 1.4. Are able to say more about what was going wrong with RefResolver? Any other enlightenment would be welcome as well. thanks!

avian2 commented 7 years ago

My memory is getting a bit fuzzy on this issue. I think the problem was that once you dereferenced a $ref, the result got cached. The issue was that jsomerge descended into a $ref during a get_schema() call and changed something there. When it dereferenced the same $ref again, it got the old, unchanged part of the schema back, not the new, modified one.

The simplest way to see what broke is to install the latest jsonschema 2.5.0 and jsonmerge 1.3.0 and run python setup.py test from jsonmerge. You should see some test failures

I chose not to dereference external references at all in jsonschema 1.4.0. This avoided all sorts of corner cases that were not well supported previously. Can you make a minimal example of what doesn't work for you?

RayPlante commented 7 years ago

Hey, thanks for the response--this will get me started. To tell the truth, I'm not exactly sure what's not working, yet; I just know that a couple of my unit tests were failing.

BTW, you might be interested to know that I'm using jsonmerge in my project, oar-metadata, which is part of a project to build a public data repository at the US National Institute of Standards and Technology. I need a way to merge JSON metadata provided by the user with that generated automatically from other sources. jsonmerge allows me to control that merger on a fine-grained level.