cujojs / jiff

JSON Patch and diff based on rfc6902
Other
627 stars 41 forks source link

Large diffs #26

Closed patrickliechty closed 9 years ago

patrickliechty commented 9 years ago

I am changing one property in a larger set of JSON and the diff is just as large as the original JSON. Here is the original JSON:

{"images":[{"id":"bdb71c44-f980-499b-abf5-5aed88a04625","entries":[{"templateProperties":{"content":"Other","contentURI":"https://beta.familysearch.org/indexing-service/template/templates/record?name=record.template.431","reviewState":"agree"},"fields":[{"label":"RELATIONSHIP_2","content":" Brother-in-Law","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_GN","content":"BOB","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_SURN","content":"JONESS","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_TITLE_TERMS","content":"","contentURI":"","reviewState":"agree","previousContent":""}]},{"templateProperties":{"content":"","contentURI":""},"fields":[]}],"header":{"fields":[{"label":"Image Type","content":"","previousContent":"","reviewState":"agree"},{"label":"Duplicate Image","content":"","previousContent":"","reviewState":"agree"},{"label":"2ndheader","content":"asdf","previousContent":"","reviewState":"agree"}]}},{"id":"80a2b121-87b6-46d6-8ea6-c5fe971943f3","entries":[{"templateProperties":{"content":"","contentURI":""},"fields":[]}],"header":{"fields":[{"label":"Image Type","content":"","previousContent":"","reviewState":"agree"},{"label":"Duplicate Image","content":"","previousContent":"","reviewState":"agree"},{"label":"2ndheader","content":"","previousContent":"","reviewState":"agree"}]}}],"userId":"28e8e455-e070-4fa1-b6f7-35098f5ae18d"}

Updated JSON:

{"images":[{"id":"bdb71c44-f980-499b-abf5-5aed88a04625","entries":[{"templateProperties":{"content":"Other","contentURI":"https://beta.familysearch.org/indexing-service/template/templates/record?name=record.template.431","reviewState":"agree"},"fields":[{"label":"RELATIONSHIP_2","content":" Brother-in-Law","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_GN","content":"BOB","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_SURN","content":"JONESS","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_TITLE_TERMS","content":"","contentURI":"","reviewState":"agree","previousContent":""}]},{"templateProperties":{"content":"","contentURI":""},"fields":[]}],"header":{"fields":[{"label":"Image Type","content":"","previousContent":"","reviewState":"agree"},{"label":"Duplicate Image","content":"","previousContent":"","reviewState":"agree"},{"label":"2ndheader","content":"asdf","previousContent":"","reviewState":"agree"}]}},{"id":"80a2b121-87b6-46d6-8ea6-c5fe971943f3","entries":[{"templateProperties":{"content":"","contentURI":""},"fields":[]}],"header":{"fields":[{"label":"Image Type","content":"","previousContent":"","reviewState":"agree"},{"label":"Duplicate Image","content":"","previousContent":"","reviewState":"agree"},{"label":"2ndheader","content":"","previousContent":"","reviewState":"agree"}]}}],"userId":"28e8e455-e070-4fa1-b6f7-35098f5ae18d"}

The DIFF:

[{"op":"add","path":"/images/0","value":{"id":"bdb71c44-f980-499b-abf5-5aed88a04625","entries":[{"templateProperties":{"content":"Other","contentURI":"https://beta.familysearch.org/indexing-service/template/templates/record?name=record.template.431","reviewState":"agree"},"fields":[{"label":"RELATIONSHIP_2","content":" Brother-in-Law","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_GN","content":"BOB","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_SURN","content":"JONES","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_TITLE_TERMS","content":"","contentURI":"","reviewState":"agree","previousContent":""}]},{"templateProperties":{"content":"","contentURI":""},"fields":[]}],"header":{"fields":[{"label":"Image Type","content":"","previousContent":"","reviewState":"agree"},{"label":"Duplicate Image","content":"","previousContent":"","reviewState":"agree"},{"label":"2ndheader","content":"asdf","previousContent":"","reviewState":"agree"}]}}},{"op":"test","path":"/images/1","value":{"id":"bdb71c44-f980-499b-abf5-5aed88a04625","entries":[{"templateProperties":{"content":"Other","contentURI":"https://beta.familysearch.org/indexing-service/template/templates/record?name=record.template.431","reviewState":"agree"},"fields":[{"label":"RELATIONSHIP_2","content":" Brother-in-Law","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_GN","content":"BOB","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_SURN","content":"JONESS","contentURI":"","reviewState":"agree","previousContent":""},{"label":"RELATIVE_TITLE_TERMS","content":"","contentURI":"","reviewState":"agree","previousContent":""}]},{"templateProperties":{"content":"","contentURI":""},"fields":[]}],"header":{"fields":[{"label":"Image Type","content":"","previousContent":"","reviewState":"agree"},{"label":"Duplicate Image","content":"","previousContent":"","reviewState":"agree"},{"label":"2ndheader","content":"asdf","previousContent":"","reviewState":"agree"}]}}},{"op":"remove","path":"/images/1"}]

patrickliechty commented 9 years ago

Only one property with value JONESS was changed, yet the whole object was marked for removal and replacement.

patrickliechty commented 9 years ago

I would expect a removal of just the one property and the add for the new property. Why is it not this granular? Is this specified in the spec? Or has the Jiff library not implemented this level of granularity?

briancavalier commented 9 years ago

@wwwpol Diffing arrays of objects is tricky for many reasons. One of those is that the objects can move around in the array without being changed, or they could remain in place and have their properties modified, etc. So, the diff algorithm needs a way to recognize objects in arrays that are the "same" but which may have changed properties or may have moved around.

For flexibility, jiff doesn't assume it can use a property named id to do that, rather it allows you to provide a function. When dealing with arrays, you can provide a hash function that identifies "same" items. Then, you should see a diff with only the small change you expected.

The docs for how to specify the hash function are in the README. Please give that a try and let me know how it goes.

patrickliechty commented 9 years ago

The Java version of Json Patch is doing this and creating small diffs without any developer intervention. Do you plan on following what they are doing?

patrickliechty commented 9 years ago

Here is the link to their code:

https://github.com/fge/json-patch/tree/master/src/main/java/com/github/fge/jsonpatch/diff

patrickliechty commented 9 years ago

Back to your code. In the hash function, how do I compare to see if it matches? There is only one parameter passed to the hash function. It appears you are walking the object structure. I would expect an a and b params that I can use to compare. How should I compare?

patrickliechty commented 9 years ago

Another related question. Why is the diff generating a "test" operation?

patrickliechty commented 9 years ago

I see that the "test" operation must be generated to allow it to be inverted. Can I turn that off?

briancavalier commented 9 years ago

Do you plan on following what they are doing?

The Java version relies on hasCode which is effectively a primitive part of the Java language. JavaScript has no such primitive facility for object identity. We could, for example, impose a restriction that any object you run through jiff must have a hashCode (or similar) method. Unfortunately, that would be impractical since it wouldn't survive a typical JSON.stringify/parse, and jiff is intended to handle plain objects and json data.

In the hash function, how do I compare to see if it matches?

The hash function is not a compare or equals function. It should return a string or number that uniquely identifies the thing that was passed to it, much like Java's hashCode. Jiff will use this value internally to do identity comparisons.

I see that the "test" operation must be generated to allow it to be inverted. Can I turn that off?

There's no way to turn it off in the current version. Initially, we decided to err on the side of safety and allowing inversion.

I'm totally open to adding an option to disable it, though.

patrickliechty commented 9 years ago

Thanks for the responses. Interestingly enough, if I just return false from the hash function, I get the small diff I wanted.

I would like the option to remove the test. We don't have plans to invert what we have done. Our main goal is to produce smaller http requests when a user updates their data.

patrickliechty commented 9 years ago

I found another issue. I did a diff and got this result:

[{"op":"test","path":"/images/0/entries/0/fields/2/content","value":"Liechty"},{"op":"replace","path":"/images/0/entries/0/fields/2/content","value":"Jones"}]

Jones was the old value and Liechty is the new value. It is backwards. You should be doing the replace operation with the new value.

patrickliechty commented 9 years ago

I added the jshash library to generate a unique hash in the hash function. It seems to work well.

patrickliechty commented 9 years ago

I found the jshash here: http://pajhome.org.uk/crypt/md5/

briancavalier commented 9 years ago

Thanks for the updates @wwwpol. I'll try to hit a few of your recent points.

if I just return false from the hash function, I get the small diff I wanted.

This may seem to be doing the right thing, but if you're diffing an array of objects it's likely just working by luck. A hash function that returns the same value for everything means "every object in the array is the same object", which probably isn't what you intended. More info on hash functions below ...

I would like the option to remove the test. We don't have plans to invert what we have done

Sure thing. Seems like a good option. I opened another issue for that, see #27. I may be able to do it this weekend.

I found another issue

Jiff has quite an extensive test suite, including the official test suite provided by the JSON Patch authors. We've not seen the issue you described, but if you can provide a minimal test case that reproduces the problem, I'll be happy to dig into it.

I added the jshash library to generate a unique hash in the hash function

Cool, looks like an interesting library. Thanks for the link.

The key to jiff's hash function is ensuring that it returns the same value for objects, which in your application domain, should be considered as the same object, even if some of their properties differ. So, as a simple example, if all your objects have an integer id field, a simple and very reliable hash function would be: function(x) { return x.id };.

An example of a (likely) bad hash function for that scenario (objects with an id) is JSON.stringify, since it would return something different for two objects whose id fields match, but whose other properties differ.

briancavalier commented 9 years ago

@wwwpol See #28. It adds an option to turn off the extra test ops.

patrickliechty commented 9 years ago

Excellent, thanks

patrickliechty commented 9 years ago

I was passing the parameters in the wrong order. That is why the values were backwards. My bad.

patrickliechty commented 9 years ago

Thank you for your efforts on this library. It is appreciated!

briancavalier commented 9 years ago

Thanks, @wwwpol, and I appreciate your feedback.

I just merged #28, and I'll get a release out tonight. I feel like that, plus the hash function, addresses the bulk of concern about large diffs for now. Can we close this issue, or are there other things related to patch size you'd like to discuss?

patrickliechty commented 9 years ago

When I get the release tonight, this will resolve this issue. Thanks again.

briancavalier commented 9 years ago

@wwwpol 0.7.1 just landed :)

patrickliechty commented 9 years ago

Awesome, thanks.

briancavalier commented 9 years ago

Closing this, but please feel free to discuss more, or reopen if you need.