Closed diachedelic closed 6 years ago
First off, glad to hear rfc6902
is smarter than the competition š (at least on your data)
I understand your problem, and agree that the patches you speculate are nicer and semantically saner, but I'm worried that a single maximum depth parameter is simplistic, and not the right solution.
For example: (if I recall correctly) GeoJSON can have nested structures, where shapes contain shapes which contains shapes, etc. What you might want in that case is for a "coordinate" tuple to be treated as an atomic entity, no matter how deep it occurs. If rfc6902 can handle that somehow (see solution 2. below), that seems preferable.
I see two potential solutions with a broader vision:
diffArrays
and diffObjects
(and aggregate rootward/higher up the tree structure) to prefer fewer replace
operations despite a larger value
payload.
diffMyTypes
functions that short-circuit the usual recursion on certain conditions. In your example, you could check the that last token in the provided Pointer
is "coordinates"
and intercept the default diffArrays
handling, always returning a replace
operation if the two given children of coordinates are not equal.
I like the idea of providing a function to check the recursion, I have already run into a situation where the depth
param truncated a part of my patch I hadn't anticipated. I suppose the diffMyTypes
function should just take a single param, ptr
?
Hmm, I haven't thought it through all the way; my first sense was that you'd pass it the input
and output
objects, too, just like all the other diff*
functions. If you just gave it a single argument, ptr: Pointer
, then I guess it would have to return another function, which in turn returned Operation[]
like the existing diff*
functions? But then what's the point of the indirection? I might be convinced to use such an abstraction, but don't immediately see the benefit...
I don't understand why this function would be returning Operation[]
- would it function as some sort of replacement/intercept for diffAny()
? I do understand it would be nice to return something more interesting than a boolean.
Maybe i get it...if I wanted to restrict the depth of my diff, I would write something like this:
import {diffValues, diffAny} from 'rfc6902/diff'
createPatch(input, output, function(input, output, ptr) {
if (ptr.tokens.length >= 4) {
return diffValues(input, output, ptr)
}
return diffAny(input, output, ptr)
})
Yeah, exactly! Or maybe createPatch
would defer to diffAny
if the given custom diff*
parameter returned anything besides an Array
(e.g. undefined
), so that you (the user) wouldn't have to worry about importing and calling it (and you'd have to pass in the custom function, which would be annoying if you were using =>
function syntax)
As another example, here's how you might stop without recursing deeply into "coordinates":
import {diffValues} from 'rfc6902/diff'
createPatch(input, output, (input, output, ptr) => {
if (ptr.tokens[ptr.tokens.length - 1] == 'coordinates') {
return diffValues(input, output, ptr)
}
})
Looks good to me - what would this function be called? diffMyType
?
Good question. ~diffOverride
~? ~diffDispatch
~? diffCustom
?
Since it's a function argument, the API is purely positional, so the name isn't terribly important ā it can change later without breaking the API.
Do you want to take a crack at it and update PR #30? This solution should look quite similar, but with a little extra logic needed in createPatch
. The default value should be a function that returns undefined
. And as I remarked at https://github.com/chbrown/rfc6902/pull/30#issuecomment-397038544:
Also, this new parameter should have defaults for all exported functions, to maintain API compatibility.
How about diffNode
? Seeing as it is used to diff every single node in the object tree, and its default value is a noop, which is still executed on every node (to no effect).
Hrrmm. By that reasoning, which is sound, that's what I should have named diffAny
. For parity, maybe preDiffAny
? tryDiffAny
? optionalDiffAny
?
I also have a weak preference to avoid the name "node" in my Node.js projects, to make full text search a little less ambiguous.
tryDiffAny
makes the most sense internally, but the least sense for a casual user looking at createPatch
.
Also, internally, tryDiffAny
doesn't makes sense, as by default it is diffAny
Yeah I didn't consider using diffAny
as the default for this new parameter, nor wrapping it in a closure in createPatch
(which seems quite elegant!), and I'm trying to think if there'd be any conflicts / loopholes / edge-cases in not knowing whether diffNode
is a custom diff function or the built-in diffAny
(e.g. infinite loops, or failing to percolate the custom diff function throughout the tree traversal).
Now that I've seen your implementation, I'm of the opinion that simply diff
might be the most appropriate name for this new argument.
@chbrown I've updated my pull request, please let me know if there's anything else you'd like changing (or tests added?)
Ok, I merged it with some changes: primarily naming , and dropping 176c48c ā which I'll handle separately. Thanks!
Cheers, thanks for the quick publish
I am creating patches to represent changes made to ~100KB GeoJSON objects. I tried every single module I could find which generates rfc6902-compliant patches, but only
chbrown/rfc6902
gave sane results for my use case (due to the recursion indiffAny()
). However, I need a way to limit the depth of the patches.For example, suppose I had a GeoJSON LineString, which stores the coordinates of each point (lat/lng) in its own array:
Now I modify it by changing a few coordinates:
I run
createPatch
, and I get this:At this small scale, there is no problem. But when these LineStrings contain hundreds of coordinates, each coordinate must be diffed against each other, which is expensive, and also creates large patches (as there are many operations and each has a long pointer). In a real world benchmark, it took over a second to create a patch, and 6 seconds to apply!
I do however think I have figured out a solution, which involves adding a
depth
parameter tocreatePatch()
, preventing excessively detailed patches. If I set this hypothetical depth parameter, I get this patch instead:Not only is this patch more succinct, but it better reflects the reality of how I'm altering LineStrings - I am not modifying individual points as such, I'm changing a line in its entirety. In a real world scenario, this more than halves the size of the patch and computes near-instantly.
I am planning to implement this on my fork (it's not a very big change), and create a test case.