chbrown / rfc6902

Complete implementation of RFC6902 in TypeScript
https://chbrown.github.io/rfc6902/
322 stars 39 forks source link

Creating patches from Date-objects #15

Closed larrifax closed 2 years ago

larrifax commented 7 years ago

I'm trying to call createPatch() on dates (or objects containing dates), but the diffing algorithm doesn't seem to detect differences between Date-objects.

import { createPatch } from "rfc6902";

const original = new Date();
const modified = new Date();
modified.setHours(modified.getHours() + 1);

const patch = createPatch(original, modified) // returns []

I guess the problem lies here, as typeof new Date() === "object". Would it be possible to add a check for Date-objects and route them to the diffValues()-method?

chbrown commented 7 years ago

Right, createPatch expects POJO arguments, and doesn't gracefully handle fancy objects that are more than the sum of their parts.

As far as rfc6902 is concerned, Date is a special case of the more general category of things that have typeof x == 'object' but cannot be iterated and rebuilt by key.

Maybe Date is a common enough type to merit special handling, but I'd prefer a more general solution that sends all fancy objects directly to diffValues(). I could do this with a brute force input = JSON.parse(JSON.stringify(input)) but that would incur a lot of (usually) unnecessary processing overhead.

I'm thinking something like

const isPOJO = Object.getPrototypeOf(input) == Object.prototype

or

const isPOJO = input.constructor == Object

which appears to work for Date vs. literal objects, but I would need to do more testing on the generality / backward compatibility of this approach.

larrifax commented 7 years ago

Thanks for looking into this. I agree that converting the value to & from JSON incurs too much extra processing overhead in the normal case. However, the two other examples you give would prevent key-diffing of ES6 classes (and probably ES6 classes transpiled to ES5), which we don't always want (?)

class Test {}
const isPOJO = Object.getPrototypeOf(new Test()) == Object.prototype // false

I imagine the exceptions to the typeof x == 'object'-rule could differ from project to project. Could some kind of "fancy object" whitelist be an option?

chbrown commented 7 years ago

I think the option should be "best effort with user classes" which is what it currently naively does, since it assumes the user isn't going to send in any instances of custom classes, vs. "consider all user classes completely opaque." The latter is more sane, because the following makes sense but is probably not what the user intended:

const {createPatch, applyPatch} = require('../rfc6902');

class Dog {
  constructor(breed) { this.breed = breed; }
  bark() { console.log('woof!'); }
}

class Car {
  constructor(make, model) { this.make = make; this.model = model; }
  honk() { console.log('beeeep!'); }
}

var dog = new Dog('dachshund');
var car = new Car('bmw', 'm3');

var dog2car = createPatch(dog, car);
console.log('dog -> car:', dog2car);

var cardog = dog;
applyPatch(cardog, dog2car);
console.log('cardog:', cardog);
// cardog Dog { make: 'bmw', model: 'm3' }

cardog.bark();
// woof!
cardog.honk();
// TypeError: cardog.honk is not a function

I don't want this library to get into the business of extending RFC6902 to support any enhanced (non-JSON) JavaScript types, but it should be easier for the user to drop in their own diffObjects (or diffArrays, etc.) functions without having to fork or otherwise rewrite large portions of the library.

So my plan is to:

  1. Add an argument to createPatch that percolates to diffAny (and probably all the diff* functions that are recursive) that, when set, specifies the current key-diffing behavior on user class instances, but by default considers them opaque, so that you wouldn't have hit your original surprising behavior on Date.
  2. Think about a good way to construct a createPatch creator function so that individual diff functions can be rewritten easily based on users' project requirements.
larrifax commented 7 years ago

Your plan sounds good. I like the way the library currently handles custom classes, but I do agree that it would be safer to consider them opaque. I'm happy as long as I'm able to hook into the createPatch-function in order to enable key-diffing on custom classes while simultaneously keeping the opaque diff on e.g. Date objects.

bsuchorowski commented 7 years ago

Hi, any progress with that issue? I came across same problem yesterday and I'm looking for any solution that wouldn't require rewritting much of my code.

jlchereau commented 6 years ago

JSON.parse solves such specificity with a reviver function parameter which is often used to convert JSON dates into Date objects. You need such a hook for any specific key diffing.

fredgate commented 4 years ago

This issue has been open for more than 3 years. It would be nice to solve it quickly as date usage is very frequent. Date object is a native javascript object, that could be treated specifically as it is so often used. No need to complicate solving of this issue. It will be time later to have a generic solution for all particular uses; but date does not deserve such a complication.

chbrown commented 4 years ago

I'm almost sympathetic to adding a special case for Date instances... except that dates don't survive a roundtrip though JSON :(

var now = new Date()
now == now
//⇒ true
JSON.parse(JSON.stringify(now)) == now
//⇒ false

A special case at the top of diffObjects wouldn't feel like too much of a hack, but emitting operations with a value that doesn't survive JSON-serialization still feels icky, even 3 years later.

I've just added some documentation to the README about customizing the diff function via createPatch's optional diff argument (new as of v2.4.0, released on 2018-06-25, partly with this issue in mind).

To give a concrete example for Date instances specifically:

const customDiff: VoidableDiff = (input: any, output: any, ptr: Pointer) => {
  if (input instanceof Date && output instanceof Date && input.valueOf() != output.valueOf()) {
    return [{op: 'replace', path: ptr.toString(), value: output}]
  }
}
const input = {date: new Date(0)}
const output = {date: new Date(1)}
createPatch(input, output, customDiff)
//⇒ [ { op: 'replace', path: '/date', value: new Date(1) } ]  // as expected 👍

That's not the solution this thread's commentators are asking for :/, but I'm not convinced that adopting non-JSONizable values into this library's default behavior (i.e., without requiring the user to do a bit more work to bring them in) is worth the potential headaches.

I'd love to hear any feedback on whether I'm being paranoid re: headaches, etc.

I'm also leaving this open since it is clear to me that compare(new Date(0), new Date(1)) returning true isn't ideal. I'll have to think about what to do in that case... maybe add an unsafe argument to createPatch that defaults to false and throw an exception on encountering "objects that are more than the sum of their [properties]" (somehow!?) unless unsafe==true.

Blackbaud-ChristopherHlusak commented 4 years ago

Obviously this is a challenging issue to solve cleanly, and being able to supply a custom diff on the createPath method (as you show above) solves root-level date properties, but it doesn't accommodate the case where perhaps you might have an array of objects on the root object that have a date property.

RootObject --> id, other root properties (name, description, whatever) --> Collection of ChildObject ------> ChildObject (having string, string and date properties)

As you articulate, the compare() is where the two different dates are seen as equal. So what we're seeing is that when one just wants to change a date value on an item in a root object's array it's not seen as a change. compare() is invoked for each of the child object properties and it doesn't pick up the date change.

I'm trying to think about the best way to work around this concern, so if anyone has come across this concern and has a nice workaround I am very curious to hear it.

Editing to add - curious if you had considered allowing createPatch to supply an override to compare() rather than supplying a voidableDiff? I'm probably missing something.

chbrown commented 4 years ago

@Blackbaud-ChristopherHlusak you're totally right 🤦‍♂ and even more succinctly:

createPatch([new Date(0)], [new Date(1)])
[]  /* <- wrong! 😞 */

I gloat about my fancy Levenshtein array diffing in the README, but that's how I've shot myself in the foot here, because compare is naive to the relevant customization implemented in the diff function passed to createPatch, and thus behaves incorrectly. If diffArrays didn't try to be so smart and was more like diffObjects — only doing brute-force pairwise comparisons — we wouldn't be having this problem.

I'll have to think about this a bit before I decide on a solution, but my intuition is that the compare* family is the problem, and that it should delegate all logic to the diff* family. And if after doing that there is still any need for a general purpose compare(a: any, b: any) => boolean function, it could be implemented as diff(a, b).length === 0 (i.e., the fully customizable diff returns no patches). My main concern is that this might kill diffArray's performance, or end up doing a bunch of duplicate work, but I agree that the current behavior is incorrect and I'll figure out how to fix it.

Blackbaud-ChristopherHlusak commented 4 years ago

It's definitely been a crazy two months since our last exchange, but I wanted to ask if you have had an opportunity to look into this issue any further? It's conceivable we might dig into it ourselves a little bit, too - I'm sure it seems like such a small thing at this time.

Nice library though, it's generally worked quite well outside of this small issue!

chbrown commented 4 years ago

Hi @Blackbaud-ChristopherHlusak unfortunately I got sidetracked 😬 but thanks for your reminder! Revisiting this yesterday/today I implemented the solution sketched out in my previous comment, which at least fixes the breaking test added in 88fae08 (and it would be trivial to add a conditional branch like if (object instanceof Date) { return 'date' } to diff.objectType which would handle the case in https://github.com/chbrown/rfc6902/issues/15#issuecomment-593519853), but the concern I mentioned remains.

I'm not sure how detrimental it would be IRL, but I can imagine pathological cases where having some large objects/arrays nested in other objects/arrays could dramatically tank performance compared to current (master branch) behavior. Ideally, diffArrays and diffObjects would be iterators, and the new !diff(a, b).length could short-circuit on the first result, but this could be tricky to implement inside the Levenshtein logic, not to mention the iffy cross-platform compatibility of JavaScript generator functions.

For that reason, I'd probably consider the potential performance regression caused by the replace-compare-with-diff branch a breaking change, and bump the major version if I released it. Same if I resolved the performance regression issue by incorporating something relatively fancy like iterators / generators.

I'm open to suggestions, but at the moment my inclination is to meet somewhere in the middle and move the equal.{compareArrays,compareObjects} functions parallel to each of the diff.{diffArrays,diffObjects} functions and pass in the custom diff argument just like with the diff* functions.

Blackbaud-ChristopherHlusak commented 4 years ago

Sorry for the delay, I just noticed your reply - I can carve out some time in my next sprint or the one afterwards to test & examine - you obviously have a more solid handle on this than anyone, but I'm definitely intent on helping examine & verify your solution.

For what it's worth, the object sizes we've been dealing with I can't imagine any sort of a performance concern whatsoever... I haven't specifically tested how long it takes to calculate patch operations for our objects but I always just dismissed it as an in-memory sort of calculation that would be negligible (your concern suggests maybe I should pay more attention to the speed).

I'm adding a reminder for myself to follow-up next sprint and I truly appreciate you taking the time to look into the issue.

Blackbaud-ChristopherHlusak commented 4 years ago

Hello! I am attempting to npm install that branch and it looks like I'm getting an error w/ util.ts that is preventing the installation. This file hasn't changed w/ that branch; I have not previously done an npm install to a specific branch so it's possible I'm not doing it correctly.

npm ERR! prepareGitDep util.ts(20,27): error TS2339: Property 'length' does not exist on type 'never'.

I'm looking further to understand, but thought I would mention it along with just mentioning I'm looking to verify that the changes work across the various patch use cases we have.

I'm attempting the following: npm install --save https://github.com/chbrown/rfc6902#replace-compare-with-diff --verbose

Blackbaud-ChristopherHlusak commented 4 years ago

I made a fork and just worked around the typescript compilation issue (which wouldn't be a problem as you normally release), leveraged that fork, and it looks like your fix works for all of the current scenarios we use in the application. For what it's worth, this looks good to me so far.

chbrown commented 4 years ago

Finally (!) getting back to this and hopefully (😅 ) finishing it out...

  1. Diving back into that replace-compare-with-diff branch, I split out the non-potentially-regressive (= uncontroversial) changes and pushed those to master: https://github.com/chbrown/rfc6902/compare/208d10f...v3.1.0
  2. The fresh Travis CI build replicated the "TS2339" compilation error mentioned by @Blackbaud-ChristopherHlusak, which was due to the typescript dependency not being pinned and upgrading from 3.8.* to 3.9.* — not a big deal; fixed in c200e45 by a simple type assertion.
  3. At this point, all that remained was to incorporate the distilled remnants of replace-compare-with-diff as rebased onto v3.1.1, which clearly depicted the trade-off between the the simple solution of replace-compare-with-diff and the potential performance regression described in https://github.com/chbrown/rfc6902/issues/15#issuecomment-626090712. Observing that compare was only used in two places: (1) inside diff.diffArrays (to test whether a pair of elements in the array are equal) and (2) inside patch.test, and facing the prospect of basically reimplementing the compare{,Arrays,Objects} family as lazy_diff{Any,Arrays,Objects} (or w/e naming scheme) companions to the existing diff{Any,Arrays,Objects} family, I decided it wasn't the worth retaining the additional complexity, especially since it'd be nearly identical code. But the performance concern remains, and I deleted the entire equal module, which IMO constitutes a major version bump.

tl;dr: I chopped up the changes FKA the replace-compare-with-diff branch into some commits leading up to v3.1.0 (released—after a few other tiny fixes—as v3.1.1), and a culminating commit fixing the issue, but carrying the potential performance-regression-inducing change, resulting in v4.0.0 (released just now).

Blackbaud-ChristopherHlusak commented 4 years ago

Appreciate the effort, we'll incorporate the latest into our solution during our next sprint. Thanks much.

chbrown commented 2 years ago

1+ year out and no complaints so I assume this is resolved.