TehShrike / deepmerge

A library for deep (recursive) merging of Javascript objects
MIT License
2.75k stars 216 forks source link

Handling circular references #207

Open nonara opened 3 years ago

nonara commented 3 years ago

Issue

In a circular reference scenario, it goes into an infinite recursion, which blows the stack.

Reproduction

a = { self: null };
a.self = a;
common_1.deepMerge.all([ a, { b: 2 } ])

Fix Suggestion

When recursively walking objects, maintain a chain Map<key: original, value: cloneOrOriginal (depends on settings)> of references to objects & arrays.

Walk function logic:

  1. Create a new chain map from the old (We want to be sure that only the direct lineage exists in this map, which is why create a unique copy for each walked node)
  2. Check if the current object exists in the map.
    • If it does, flag as circular reference
    • Otherwise, add the current reference to the map
  3. Handle circular reference
    • If there is nothing to merge, set output value to the value from our map
    • If there is something to merge and no custom logic exists, throw a circular reference error
  4. If not circular reference
    • Perform normal merge / clone logic & set output value
    • Set output value as value for the current reference in our chain map
  5. Recurse for properties / elements, passing our map
TehShrike commented 3 years ago

This would probably best be solved by a custom isMergeableObject that keeps track of what it's compared to, and returns false if it's an object that has been seen before.

I would be open to linking to such an implementation, or perhaps even putting it into the readme for people to copy/paste.

nonara commented 3 years ago

Updated my original comment with a strategy.

Because of the other issue I posted, this isn't possible for me at the moment. I would suggest directly mitigating this issue in this case. Circular reference issues are bound to happen in many scenarios. This is an issue that we have to deal with in compilers & parsers very often.

In this case, anything with a reference to a node higher up the chain is going to blow the stack. Generally speaking, it's a good idea to ensure that logic is built into recursive functions to ensure that they can never infinitely recurse.

That in mind, proposed strategy above is an approach which covers the issue or throws an appropriate error message if it can't.

If you'd like, I'd be glad to submit a PR.

nonara commented 3 years ago

This would probably best be solved by a custom isMergeableObject that keeps track of what it's compared to, and returns false if it's an object that has been seen before.

Using seen would cause false positives in some cases.

Example:

If we're walking Child2, Child1 and Grandchild1 would have been 'seen', but because they're not in the ancestral lineage, they can be merged.

That's why we need a direct lineage map to be passed to the recursive function. Without a third parameter, an isMergeableObject implementation couldn't do this.

But again, this isn't too difficult to solve and could be taken care of with a few extra lines. Let me know if you'd consider a PR.

TehShrike commented 3 years ago

actually, on reflection, this is a much harder problem that I don't really want to try to solve here – merging two trees makes sense, but merging two cyclical graphs doesn't make sense – what would you even expect the output to be if you tried to merge

const a = { child: { grandchild: { tiny_baby: 'hi } } }

const b = { child: { } }
b.child.grandchild = b

merge(a, b)

That seems nondeterministic, right?

I'm inclined to say "the solution is to eliminate cycles in your objects before trying to merge them"

nonara commented 3 years ago

Edit: I misread the original code... updating my response.

nonara commented 3 years ago

Effectively, that becomes:

const a = { child: { grandchild: { tiny_baby: 'hi' } };
let b;
b = { child: { grandchild: b } };

You're correct that in this case this cannot be merged. Effectively here, if we were to assign a pseudo-AST to it, we have a Node ({ tiny_baby: 'hi' }) and a ParentReferenceNode (b). This would be a non-mergeable scenario. In my proposed method above, I suggest allowing a custom handler function for dealing with circular merge situations, and if one doesn't exist, we use default behaviour.

  1. Handle circular reference
    • If there is nothing to merge, set output value to the value from our map
    • If there is something to merge and no custom logic exists, throw a circular reference error

In your example, it would either pass off the conflict to a handler function, or if none is given, it would throw a circular reference error.

nonara commented 3 years ago

As far as whether it should be solved I have two thoughts:

1. It's generally a good idea to ensure that recursive functions cannot ever hit an infinite recursion scenario.

That in mind, putting in some logic to to track this is good practice. And if you're going to do that, the extra bits to allow handling those scenarios isn't too much extra and would help in making it so only those scenarios that should throw an exception do. (ie, there's a reference to a parent but nothing to merge with it, we can just keep the reference as a reference to its merged counterpart)

2. How often will two graphs be merged?

It depends on the context. This is very common in some areas of CS, and not as much in others. (Any object which contains a reference to a parent will become un-mergeable, and this is fairly common.) Ultimately, it's your call. Not supporting it will limit the scopes in which the library can be used, where adding the ability to handle circular references will certainly broaden that and help a lot of people.

At the very least, it will reduce bug reports when people find themselves hitting circular ref issues.

Again, the rules are pretty simple (pseudo-AST again)

  1. If one property is a ParentReferenceNode, and there is no node to merge

    • Call a handler function if set to determine output
    • (Default if no handler) Output a reference to the corresponding merged parent node
  2. If one property is a ParentReferenceNode and there is another node to merge

    • Call a handler function if set to determine output
    • (Default if no handler) Throw a circular reference exception

The default behaviour can be changed, but this prevents stack being blown and allows users to have control over circular issues, which will allow the library to be used in more contexts.

TehShrike commented 3 years ago

It's generally a good idea to ensure that recursive functions cannot ever hit an infinite recursion scenario

I'm amenable to this. If we added any code to address this, my preferred solution would be to throw an error with a clear message as soon as a cycle was detected.

Merging data structures in JS is complicated enough already, I don't want to make my life the hell that it would become by trying to solve general CS "merging cyclical graphs" problems.

nonara commented 3 years ago

Makes sense. How about this:

  1. Track circular refs
  2. If customMerge is set > Pass a third parameter to the customMerge function called circularReferences.
  3. If customMerge is not set or it returns undefined > The default behaviour throws a circular ref exception, per your preference

circularReferences would be either be undefined if neither a or b are circular, or it will be a simple map if either is. The map would be <sourceReference, targetReference> (see below for example)

Simple, non-breaking, and still allows us hell-dwellers to do our thing, as needed. 😄

Example

The following is how I could implement the behaviour I proposed earlier using this extra param:

let a, b;
a = { child: { parent: a.child } }
b = { child: { hi: 1 } }

merge(a, b, { 
  customMerge: () => (a, b, circularReferences) => {
    if (circularReferences && (a === undefined || b === undefined)) 
      return circularReferences.get(a ?? b); // The value will be a reference to mergedObj.child
  }
});

If you're down with that, let me know, and I'll do the PR.

martin-braun commented 3 years ago

deepmerge should not blow up when merging with circular references. I'd love to see this getting implemented.

TehShrike commented 3 years ago

deepmerge should not blow up when merging with circular references. I'd love to see this getting implemented.

Please read the thread above – I'm open to throwing a clear error when a cycle is detected, but I do not see an obvious way to return sensical data when merging a cycle.

RebeccaStevens commented 3 years ago

I think the default behavior should be throwing an error. If custom merge function is provided then that function can determine how to handle cycles.

nonara commented 3 years ago

I think the default behavior should be throwing an error. If custom merge function is provided then that function can determine how to handle cycles.

Agreed. My proposal above should meet those specifications. I'd still willing to do the PR.

martin-braun commented 3 years ago

@TehShrike I agree that handling circular references is not an easy task, but I think a good starting point is to inspect solutions on stringifying circular references. I think a similar approach could handle circular references here, so it would allow to deep merge objects with circular references without throwing an error.

This implementation would give this package a superior status when it comes to deepmerge-solutions.

I think the default behavior should be throwing an error. If custom merge function is provided then that function can determine how to handle cycles.

I think this is a first good logical step, though.

(Note that I don't want to say that you must implement a proper circular reference handling, I'm just saying it would be super nice to have out of the box)

nonara commented 3 years ago

Unless I'm misunderstanding you, stringifying would not work as it would still have a circular issue.

This sort of thing is not as complicated as it may seem. It's quite common in AST-like scenarios. I apologize if I've not explained the proposed solution clearly.

To clarify, you simply maintain a linear set of parent references during the walk, then pass that set to the custom merge function, if present. The custom merge function can determine what to do if a or b are in the passed set. If no custom merge function is present, or the function returns a value that is still a circular reference, it will throw a circular error.

This allows for fast-exit throw behaviour as the default while also providing for custom handling if the user desires to.

martin-braun commented 3 years ago

Unless I'm misunderstanding you, stringifying would not work as it would still have a circular issue.

@nonara Unfortunately, you misunderstood me. I know that stringify is not a solution to this problem, I was just saying that it has the same problem and it can be worked around. I was just making a point that it's not impossible to fix this, properly.

This sort of thing is not as complicated as it may seem. It's quite common in AST-like scenarios. I apologize if I've not explained the proposed solution clearly.

To clarify, you simply maintain a linear set of parent references during the walk, then pass that set to the custom merge function, if present. The custom merge function can determine what to do if a or b are in the passed set. If no custom merge function is present, or the function returns a value that is still a circular reference, it will throw a circular error.

This allows for fast-exit throw behaviour as the default while also providing for custom handling if the user desires to.

I get your point, I just wish that custom function could be part of the lib in the first place. So I seek a generic approach so to speak.

I'm aware of the main concern now, it seems to be that you might try to merge an object that has nested objects that are part of deeper circular references on the opposite object you want to merge with. Simply skipping as soon as we see circular references would skip the deep merge and the result would be weak.

But what about skipping circular references when we come across them and do the deep merge in both directions, merging a in b and then b in a? I think this would successfully merge and handle circular references well.

Little sample:

const a = {
  a1 = 1,
  a2 = 2,
  a3 = 3
};
a.a = a;
const b = {
  a1 = 1,
  b1 = 1,
  c1 = 1,
  a = {
    a3 = 4
    a5 = 5
  }
}

Of course we have conflicting values here. If I remember correctly the first object will be in favor of conflicts, it should not? if we favor a it should look like this:

{
  "a1": 1,
  "a2": 2,
  "a3": 3,
  "a5": 5,
  "a1": 1,
  "b1": 1,
  "c1": 1,
  "a": {
    "a1": 1,
    "a2": 2,
    "a3": 3,
    "a5": 5,
  }

(here a.a should be a circular reference to the root object)

But if we favor b the resulting object should look like this:

{
  "a1": 1,
  "a2": 2,
  "a3": 4,
  "a5": 5,
  "a1": 1,
  "b1": 1,
  "c1": 1,
  "a": {
    "a1": 1,
    "a2": 2,
    "a3": 4,
    "a5": 5,
  }

(here a.a could be circular or not. If we strictly favor b it could be loosed, I'm not sure about that.)