dustingetz / react-cursor

Immutable state for React.js
1.03k stars 50 forks source link

Memory leaks due to heavy memoization #17

Open dustingetz opened 10 years ago

dustingetz commented 10 years ago

Requires thought to fix.

Specifically, every cursor instance ever created for each (react component, path, value) is forever retained so if we see the same (react component, path, value) we can return the same reference.

andrewluetgers commented 9 years ago

Just found out about this project and curious about this. Memory issues were my immediate concern when thinking about how this works and noticed this issue. Would there be a way to solve this using something like immutable.js?

dustingetz commented 9 years ago

The datastructures are not the thing that is leaking, here. The leak is due to the memoizing cache policy.

lijunle commented 9 years ago

It could be great to use ES6 WeakMap for cache to avoid memory leak.

There are some polyfill libraries, for example this one: https://github.com/Benvie/WeakMap

lijunle commented 9 years ago

Sorry that I misunderstand the WeakMap ES6 feature (I suppose it is WeakReference feature.)

But, I find that WeakMap feature can be used to resolve your memory leak issue too. :smile:


About the cache key

First of all, the cmp variables inside all Cursor instances are the same. The path variable is used to navigate to a sub-object of global single state (the value variable in code). So, the only necessary variable provided for cache is the value variable. (The cmp and path is not a need for this line)

What is a cursor

Then, rethink what is a cursor - it is an extension attaching on every sub-object of the global single state. The attaching word means, if the state object is going to be GC, then such extension is useless anymore and should be GC too.

Such attaching relationship exactly meets the actual use of WeakMap!

Classes relationship

From the above definition, the cursor cannot hold a reference to any state object. But instead, the state object holds a reference to the cursor.

Concretely, the relationship between the three classes are:

But, of course, the cursor can calculate the state object according to its holding component.state and path. Again, no direct reference.

About the value parameter

In such case, the cursor should not accept the value parameter anymore, both in its constructor and build function.

For backward compatible, the build function can call cmp.setState(value) before pass to Cursor instance. For the Cursor constructor, the value parameter should be deprecated.

Why do we need cache?

Finally, let me back to the origin memory leak issue - cache - why do we need cache?

From the component aspect, the shouldComponentUpdate function needs re-using the cursor instance if its corresponding state object is the same. So, state and sub-state object -> cursor needs cache.

From the cursor aspect, it does not hold the state object directly, but instead needs to calculate it from component.state and path. Each calculation needs O(height-of-state-object) operations. It could optimized to O(1) if using cache. It indicates component.state + path -> sub-state object.

Both of these two caches need WeakMap to avoid strong reference to the state object, and only the state object weakly holds reference to the cursor instance. Once the state object is changed, there is no references holding the original state object and its corresponding cursor instance in the WeakMap is GC too.

Memory issue is resolve! :smiley:

Last issue, the primitive key

The WeakMap has a limitation, its cache key must be a non-primitive type.

For our second case, it can ensure that both component.state and path (an array of strings) are non-primitive objects.

But, for the first case, the state leaf, it always be a primitive object, e.g., String, Number, Boolean, etc. The cache does not work with this types, and always report that, such key not is the cache. So, it always create a new cursor instance for such case.

A new cursor instance will break the shouldComponentUpdate ref-check. But, luckily, because those cursor are actually holding primitive types, shouldComponentUpdate can check the primitive type directly instead of using ref-check to get a right check.

lijunle commented 9 years ago

Lifecycle issues

When implementing the idea, I find a comment:

When refining inside a lifecycle method, same cmp and same path isn't enough. this.props and nextProps have different subtree values, and refining memoizer must account for that

This comment is checked in one year again (61aace7).

In my idea, the cursor does not hold the value directly - it calculates from component when needed. So, in lifecycle mehod like componentWillReceiveProps(nextProps), although this.props.cursor and nextProps.cursor are different objects, but they holds the same component and same path - this.props.cursor.value and nextProps.cursor.value is the same.

Pandora is back. :disappointed:


Another idea - Cursor wraps state object

OK, it is OK. My previous idea resolve memory issue but get another issue back. So, I proposal another way to resolve these.

I draw a relationship for the current implementation:

Root component --this.state--> State object
State object --cursor cache--> Cursor object
Cursor --this.value--> State object

As you see, the state object and the cursor object hold cycle reference which cause memory leak. In the previous idea, I break the cursor -> state object reference to resolve memory leak. But, finally get into trouble.

The second attempt, holds cursor -> state object and break state object -> cursor. In such case, the root component needs to update:

Root component --this.state--> Cursor object
Cursor --this.value--> State object

Then, in the Cursor's update function, instead cmp.setState(nextState), use cmp.setState({ cursor: Cursor.build(this.cmp, [], nextState) }).

Optimizations

We cannot use cache for state object -> cursor instance optimization, otherwise cycle reference happens and memory leaks. So, each Cursor.build call returns a new Cursor instance. Ref-check in shouldComponentUpdate does not works any more.

Fortunately, React immutability helper will try to re-use the original object if it is not changed. So, we can provide Cursor.is(first, second) to check they are holding an object. This API imitates Immutable.is(first, second). The shouldComponentUpdate call leverage this new API and keeps the current public API unchanged.

Re-visit the life cycle methods

Because the cursor instance holds the state object reference. In the life cycle method, e.g., componentWillReceiveProps(nextProps), this.props.cursor and nextProps.cursor holds individual state objects with right values.

No WeakMap anymore

WeakMap is not needed in such pattern. But, of course, it could be cool to store private class variables and other features.

Compare to Immutable.js + setState

If this changes apply, the code will looks very similarly with Immutable.js + setState.

But, with three differences:

  1. The OM (react-cursor) pattern keeps the one single state in the root component. Immutable.js + setState leaves this to user.
  2. The react-cursor provides very-easy-to-use APIs to update state object, hide the calls to setState. Immutable.js suggests a helper function to do such thing.
  3. Immutable.js is strong type with complex type system - Map, Set, Iterable, Seq, etc. Cursor is just a wrapper of native JavaScript object.
lijunle commented 9 years ago

I have implement the second idea in these check-ins. Those are API breaking changes. I might think that you could not like to merge. But if yes, I could prepare a PR.

The mutations example has updated to meet the new APIs. Actually, there are just three lines modifications.

I have use the Chrome heap snapshot to validate that, memory issue is resolved in the mutations example.

dustingetz commented 9 years ago

First of all, the cmp variables inside all Cursor instances are the same. The path variable is used to navigate to a sub-object of global single state (the value variable in code). So, the only necessary variable provided for cache is the value variable. (The cmp and path is not a need for this line)

rootCmp must be hashed for: the case where an app has multiple state roots that are totally independent - can’t share cursor references

path must be hashed for: Suppose a table is sorted, the reconciliation algorithm may choose to not reorder the react components but only fixup their innerHtml

lijunle commented 9 years ago

Sorry, it is a wrong example.

For the rootCmp example, what do you expect for the following code:

var data = { list: [1, 2, 3] };
var id = 4;
var App1 = React.createClass({
  getInitialState() {
    return this.props.data;
  },
  add() {
    var cursor = Cursor.build(this);
    cursor.refine('list').push([id++]);
  },
  render() {
    var cursor = Cursor.build(this);
    return (
      <div>
        <p>There are {cursor.refine('list').value.length} items in the list.</p>
        <button id="click-me" onClick={this.add}>Add one</button>
      </div>
    );
  }
});
var App2 = React.createClass({
  getInitialState() {
    return this.props.data;
  },
  render() {
    var cursor = Cursor.build(this);
    return <ul>{cursor.value.list.map(x => <li>{x}</li>)}</ul>
  }
});
React.render(<App1 data={data} />, document.getElementById('div1'));
React.render(<App2 data={data} />, document.getElementById('div2'));
document.getElementById('click-me').click(); // should App1 and App2 update synchronously?


For the path issue, I still don't get the reason. Revisit it later.

dustingetz commented 9 years ago

We need more time to think about these ideas.

lijunle commented 9 years ago

After thinking, I agree with you that for the current implementation, cmp and path are necessary.

While, my second proposal does not need memorize any more. The memorize code is removed in the check-ins. So, you could take a look.

dustingetz commented 8 years ago

@lijunle, I am finally ready to tackle this issue. Many simplifications have landed in master over the last month, that may help with this issue. Here are the two main leaks: first leak, second.

refToHash can be trivially implemented with WeakMap. This will make the memory leak vastly smaller as we will be retaining one int per reference seen, not the entire react component.

Is there more we can do? The new code has opened new possibilities, I think.

dustingetz commented 8 years ago

Essentially, WeakMap gives us Ref => Cursor, what we need is (Ref, Ref) => Cursor

dustingetz commented 8 years ago

I think we could even do dumb things like a least-recently-used cache in the memoizers with some parameter over how big to make the cache. Tune the parameter based on how big the app is.

lijunle commented 8 years ago

I noticed that there are huge changes these days. I am reviewing and understanding the current implementation for caching issue. Please let you know if I find something.

dustingetz commented 8 years ago

http://stackoverflow.com/questions/34202895/need-weakmap-a-b-c

dustingetz commented 8 years ago

I am considering also adding a special Cursor type for server side rendering, which does not allow cursor updates (which are typically only wired up to user interaction callbacks). This special case only needs to memoize on value (not path) and is thus compatible with WeakMap. However this solution is not fully general as it puts burden on the application code to not use cursor updates in server rendering.

dustingetz commented 8 years ago

From StackOverflow:

if you create a two level weak-map (store weakmaps on weakmaps), whenever a obj on first level is gced, you loose the whole second level (when a is gced, you loose b). if b is gced, you will still have a weakmap for a, which will only be there while there is another (a, something). does this solve your problem or am i missing something? – Walter Macambira 2 mins

lijunle commented 8 years ago

Sorry, @dustingetz I reviewed your changes, it is hard for me to understand the code and predict the behavior. Maybe after you freeze the 2.0 code, I learn the logic and evaluate the usability. Please open/close this issue on your decision.

dustingetz commented 8 years ago

I think we can unit test this with window.performance.memory in chrome, which is already the test runner (thanks @danielmiladinov)

lijunle commented 8 years ago

Great information! Although that API is Chrome only, it could automate the detection and make it accountable!