Yomguithereal / baobab

JavaScript & TypeScript persistent and optionally immutable data tree with cursors.
MIT License

First-class reference cursor implementation to improve performance #404

Closed atifsyedali closed 8 years ago

atifsyedali commented 9 years ago

Monkeys are great. I find myself using them quite a bit. However, my usage is always about making a reference to some centrally stored data.

For example, let's say I have the following tree:

data
-- messages
---- messageA
------ title: "This is message A"
---- messageB
------ title: "This is message B"
UI
-- messageListView
---- messageViewA
------ message: monkey to /data/messages/messageA
------ someUiState: true
------ someOtherUiState: false
---- messageViewB
------ message: monkey to /data/messages/messageB
------ someUiState: false
------ someOtherUiState: true
-- NotificationBar
---- newMessagesView
------ message: monkey to /data/messages/messageB

From the above, you can see that two UI widgets can share the same message messageB and still have messageB stored in one place. This is great because now I only have to modify messageB and all UI updates accordingly.

However, it turns out that messageB gets copied into the UI state. I can get around this by storing a version field in messageB that gets updated whenever messageB is updated:

data / messages / messageB
-- title: "This is message B"
-- version: 42

And then monkey the version field in messageB instead...

UI / messageListView / messageViewB
-- message: monkey to /data/messages/messageB/version
-- refPath: ["data", "messages", "messageB"]

Now, whenever I need messageB in my messageViewB, I simply do:

var path = messageViewCursor.get('refPath');
var message = tree.get(path);
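The lookup above can be sketched without Baobab at all. This is a minimal illustration on a plain object, assuming a hypothetical `getIn` helper in place of `tree.get`; it is not Baobab's API, and the `message` field is just a number standing in for a monkey on the version.

```javascript
// Hypothetical helper (not part of Baobab): read a nested value by path.
function getIn(tree, path) {
  return path.reduce(
    (node, key) => (node == null ? undefined : node[key]),
    tree
  );
}

const tree = {
  data: {
    messages: {
      messageB: { title: 'This is message B', version: 42 }
    }
  },
  UI: {
    messageListView: {
      messageViewB: {
        // Stand-in for a monkey on /data/messages/messageB/version.
        message: 42,
        // The view stores only the path to the central record.
        refPath: ['data', 'messages', 'messageB']
      }
    }
  }
};

// Resolve the view, then follow its refPath back to the central data.
const view = getIn(tree, ['UI', 'messageListView', 'messageViewB']);
const message = getIn(tree, view.refPath);

console.log(message.title); // This is message B
```

The resolved `message` is the very object stored under `data`, so the view holds a path and a version number rather than a second copy of the record.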

This approach has two advantages:

  1. I don't need to monkey the entire message AND my views reactively update whenever a message gets updated (assuming message here is an atomic UI component).
  2. I get a modifiable copy of message in my view (monkey data is not modifiable).

However, this approach also has disadvantages:

  1. This approach does not work across the parent hierarchy unless you also update the version number for every parent until the root. For example,

data
-- version: 123
-- messages
---- version: 456
---- messageA
------ version: 789

If I monkey data/version, then modifying data/messages/messageA does not update references unless I recursively bump version up the chain. Why does this matter? For example, a messageViewList can hold a reference to a messages array, and each messageItemView in messageViewList references a message in that array. If a message changes, the messageItemView queues for a re-render, but the messageViewList does not detect any change. In systems like React, this effectively means no re-render at all, since the parent messageViewList does not re-render.

  2. It's difficult to manage all these version fields yourself in the application layer, especially since there are implicit writes when some parts of the supplied path do not exist.
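To make the bookkeeping burden concrete, here is a rough sketch of what the application layer would have to do on every write. `setWithVersions` is a hypothetical helper written for this illustration on a plain mutable object; nothing like it is provided by Baobab.

```javascript
// Hypothetical helper: set a value at `path` and bump a `version` counter
// on every object along the way, starting at the root passed in.
function setWithVersions(root, path, value) {
  let node = root;
  for (let i = 0; i < path.length - 1; i++) {
    node.version = (node.version || 0) + 1;
    // Implicit write: create missing intermediate objects.
    if (node[path[i]] == null) node[path[i]] = {};
    node = node[path[i]];
  }
  // Bump the object that directly holds the value, then write the value.
  node.version = (node.version || 0) + 1;
  node[path[path.length - 1]] = value;
  return root;
}

const data = {
  version: 123,
  messages: {
    version: 456,
    messageA: { version: 789, title: 'This is message A' }
  }
};

setWithVersions(data, ['messages', 'messageA', 'title'], 'Edited title');

console.log(data.version);                   // 124
console.log(data.messages.version);          // 457
console.log(data.messages.messageA.version); // 790
```

Every writer in the application has to go through a function like this, including writes that create missing path segments, which is the part that is hard to enforce by hand.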


It would be really neat to have this functionality natively in Baobab. This might mean adding some hidden fields or maintaining an internal structure separate from the input.

Yomguithereal commented 9 years ago

Hello @atifsyedali. Let me read you and see what can be done :smile:.

Yomguithereal commented 9 years ago

I am not sure I understand what you mean by:

However, it turns out that messageB gets copied into the UI state.

Yomguithereal commented 9 years ago

Sorry for closing. I misclicked.

atifsyedali commented 8 years ago

Sorry, I think I misdiagnosed a performance problem I am having by assuming that the entire monkey data is cloned (I have persistable=false).

The performance problem occurs when there is a monkey in each item of an array. This is quite slow for some reason, and I thought it might be because the entire monkey data is being copied. I am no longer sure this is true, though.

Yomguithereal commented 8 years ago

Normally, monkeys within an array should not even work. Anyway, the data returned by the monkeys is never cloned, just frozen if the tree is immutable.
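The difference between freezing and cloning can be shown on a plain object. The `deepFreeze` function below is an illustrative assumption, not Baobab's internal code; it only demonstrates that freezing happens in place and returns the same reference.

```javascript
// Recursively freeze a plain object or array in place. No copy is made.
function deepFreeze(value) {
  if (value !== null && typeof value === 'object' && !Object.isFrozen(value)) {
    Object.freeze(value);
    Object.values(value).forEach(deepFreeze);
  }
  return value;
}

const messageB = { title: 'This is message B', tags: ['inbox'] };
const exposed = deepFreeze(messageB);

console.log(exposed === messageB);          // true
console.log(Object.isFrozen(exposed.tags)); // true
```

Because the frozen value is the original object, exposing it through a second path costs a traversal at most, not a copy of the data.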

atifsyedali commented 8 years ago

Monkeys within arrays actually work if you have a monkey nested 2+ levels below, e.g.

array: [
    {
        parent: {
            child: monkey {..}
        }
    },
    {
        ...
    }
]

I have removed the arrays from my code and that reduced my page load time from 9s to 5s. However, I still notice cursor.set(..) taking 3.5s for me. I have 940 monkeys when the page is fully loaded (I have to load a lot of inter-related data). That's still more than half my page load time spent in setting values.

Yomguithereal commented 8 years ago

Well, I am surprised this even works, because I remember shunting monkeys when dealing with arrays. I must have missed something somewhere. However, your use case is indeed tricky and cannot be performant. Why do you have so many monkeys? Is your code available somewhere so we could check it together to see if something can be done about it or the lib?

atifsyedali commented 8 years ago

Unfortunately, it is closed source. For now I am reducing the number of monkeys being used. But is there some performance improvement that you think can be made w.r.t. monkey handling? For example, I noticed that all monkeys listen to every write on the tree (maybe I am wrong). Perhaps a monkey could listen only to the cursors it cares about?

Yomguithereal commented 8 years ago

Monkeys already listen only to the cursors they care about. The thing is that, to do so (and this is also how cursors work under the hood), they have to listen to the tree to work out whether its updates affect them. There is another way to code the internals of the library that could enhance performance in your precise use case, but it would be detrimental to general use cases. Generally, though, I wouldn't vouch for having a lot of "dynamic" monkeys. You seem to use them as class getters, and this is not a good use case for reducers. What exactly is the job of your monkeys here?
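The idea of listening to the tree and then deciding whether an update is relevant can be sketched as a path comparison. This is a simplified model under stated assumptions, not Baobab's actual implementation: `pathsOverlap`, `affectedWatchers`, and the `watchers` registry are all hypothetical names.

```javascript
// Two paths are related when one is a prefix of the other.
function pathsOverlap(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical registry scan: every watcher is checked on every write.
function affectedWatchers(watchers, updatedPath) {
  return watchers.filter(w =>
    w.paths.some(p => pathsOverlap(p, updatedPath))
  );
}

const watchers = [
  { name: 'viewA', paths: [['data', 'messages', 'messageA']] },
  { name: 'viewB', paths: [['data', 'messages', 'messageB']] },
  { name: 'list', paths: [['data', 'messages']] }
];

const hit = affectedWatchers(
  watchers,
  ['data', 'messages', 'messageB', 'title']
).map(w => w.name);

console.log(hit); // [ 'viewB', 'list' ]
```

In this model each write scans the whole registry, so the per-write cost grows with the number of watchers even though only the matching ones react.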

atifsyedali commented 8 years ago

I am using monkeys as references to centralized data in my views. This allows me to build complex views in React that can render quickly without having to explicitly pass down data from root components. So far, it's worked well. However, as I loaded real-world data, I started to notice performance problems. Digging a bit more, it turns out it is not just monkeys: any object that I create has slow performance. I think this goes back to using more efficient persistent data structures like mori or immutable-js.

I switched over to mori entirely (wrote cursors on top of mori with referencing ability) and saw a dramatic performance improvement. Closing this defect because I think I wasn't using the right tool. Baobab is still excellent, but I don't think it was meant for large amounts of data.

Yomguithereal commented 8 years ago

You are right @atifsyedali. Baobab is meant to be halfway between immutable/persistent data structures and raw JavaScript objects, so that one may read the tree's data easily and without requiring components to deal with more constrained APIs such as mori's or immutable-js'. But if you need to deal with vast amounts of data that change fairly fast, then yes, you should definitely switch to them.
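The "halfway" trade-off can be illustrated with a path-copying update on raw objects. The `setIn` function below is a hypothetical sketch of the general technique, not code taken from Baobab, mori, or immutable-js.

```javascript
// Path-copying update: shallow-clone every node along `path` and share
// every untouched subtree with the previous version.
function setIn(node, path, value) {
  if (path.length === 0) return value;
  const [key, ...rest] = path;
  const copy = Array.isArray(node) ? node.slice() : { ...node };
  copy[key] = setIn(node == null ? undefined : node[key], rest, value);
  return copy;
}

const v1 = {
  data: {
    messages: {
      messageA: { title: 'This is message A' },
      messageB: { title: 'This is message B' }
    }
  }
};

const v2 = setIn(v1, ['data', 'messages', 'messageB', 'title'], 'Edited');

console.log(v1.data.messages.messageB.title);                          // This is message B
console.log(v2.data.messages.messageB.title);                          // Edited
console.log(v2.data.messages.messageA === v1.data.messages.messageA);  // true
```

Reads stay plain property access, which is the convenience described above. The cost is that each shallow clone is proportional to the number of keys at that level, so very wide objects or arrays make every write expensive, whereas trie-based structures bound that cost.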

If I can bother you some more, did you find specific points in Baobab that were particularly not performant? You seem to have had quite an experience with things people don't tend to develop very commonly and your insight is therefore very valuable on the subject.