Diffing an actual DOM Node tree with a virtual DOM tree

patrick-steele-idem commented 9 years ago

I'm interested in exploring this module some more to see if makes sense to update the Marko template compiler to support compiling to a program that produces a virtual DOM tree in addition to and as an alternative to a program that produces a stream of HTML strings (which will likely always be faster on the server. See: https://github.com/patrick-steele-idem/marko-vs-react). Marko already uses an HTML parser to build its AST so the only thing that really changes is the code used during the code generation phase and that can be abstracted to easily support both output types.

My question is: Does it make sense to support diffing a virtual DOM tree with an actual DOM tree? The reason I ask is because we produce a DOM as part of rendering the HTML on the server, but we need to compute client-side updates by building a virtual DOM. Why not diff the virtual DOM tree with the existing actual DOM tree to produce the set of patches needed to update the actual DOM? That is, after the initial rendering on the server all we have is a DOM (not a virtual DOM). When we re-render the UI on the client we would have a virtual DOM. To update the actual DOM we need to diff the new virtual DOM with the existing actual DOM and then patch the actual DOM. In addition, diffing a Virtual DOM with an existing DOM will, in theory, be faster since the diffing and patching could happen simultaneously since the actual DOM nodes would be easily accessible during diffing and that would allow the actual DOM to be quickly patched.

As a related question, are there any benchmarks that show that reading the virtual DOM is faster than reading the actual DOM. It seems really wasteful to have to keep a virtual DOM in memory just for future diffing given that the virtual DOM should be an almost identical representation of the actual DOM.

I hope this makes sense. I have yet to take a deep look into the implementation of this module (something I hope to do in the near future), but I just wanted to get an understanding why diffing only appears to be supported between virtual DOM trees and not between an actual DOM tree and a virtual DOM tree. I have not seen any benchmark that shows that reading the DOM is slow for simple things like node.tagName, node.getAttributes(), node.firstChild and node.nextSibling. I can't imagine reading the DOM being slow given that the DOM is implemented in optimized compiled code and the DOM is an integral to the browser. I can, however, understand how reading layout-related properties of the DOM (e.g. el.offsetWidth) can be slow if it forces a relayout, but there is no need to read those types of properties for the purposes of diffing/patching.

Please let me know if I am missing anything. Just wanted to start a discussion and get thoughts from others.

Thanks in advance.

Zolmeister commented 9 years ago

See https://github.com/marcelklehr/vdom-virtualize (example: https://github.com/Raynos/mercury/tree/master/examples/server-rendering)

patrick-steele-idem commented 9 years ago

Thanks for sharing, @Zolmeister. My initial thought when looking at vdom-virtualize is that it would be an unnecessarily inefficient solution to produce a virtual DOM from the actual DOM just for the purpose of diffing and patching. It would be better to avoid the intermediate representation. Also, I feel like the patching will be much faster if you have access to the actual DOM nodes during the reconciliation phase.

Any thoughts on the performance difference of reading the actual DOM versus the virtual DOM?

Zolmeister commented 9 years ago

No idea. My initial guess would be that it performs worse due to the c++ boundary (even if it's much faster to read than write), and also keep in mind that it would need to support both (for server-side diffing).

I would recommend writing a jsPerf example to test your hypothesis.

patrick-steele-idem commented 9 years ago

I mostly agree with your thoughts. Going across the C++ boundary might be an issue and it should be measured, but I suspect that the cost would be minimal and that cost would be less than the cost of maintaining a virtual DOM in memory. I'm just surprised that I can't find any existing benchmarks that show that reading the DOM is slow despite the common claim that it is slow. I think people are misinterpreting the fact that is much slower to render to a temporary actual DOM instead of a temporary virtual DOM since actual DOM nodes have a ton of baggage.

Also, I don't agree with the following statement:

keep in mind that it would need to support both (for server-side diffing)

What I am proposing is that that reconciliation always be done between the actual DOM tree and the newly rendered virtual DOM tree. Assuming reading the actual DOM is fast (and it should be) then there is no need to ever do a diff between a virtual DOM tree and another virtual DOM tree. Once a newly rendered virtual DOM tree has been reconciled with the actual DOM it can then be tossed out since there is no longer a need for it.

Zolmeister commented 9 years ago

Yes, you are correct about the reconciliation part

neonstalwart commented 9 years ago

@patrick-steele-idem some of the mechanisms you can use with virtual-dom cannot be inferred by looking only at the DOM. e.g. you can't tell if something is a widget or has a key, etc just by walking the DOM; you need the virtual-dom tree to determine that. if you wanted to give up those things which can't be inferred from the actual DOM then all your questions still stand.

the next thing is that the shape of a virtual-dom node (i.e. it's properties) is not interchangeable with a real DOM node. so, at a minimum, if you only had the DOM, you would need to walk the DOM to build a tree of vnodes before you could do a diff or build your own customized diff that understood how to diff the 2 representations.

i took a very quick look at your benchmarks comparing marko to react and you may be interested to find that http://vdom-benchmark.github.io/vdom-benchmark/ shows that react is really quite slow (for those tests) compared to the other libraries being benchmarked. take it with a grain of salt though since those benchmarks keep the nodes hidden as they are rendered.

you have a lot of questions and i know i've really just touched on a few - for your other questions, i either don't have adequate answers (e.g. i don't know of any benchmarks comparing reading DOM vs virtual-dom) or the answers may take some significant discussion (e.g. is it an option for you to send either the initial tree from the server or derive the initial tree on the client from some initial data from the server?) so take a look at the code here a bit more and keep the discussion going.

algesten commented 9 years ago

@neonstalwart whilst it may not be a good idea, couldn't it be possible to use data-attributes to store extra information in the DOM-nodes – even if that is just a lookup key into some map of further data?

<div data-vnode-key="key42" data-vnode="abc123">...

vnodes = {...}
vnodes.abc123 ...

neonstalwart commented 9 years ago

@algesten sure, you might be able to do that if it suits your purposes and you want to build a layer that interfaces with virtual-dom. this is more intrusive than we might like to be for a low-level library like virtual-dom. we try to minimize our impact on the way the user might like to use the DOM so even something as fairly well-accepted as owning some data-* attributes is more intrusive than we'd like to be.

also, my challenge to you would be that maybe the "some map of further data" could be redundant if you instead start with the server seeding the page with some initial data + a function to generate the tree from the data. my thought is that you already need a function to generate a tree from some data for future rendering and you likely need the initial data already for more than just generating a view (i.e. you probably already have a need for this data for your model/state). generating the initial vdom tree from that initial data is probably a similar cost as walking the DOM in order to generate an initial vdom tree to do the first diff so it may actually cost more to build a layer that interfaces with virtual-dom using data-* attributes.

of course this is all just supposition without any substantiation so i'd be curious to see some data to support or refute what i'm suggesting... but not curious enough (and more importantly, don't have enough time) to do this myself right now.

patrick-steele-idem commented 9 years ago

While it might feel wrong, I see now harm in adding custom namespaced properties directly on DOM nodes to support diffing. For example:

domNode.vdomMeta = {...};

Doing so will not introduce memory leaks since those properties will be garbage collected when the DOM node is garbage collected and it is less wasteful than maintaining the entirety of the previous virtual DOM tree in memory (especially since there is so much overlap between the properties of the actual DOM nodes and the properties of the virtual DOM nodes). My theory is that there will be significant gains if all of the diffing was done between a virtual DOM tree and the actual DOM tree. This would allow optimizations for both the use case where the initial HTML is rendered on the server and for the use case where updates are rendered on the client to a virtual DOM and need to be reconciled with the actual DOM.

I'm going to continue to investigate and I hope to create some more benchmarks to substantiate my theories. This is a great discussion, and the great thing is that we are all working towards the goal of making a the view a function of some state.

algesten commented 9 years ago

This is a great discussion, and the great thing is that we are all working towards the goal of making a the view a function of some state.

I'm trying to do views as functions using virtual-dom http://algesten.github.io/trifl/

(end of shameless plug)

localvoid commented 9 years ago

I'm going to continue to investigate and I hope to create some more benchmarks to substantiate my theories.

Don't waste your time, it will be significantly slower. It is also one of the main reasons why virtual-dom is slower than kivi, bobril and cito.

algesten commented 9 years ago

@localvoid what are you talking about? virtual-dom isn't doing comparison with the real DOM. what is this "main reason" you speak of?

localvoid commented 9 years ago

@algesten It traverses through the real DOM when applying patches.

patrick-steele-idem commented 9 years ago

@localvoid, just to make sure I understand correctly, are you saying that virtual-dom is slower because it is not diffing with the actual DOM?

virtual-dom always runs a diff between two virtual DOM trees and patches are applied as a separate step to the actual DOM. What approach is kivi using exactly?

neonstalwart commented 9 years ago

@patrick-steele-idem the claim is that virtual-dom is slower because it walks the DOM when patching. the DOM walking is to find the nodes that need to be patched according to the instructions given by the patch.

an alternative approach is to store the DOM node as a property of the virtual-dom node and then there's no need to walk the DOM to find the real node when it comes time to patch. the downside of this is that you cannot reuse virtual-dom nodes like in the following:

var saveButton = h('input.save', { type: 'submit' }, 'Save');

// a form with a save button at the top and bottom of the form
return h('form', { 'ev-submit': events.submit }, [
    saveButton,

    // a whole lot of form elements go here...

    saveButton
]);

neonstalwart commented 9 years ago

...the implication of what @localvoid is saying is that if you want to diff against the DOM, you are going to have the same (or worse) overhead simply because walking the DOM is what is making virtual-dom slower than the other libs mentioned.

patrick-steele-idem commented 9 years ago

Thanks for the clarification @neonstalwart. I feel like reusing virtual-dom nodes is a corner case and it would be better to optimize for the normal case.

patrick-steele-idem commented 9 years ago

I feel like if reconciliation (i.e., diffing and patching) should be done in one traversal and that it should be done by comparing virtual DOM nodes with actual DOM nodes (possibly with some extra metadata being attached to actual DOM nodes to speed up diffing). I could be wrong and that's why I am trying to understand the approaches that various solutions have used.

algesten commented 9 years ago

The problem is also that we're comparing apples and pears. The level of abstraction in kivi's API for creating virtual DOM-nodes is significantly lower than that of virtual-hyperscript. I suspect a lot of overhead would be added if it was raised to some level of user friendliness. The same can be argued for cito.js

var Hello = vdom.declareComponent({
  build: function() {
    var root = vdom.r();
    root.children = [vdom.t('Hello ' + this.props.name)];
    return root;
  }
});

Raynos commented 9 years ago

Virtual DOM is designed to diffs on two virtual trees. This has various advantages

You can do a diff on the server without a DOM and use a different vdom.createElement() strategy; namely creating HTML strings
You can do a diff on two virtual trees which represent something that is not DOM nodes (i.e. doing a diff on virtual trees of ANSI text data structures and then implement patch() to update a TTY).
You can test a large portion of your code without having any DOM involved. You can write unit tests on virtual DOM data structures that run everywhere, including node.
You can re-use the diff implementation on virtual trees that represent 2D canvas or 3D webgl.

neonstalwart commented 9 years ago

virtual-dom has so far considered reusing vnodes to be a feature of the library rather than an edge case. being able to reuse sub-trees from a previous render that can be determined to have remained the same across renders can improve performance. some of what makes virtual-dom unique is not represented with the vdom benchmarks - widgets, thunks, etc.

we are not opposed to optimizing the way we achieve our goals but we would need to think long and hard about giving up some of the qualities we currently have. yes, there is a price for these and so ultimately you need to determine which set of trade-offs best suits your needs when choosing a library.

patrick-steele-idem commented 9 years ago

Thanks for sharing your thoughts @Raynos. I may not be fully understanding some of your design goals, but here are my comments:

You can do a diff on the server without a DOM and use a different vdom.createElement() strategy; namely creating HTML strings

Interesting, but I am not able to see how that could be helpful in practice given that it would require the server to have knowledge of all of the client-side state changes for a diff to make sense. I suppose you could make that work.

You can do a diff on two virtual trees which represent something that is not DOM nodes (i.e. doing a diff on virtual trees of ANSI text data structures and then implement patch() to update a TTY).

Again, interesting, but that seems like a pretty big stretch.

You can test a large portion of your code without having any DOM involved. You can write unit tests on virtual DOM data structures that run everywhere, including node.

I'm not arguing against having a virtual DOM. The virtual DOM would still be the output of rendering and tests could be written against the output virtual DOM and those tests could run without an actual DOM. I'm only exploring the option of reconciling the virtual DOM directly against the actual DOM.

You can re-use the diff implementation on virtual trees that represent 2D canvas or 3D webgl.

Updating a 2D canvas requires a redraw so you might as well redraw the entire canvas if any of the view state changes. It seems like a mismatch to associate a 2D canvas or a 3D webgl with a DOM-like tree structure.

neonstalwart commented 9 years ago

@patrick-steele-idem the points from @Raynos can be mostly generalized into one point which is that vnodes, diff, and patch are intentionally decoupled in virtual-dom so that they can be used with alternative counterparts. i.e. you can produce a custom diff (e.g. one that diffs a virtual tree against the DOM :wink:) and still use our patch and likewise for using diff with a custom patch.

patrick-steele-idem commented 9 years ago

Makes perfect sense @neonstalwart and it is great that those things are separated out into separate sub-modules so that others can pick and choose without introducing a lot of baggage. This conversation started because I wanted to understand why there was no built-in support for reconciling a virtual DOM tree with an actual DOM tree. I still believe it is a gap that should be explored further. I've gotten enough input for now to go ahead and close this issue.

Thanks, all, for the discussion.

Raynos commented 9 years ago

@patrick-steele-idem

https://github.com/Matt-Esch/virtual-dom/wiki

If you write any new cool modules add them to the wiki :D

jails commented 9 years ago

@patrick-steele-idem the alternative you were talking about:

domNode.vdomMeta = {...};

will be slower even if it's just 1 property to set on a DOM element.

To support event delegation out of the box on dom-layer, I needed a way to retrieve the virtual node attached to a DOM element (because the event callback to execute was stored in the virtual node). So I did the following for every DOM elements.

domNode.vnode = <reference of the virtual node>;

I didn't make any precise benchmark but it significantly slow down the rendering part. Enough to change my strategy and only attach the vnode reference for DOM element which has defined events (to minimize overheads).

Matt-Esch / virtual-dom

Diffing an actual DOM Node tree with a virtual DOM tree #203