MithrilJS / mithril.js

A JavaScript Framework for Building Brilliant Applications
https://mithril.js.org
MIT License
14.02k stars 925 forks source link

[rewrite] vnode structure is difficult to statically optimize #1086

Closed tivac closed 8 years ago

tivac commented 8 years ago

There's a few decisions that have been made about rewrite's vnode structure that is making it tricky to update mithril-objectify to work safely & effectively with the rewrite branch.

Current WIP is on the rewrite branch, for reference.

Text

In the rewrite text has a couple of different representations

  1. TextNodes use tag: "#" and set the children property the string of text

    m("div", "a", "b")
    
    {
       tag: "div",
       ...,
       children: [{
           tag: "#",
           ...,
           children: "a"
       }, {
           tag: "#",
           ...,
           children: "b"
       }],
       text: undefined
    }
  2. Other nodes use tag: "<node>" and set the text property to the string of text

    m("div", "fooga")
    
    {
       tag: "div",
       ...,
       children: undefined,
       text: "fooga"
    }

Since it can be difficult to statically determine if an argument is a string both of these patterns are tough to optimize.

An example from the dbmonster example app shows this well.

m("td", ..., db.dbname)

// should become

{
    tag: "td",
    ...,
    text: db.dbname
}

but because it's unknown what type db.dbname is there's no way to optimize it statically.

In mithril@0.2.x this same line could be be optimized to

{ tag: "td", ..., children: [ db.dbname ] }

since the vnode structure for text/nodes was more consistent and bare string children were accepted.

Children and fragments

The change in behavior in rewrite where array children are treated as fragments instead of accepting nested arrays makes it difficult to correctly generate an array of children for various invocations of m().

  1. Arguments convert to an array of children nodes

    m("a", m("b"), m("c"))
    
    {
       tag: "a",
       ...,
       children: [{
           tag: "b",
           ...
       }, {
           tag: "c",
           ...
       }]
    }
  2. A single array argument converts to an array of children

    m("a", [ m("b"), m("c") ])
    
    {
       tag: "a",
       ...,
       children: [{
           tag: "b",
           ...
       }, {
           tag: "c",
           ...
       }]
    }
  3. Mixed array and non-array arguments convert to both fragments/direct children

    m("a", [ m("b") ], m("c"))
    
    {
       tag: "a",
       ...,
       children: [{
           tag: "[",
           ...,
           children: [{
               tag: "b",
               ...
           }]
       }, {
           tag: "c",
           ...
       }]
    }

Not knowing if a particular argument is an array or not makes the creation of the correct vnode structure for these types of invocations tough.

Another simplified example from dbmonster:

m("tr", [
    ...,
    db.lastSample.topFiveQueries.map(function(query) {
        return m("td");
    })
])

// becomes
{
    tag: "tr",
    ...,
    children: [
        ...
    , {
        tag: "[",
        ...,
        children: [ ... ]
    }]
}

Without making a leap of logic that any call to .map() will return an array (something I'm unwilling to do) there's no way to know that the nested array should be a fragment.

In mithril@0.2.x this could be optimized to

{
    tag: "tr",
    ...,
    children: [
        ...,
        [ ... ]
    ]
}

There's probably other instances I'm missing, but these are the two that are most problematic for me right now as I try to do perf testing against the dbmonster application to see if mithril-objectify even deserves a future in our brave new rewrite future.

I'm not necessarily advocating for changing these, especially if doing some would hurt run-time perf. I'm fine w/ accepting that the conversion rate will drop w/ rewrite. I'd just like to bring up these concerns to see if it's something that can potentially be simplified before the rewrite lands in a way that won't affect runtime perf but could improve the ability to statically compile out m() invocations.

CC @lhorie

lhorie commented 8 years ago

So generally speaking, the rewrite branch will probably lose performance if you compile to plain objects, most noticeably in Firefox, which can't figure hidden classes as well as Chrome. Instead, I'd strongly recommend emitting calls to the Node class so that hidden classes can kick in unambiguously.

Something we could experiment w/ is to simplify the signature of the Node constructor. Some arguments are only used in one or other edge case and could be populated via dot notation after node creation instead.

We could also experiment w/ refactoring some of the children/text detection logic into Node so that it's always a dynamic check.

Fragments are much harder to deal with. Frankly, I think you need a full-blown type inference system to be able to deal with them properly, but needless to say, that's not trivial at all.

tivac commented 8 years ago

A bit worried about figuring out how to inject a require("mithril/render/node") call into 3rd party code, we'll see if that pans out.

Do you think those edge cases are uncommon enough that the speed hit from dynamically modifying the hidden class will be worth it?

I'm certainly not interested in dealing w/ the bugs that I know a type inference system would cause, so I'll probably just need to add more draconian heuristics around not optimizing things that might end up being a fragment.

lhorie commented 8 years ago

Here's one example (line 230)

Node("#", undefined, undefined, old.text, undefined, old.dom.firstChild)

Let's say we refactored node signature to Node(tag, attrs, childrenOrText). Then I could change that line to this:

var node = Node("#", undefined, old.text)
node.dom = old.dom.firstChild

This has no impact on hidden classes because the dom field was created by Node. The impact of the extra assignment is also completely negligible (i.e. I did not see an impact even on vdom bench, which is fairly sensitive to changes in the hot path)

The idea here is that Node would be a lot less annoying for you to use, and it would encapsulate the logic to figure out whether children should be an array or a text field

tivac commented 8 years ago

Oh, I misunderstood. Node() (still should be lower-cased right?) would create the dom field but just not expose it via the function arguments. Works for me.

I can investigate that in a bit, need to step back from this problem for a few and let my eyes uncross.

dead-claudia commented 8 years ago

@lhorie @tivac What about vnode.state? That one in particular is exceptionally difficult to optimize, and the only way you're getting that fast in it stands is through the infamous toFastProperties hack for V8 only, or by calling oninit within the state's constructor itself like below:

function State(vnode) {
    vnode.state = this
    vnode.tag.oninit(vnode)
}

// You'll have to call it with `new` each time
lhorie commented 8 years ago

@isiahmeadows I'm not sure what you mean. The state property is already part of the object structure from the beginning, so it's not gonna deopt the vnode afaik

dead-claudia commented 8 years ago

@lhorie

When the vnode's state is mutated in oninit, which is very common for stateful components (like forms and interactive content), the state itself could easily become deopted (I believe the threshold is 4 mutations, which would show almost instantly), and people aren't going to like that for frequently replaced and/or rendered components.

dead-claudia commented 8 years ago

To clarify, it's not uncommon to see code similar to this:

oninit: function (vnode) {
    vnode.state.counter++
},
view: function (vnode) {
    return [
        m(".counter", vnode.state.counter),
        m("input", onclicjk:  function () {
            vnode.state.counter= 10
        }),
}
lhorie commented 8 years ago

Ok, but that's unrelated to the topic of making a vnode tree easier to statically analyze.

Re: fast properties, the cost of the toFastProperties hack would have to offset the cost of hashmap lookups. I think it would be very difficult to come up w/ a representative A/B test to figure out whether to apply the hack or not.

dead-claudia commented 8 years ago

@lhorie I see it as indirectly related, but you're right in that particular problem belongs elsewhere.

I have another question: would it be possible to make the internal tree carry a different type for its entries than the external API?

By internalizing the representation, you've both simplified the external API (and @tivac may appreciate that), and exposed less of the implementation, which allows much more internal freedom for various changes and optimizations, including after it's released. This would reduce memory usage by a lot, since you're diffing the same data either way, but you're not creating nearly as much after the first redraw to later compare, which is much easier for GCs to handle (they like smaller, more static objects). Also, for the hyperscript API, most of this is waste.

lhorie commented 8 years ago

@isiahmeadows Sorry, I don't understand your question. I also don't understand what exactly would reduce memory usage and I don't understand what part of the Node structure is "waste", given that every field is used for one use case or another.

Can you clarify?

dead-claudia commented 8 years ago

@lhorie

I mean that, in the context of the hyperscript API, all you need are tag, key, attributes, and children. You could include state: null for memory/performance reasons, but the rest like dom, text, and the instance added here and here, are generally useless at first, and are more or less internal implementation details.

dead-claudia commented 8 years ago

In the case of text nodes and raw HTML nodes, you only need two properties: tag and children. The other two properties even referenced in either, dom and domSize, are implementation details.

lhorie commented 8 years ago

As I mentioned in the chat, the omission of fields in the Node structure are oversights. Unless something changed, my understanding is that having the extra fields is faster and more efficient than not having them (because we can take advantage of monomorphism if all the vnodes have the same structure)

dead-claudia commented 8 years ago

@lhorie

I get the idea of monomorphism. And here's my idea of how it all could be handled:

// Internal representation
interface ComponentState {
  vnode: VirtualNode;

  // For SVG elements - they don't appear to propagate through component
  // boundaries ATM, if I'm reading the source correctly
  svg: boolean;
  dom: Node;
  domSize: number;
  events: {[key: string]: EventCallback};
}

// What the API passes in
type VirtualNode =
  ComponentNode |
  TextNode |
  TrustedNode |
  FragmentNode;

// m(Component, ...)
interface ComponentNode {
  tag: Component;
  attrs: {[key: string]: any};
  children: VirtualNode[];
  key: string | number;
  state: Object | null;
}

// m("div", ...)
interface ComponentNode {
  tag: string;
  attrs: {[key: string]: any};
  children: VirtualNode[];
  key: string | number;
  state: Object | null;
}

// "text" in m("div", "text")
interface TextNode {
  tag: "#";
  children: string;
}

// m.trust("text")
interface TrustedNode {
  tag: "<";
  children: string;
}

// fragments
// m("[", "text")
interface TrustedNode {
  tag: "<";
  children: VirtualNode[];
}

First, monomorphism doesn't simply work by only calling with one specific type. It actually deals with only the objects that can be referenced within the code path. In the engine's eyes, these two objects would be considered monomorphic when passed to value, because their value value is never actually used:

var obj1 = {
  left: true,
  value: {one: 1},
}

var obj2 = {
  left: false,
  value: {two: "two"},
}

function func(value) {
  if (value.left) {
    withLeft(value.value)
  } else {
    withRight(value.value)
  }
}

console.log(func(obj1)) // true
console.log(func(obj2)) // false

Also, this function would be seen as monomorphic, because for each code path, value is only one type. typeof doesn't introduce polymorphism in this case, because the engine is intelligent enough to know to narrow the possible types for value within that code path.

function toInt(value) {
  if (typeof value === "string") {
    return parseInt(value, 10) | 0
  } else {
    return value | 0
  }
}

var a = toInt(1)
var b = toInt("1")

Let's just say I did a recent bout of that with my WIP test framework, to optimize initialization and running the tests. Currently, it has a more complex initialization process than Mocha by necessity, and it's predictably slower, which makes optimization very helpful.

dead-claudia commented 8 years ago

Oh, and to be fair, I haven't been on Gitter for a while.

lhorie commented 8 years ago

I see, that's interesting to know.

For now, though, I'm not sure if this type of optimization is worth the effort. I did a quick benchmark and the cost of having extra properties in an object is in the range of +1ms per ~8 properties per 200,000 iterations, which is well into micro-optimization territory, so it's not really worth a large refactor of the current codebase.

dead-claudia commented 8 years ago

@lhorie My two main reasons were actually memory and encapsulation of the API, where it would do wonders. Speedwise, it may make marginal difference (mod maybe GC), but half the properties would likely result in two thirds the memory usage, which would be highly beneficial for larger and highly interactive projects, where memory usage and GC speed would both be important.

tivac commented 8 years ago

Might be worth reworking to use the Node constructor, might not. Until I have time to investigate this thread isn't moving, so I'm closing it.

pygy commented 8 years ago

A bit late to the party (I hadn't visited that part of the code when this was originally discussed), but I think that the text optimization could occur at diff time rather than at vnode generation time.

You'd have to mutate the current vnode so that the optimization is remembered on the next diff phase, once vnode has become old... It would be slightly more expensive in that you'd either have set the children field to undefined or keep the children array around for the lifetime of the node (it is generated anyway), rather than leaving it available for GC at the end of the m() call.

futurist commented 7 years ago

Not very related to this issue itself, but share a similar situation.

When I write this lib:

https://github.com/cssobj/cssobj-mithril

I'm in the hope of call some internal methods of mithril (1.x have oninit, that's very good!), and decide what to do.

m('li.item.!active', {class:'news !active'}, 'hello')

Want to transform above code into:

m('li.item_suffix_.active', {class:'news_suffix_ active'}, 'hello')

I want it work on both 0.2.x and 1.x, I found several ways doing this:

  1. (runtime) repeat the selectorParser of hyperscript function's work, transform the desired part, and call mithril again.

  2. (statically) Use babel

Both way have more work, if mithril can expose some internal methods, or provide some 'plugin' mechanism (hook into selector parser, etc.), that's easier to do such a thing.