adriancooney / voyeur.js

Voyeur is a tiny (1.2kb) Javascript library that lets you traverse and manipulate the DOM the way it should have been.
http://adriancooney.github.io/voyeur.js
MIT License

Fundamental performance issue. #20

Open adriancooney opened 11 years ago

adriancooney commented 11 years ago

Voyeur's performance is bad. It's so bad that Voyeur should never be used in production. It manages just over 11,000 operations per second on its most basic traversal, compared to jQuery's 300,000 operations per second.

The problem is not something I can fix (without changing Voyeur's core values); it's down to Object.defineProperty's performance, which is absolutely shit. Here is the performance difference between setting the children directly as a property on a node and using an Object.defineProperty getter function. __defineGetter__ returns similar results.

[Screenshot, 2013-07-10: benchmark comparing a plain property assignment against an Object.defineProperty getter]
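The shape of that difference can be reproduced with plain objects. Here is a rough illustrative micro-benchmark (not the actual test from the screenshot; the iteration count and shapes are made up) comparing the two ways of exposing a `children` property:

```javascript
// Illustrative micro-benchmark: assigning `children` directly vs.
// defining it with an Object.defineProperty getter.
function makePlain() {
  var node = {};
  node.children = [1, 2, 3]; // direct assignment
  return node;
}

function makeGetter() {
  var node = {};
  Object.defineProperty(node, 'children', {
    get: function () { return [1, 2, 3]; }
  });
  return node;
}

function time(label, fn) {
  var start = Date.now();
  for (var i = 0; i < 100000; i++) fn();
  console.log(label, Date.now() - start, 'ms');
}

time('plain property:', makePlain);
time('defineProperty:', makeGetter);
```

Both produce an identical-looking node, which is why the syntax survives either way; only the construction cost differs.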

The way I see it, I have two options.

  1. Change the syntax. I feel the whole appeal of Voyeur is its syntax. From the feedback I received, the ability to traverse the DOM like a JavaScript object really seemed to resonate with developers, so taking this option would defeat the purpose of Voyeur.
  2. Freeze the project until the performance is acceptable. This may seem like a cop-out, but I can see no other option that will allow Voyeur to maintain its uniqueness. I'm not the only one who has noticed the Object.defineProperty performance, so I think putting the project on hold until the various engines catch up and the performance of Voyeur is satisfactory is the right way to go.

I'm going to go with option 2. I have left a visible warning in the README. Voyeur can be used for trivial projects and in the console, but it should never be used in production, under any circumstances.

I hope this message doesn't come across as spiteful or sour, that's completely unintentional. Sorry guys.

Comments and suggestions welcome!

maxov commented 11 years ago

One thing that might be possible is a compromise: retain the dot syntax for traversing the DOM (where most of the cruft is saved), but adopt a jQuery-esque API for actually manipulating DOM nodes. This way we can build a tree of objects separate from the DOM without using Object.defineProperty, and possibly even get slightly higher performance than jQuery selectors for some simple queries (nesting). Here's what I mean (taken from the example):


Voyeur.div.header.h1.em.a.href = "http://google.com" // becomes ->
Voyeur.div.header.h1.em.a.href("http://google.com")
// now we can have these methods separate from the DOM objects

Voyeur("#title").em.a.innerText("New title!")
// .find can be replaced with a function (or not, this is a really small thing)

// .use is essentially the same
Voyeur.div.section.ul.li.use(function(li, i) {
    li.a.innerText("Link #" + i)
});

Voyeur.div.section.ul.li.eq(3).classList.add("Highlighted!")

I hope by this point you get what I mean. We have the brevity of Voyeur mixed with the speed of jQuery. And the concept of traversing the DOM with property access (amazing!!!!) is the same. If you think this goes against the project philosophy (which, it kind of seems like it does), that's fine; I'll write a fork or something and benchmark it.

nbubna commented 11 years ago

Does DOM traversal/selection really happen often enough to be a bottleneck? Even when it is, shouldn't caching results (e.g. var divs = HTML.find('#foo').div;) help reduce repetition considerably? Also, jQuery's selection usually happens in native code via querySelectorAll, so for deep traversal with Voyeur (Voyeur.div.div.div.div) you are doing a lot more work along the way than with Voyeur.find('div div div div'). So when performance matters, use find() and cache results; when it doesn't, feel free to walk the DOM.

Apart from smarter use of Voyeur's features, i wonder if it is really Object.defineProperty that is so slow or if it is the fact that Voyeur's technique relies upon a "lookahead" at the children of each node it prepares for use. When jQuery wraps node(s), it's cheap because it merely wraps them in a new object with a rich prototype and does no action upon the node itself. Voyeur, however, must immediately map the children out as new properties. Using the getter, it avoids traversing the whole tree, but still, looping over the children of a node will always be much slower than simply wrapping it in an object.
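To make that cost difference concrete, here is a toy sketch with plain objects standing in for DOM nodes (neither library's real code, and the names are made up): wrapping is constant-time, while mapping children out as properties is linear in the number of children.

```javascript
// Toy model: plain objects in place of DOM nodes.
var richPrototype = {
  tagName: function () { return this[0].tag; }
};

// jQuery-style: O(1) — wrap the node, touch nothing else.
function wrapNode(node) {
  var wrapped = Object.create(richPrototype);
  wrapped[0] = node;
  return wrapped;
}

// Voyeur-style: O(children) — every child must become a property.
function prepareNode(node) {
  var out = { node: node };
  (node.children || []).forEach(function (child) {
    out[child.tag] = child;
  });
  return out;
}

var div = { tag: 'div', children: [{ tag: 'a' }, { tag: 'em' }] };
console.log(wrapNode(div).tagName()); // "div"
console.log(prepareNode(div).a.tag);  // "a"
```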

I guess, i'm not so sure that even improved native defineProperty performance will help Voyeur much. But of course, i'm also far from convinced that this is a performance critical feature. I can't think of a use-case where so much traversal/selection would happen that it would really matter which technique was used, even on mobile. And when it does matter, there are workarounds or other libraries. I think it is an overreaction to freeze Voyeur development simply because it fails to match jQuery performance.

And, for that matter, i've not frozen my work on it. In fact, i've taken it pretty damn far in my fork (http://nbubna.github.io/HTML). At this point, it's a competitor and total rewrite (except the gh-pages branch, which still has much in common), but i think you'll be encouraged by where the technique can go. I think it is time to ditch DOM wrappers like jQuery and start working with the DOM directly. Cutting out lots of wrapping API saves a lot of KB, and what i've managed with my each() function is lots of fun. :)

tomByrer commented 11 years ago

http://jsperf.com/voyeur8000/3 in case someone wants to test the performance themselves.

nbubna commented 11 years ago

That getElementsByClassName is so much faster it makes the results of the rest hard to read, and the rest is the comparison i'm most interested in. So, i give you: http://jsperf.com/voyeur8000/5

Looks like i managed to speed things up in my fork!

nbubna commented 11 years ago

Been poking around jQuery to see how they get their speed at this task. It looks like they get it on the simple selector by identifying that they can use getElementsByClassName(). Clever, though i'm still not convinced that selection is a bottleneck that deserves much code/complexity to get a few extra thousand selections per second. Would love to hear about a use-case that proves me wrong, of course. :)

tomByrer commented 11 years ago

I was hoping you'd throw your version into the fray, Nathan! Congrats on the speedups! It seems to break on IE8 (still 8% of overall usage, though some sites see a higher share).

Would love to hear about a use-case [for faster speed] that proves me wrong, of course. :)

  • On my Sprint Note2 (which has a decently beefy CPU) I get ~90% faster results with jQuery than with your HTML.js... though you are almost as fast as no library at all.
  • server/Node.js needs all the speed it can get with many users
  • large tables/datasets
  • animation/looping

Taking a quick glance at your code, I'm not sure how to speed things up other than experimenting with swapping .forEach for regular for or while loops where you can.

nbubna commented 11 years ago

Yeah, HTML definitely requires IE9 and up. You can keep it from breaking in IE8 with a shim (i highly recommend http://github.com/kriskowal/es5-shim), but you can't get the dot-traversal syntax in IE8, because Object.defineProperty can only be shammed, not shimmed.

I don't see how your use-cases have anything to do with DOM selection/traversal, specifically. Remember what DOM selection is, it is the finding of an element or set of elements, not the interacting with them. If you are repeatedly doing that finding in a loop or high-performance situation, you are doing it wrong. Even speedy jQuery preaches against such things in all best-practices/optimization advice out there. Once you have found the elements you are working with, you reuse them.

And again, i looked at jQuery's trick, they test the selector to see if they can use getElementById or getElementsByClassName, then use those instead of querySelectorAll. That's not a hard trick to pull off, but i'm needing some convincing that DOM selection is an issue. I would be more inclined to work on speeding up a workhorse method like my each() implementation, than my find() implementation.
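The dispatch itself is cheap. A simplified sketch of the idea (illustrative only; the regexes and function name are made up and this is nowhere near jQuery's actual Sizzle code):

```javascript
// Selector fast-path sketch: inspect the selector string and route
// the simple cases to cheaper native lookups.
var ID_ONLY = /^#([\w-]+)$/;
var CLASS_ONLY = /^\.([\w-]+)$/;

function pickLookup(selector) {
  if (ID_ONLY.test(selector)) return 'getElementById';
  if (CLASS_ONLY.test(selector)) return 'getElementsByClassName';
  return 'querySelectorAll'; // everything else takes the general path
}

console.log(pickLookup('#title'));     // "getElementById"
console.log(pickLookup('.highlight')); // "getElementsByClassName"
console.log(pickLookup('div ul li'));  // "querySelectorAll"
```

The real thing has to handle quirks (e.g. getElementById only existing on documents, class lookups on older engines), but the routing logic is this small.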

nbubna commented 11 years ago

Oh, more on the IE8 thing: i've got green light to do most of this work on HTML on company time, but we no longer accept Windows XP (and IE8 by extension) as first-class browsers. We try to keep our apps from being completely unusable on old IE, but do not try to keep them feature complete, and definitely aren't working to keep IE8 support in our front-edge library/tool development. It's officially far more important to spend our time/money supporting mobile form factors and features than the rapidly diminishing IE8 for all new development. If i get time myself, i may work on old IE support, but not on company time. :)

tomByrer commented 11 years ago

IMHO it is better to move forward & optimize for at least mobile browsers (guessing 20% of traffic now?).

But as an FYI, IE8 is installed by default with Win7. Many older people who don't have a geek helping them with their computer end up keeping it. Occasionally webdevs are forced to be far backwards-compatible because of undermanned IT shops. I talked to a dev 2 months ago who is forced to program for IE6, for public schools around Denver.

Once you have found the elements you are working with, you reuse them.

Good point; I guess just ignore my prior comment for now & I'll keep an eye out for specific use cases that might look up the DOM more than a dozen times.


nbubna commented 11 years ago

In the short run, excluding IE.old limits the developer audience slightly but retains my sanity. I'm competing with jQuery's future, not its past. Web devs forced to deal with IE 6/7/8 should thank the lords of the intertubes that jQuery 1.x exists and never deviate from it. I know i'd go running back there myself the second someone pressed for IE<9 support.

tomByrer commented 11 years ago

^ great way of putting it Nathan!

adriancooney commented 11 years ago

Sorry for the lack of response on this thread. Haven't had the time recently to give a worthy response.

@gratimax Having a function returned instead of a node would unfortunately have no effect on performance, because you'd still require Object.defineProperty to expand a node's children onto itself. In fact, that was actually the original idea for the syntax. I eventually had to decide whether I wanted a node or a function returned. I spent quite a while deciding between the two and finally settled on the node. The reasons I chose it were, firstly, that the function added an (albeit small) extra step between you and the node, and secondly, that I'd have to introduce a wealth of extra functions to enable the same flow as the dot notation. I understand where you're coming from, though, and appreciate the input.

@nbubna, @tomByrer Interesting discussion on the browser support. Personally, I don't think <IE9 is critical to a project, but it should definitely be something to work toward. Nathan's HTML fork is absolutely awesome. The each function is really intuitive and incredibly handy. It definitely surpasses Voyeur in functionality.

In relation to the status of the project: ES6 has an upcoming feature called Proxies which adds, from what I gather, the ability to define wildcard getters on an object (think Object.defineProperty(obj, "*", {...})), which is exactly what Voyeur needs. In the meantime, I'll lobby on the various mailing lists for the performance of Object.defineProperty to be improved, and hopefully something good will come out of it. It's great to see the interest and discussion on Voyeur. A future could be in store for it.
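For anyone unfamiliar with the proposal: Proxy (as it later landed in ES2015) doesn't literally extend Object.defineProperty with a "*" key, but its catch-all `get` trap achieves the same effect. A hedged sketch, with plain objects standing in for DOM nodes and an invented `voyeurish` wrapper name:

```javascript
// Proxy sketch: one catch-all `get` trap replaces a defineProperty
// call per child tag name. Plain objects stand in for DOM nodes.
function voyeurish(node) {
  return new Proxy(node, {
    get: function (target, name) {
      if (name in target) return target[name];
      // Any unknown property is treated as a tag-name lookup.
      var match = (target.children || []).filter(function (child) {
        return child.tag === name;
      })[0];
      return match ? voyeurish(match) : undefined;
    }
  });
}

var root = voyeurish({
  tag: 'div',
  children: [{ tag: 'a', href: 'http://google.com', children: [] }]
});
console.log(root.a.href); // "http://google.com"
```

Because the trap runs at access time, no per-node preparation is needed at all, which is exactly the cost Voyeur currently pays up front.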

jkroso commented 11 years ago

Is it not possible to extend the prototypes with getters? That way you would only be doing it once, so defineProperty can be as slow as it likes. e.g.

Object.defineProperty(Node.prototype, 'a', {
  get: function(){
    // this.children is an HTMLCollection, so borrow Array's filter
    return Array.prototype.filter.call(this.children, function(child){
      return child.tagName === 'A'
    })
  }
})

Actually, come to think of it, you might not even need to extend any natives, since you could create a kind of proxy similar to a jQuery object but with getters/setters instead of methods.

nbubna commented 11 years ago

It certainly is possible, but that's a pretty radical change from stepping lightly to aggressively populating the entire DOM with a whole slew of getters. It would also fail to account for any custom elements in the DOM, though admittedly it is becoming the convention for custom elements to always include '-' in their tagName, which dulls the usefulness of the dot-traversal syntax there.

I have considered doing this in HTML(.js) nonetheless, but i currently prefer the flexibility of the "step lightly" approach and have demonstrated (see the jsperf above) that the performance of the technique can be improved to the point that it is not a hindrance to any but the most strenuous (and unlikely) DOM traversal cases.

Creating a proxy will have to wait until Proxy becomes a widely available standard. Otherwise, you end up trying to replicate/rewrite the entire DOM interface in your wrapper library, as jQuery and friends have pretty much already done. Even then, i suspect the cost of proxying each node handled may prove higher even than these getters.

jkroso commented 11 years ago

Cool, I just wanted to make sure it had been considered. Regarding that performance test, though, I don't think it shows HTML's worst case very well at all.

Creating a proxy will have to wait until Proxy becomes a widely available standard

I can't imagine the new Proxy feature will actually make it that much easier to create a proxy though?

i suspect the cost of proxying each node handled may prove higher even than these getters.

Perhaps you're thinking of using proxies some other way. Here's an example of how you might convert jQuery to support dot syntax.

nodeTypes.forEach(function(type){
  Object.defineProperty($.prototype, type.toLowerCase(), {
    get: function(){
      // use jQuery's own .filter so no external helper is needed
      return this.children().filter(function(){
        return this.tagName === type
      })
    }
  })
})

To clarify, nodeTypes is just a list of all HTML tag names; I think it can be scraped off the global object.

nbubna commented 11 years ago

What do you see as HTML's worst case scenario for performance?

Proxy doesn't just make it easier to create proxy objects; it makes it possible. Object.defineProperty can only provide a sort of proxy-ish behavior via getter/setter support, not actual object proxying. It can only give access to known properties; Proxy provides the required wild-card support to avoid having to read and replicate the proxied object's properties. While you can certainly convert jQuery to support a dot-traversal syntax, you will still have only a wrapper with a completely alternate DOM interface, not an enhanced proxy to the actual DOM.

To me, the main attraction of Voyeur (and thus HTML.js) is making the DOM friendly to use with just the smallest amount of sugar syntax. No wasted bytes on wrapper interfaces and aliases. No more learning one API to do the basics (jQuery) and another to do advanced stuff (native DOM) or muck about in jQuery's internals. The dot-traversal is a great eye-catcher and very handy in small doses, but it's not what captured my imagination. :)

That said, a lot of people like the dot-traversal syntax. You should consider building a jQuery plugin to do it. :) Your example code is a fair start.

jkroso commented 11 years ago

What do you see as HTML's worst case scenario for performance?

If the nodes you select with find had children, it would extend those, right? The nodes you're finding in the benchmark don't have any children at all.

tomByrer commented 11 years ago

That said, a lot of people like the dot-traversal syntax. You should consider building a jQuery plugin to do it. :) Your example code is a fair start.

I second that suggestion!

jkroso commented 11 years ago

Haha, that was the first jQuery I'd written in around a year. I haven't had much call for this type of high-level DOM manipulation lately, and it might be a while before I do again. It's been a while since I've seen API ideas like this, though, and I want to see them run to their conclusion. So if it's not clear how the type of proxy object I'm talking about might work in practice, I'll fork component/dom and demonstrate it properly.

nbubna commented 11 years ago

The concept is pretty clear to me and should work well. I don't think you can get the list of elements from the global object, however, but you can swipe the one that Adrian has in his create code. I just think it's a pretty easy plugin to make that should be well received.

As for the jsperf, no actually, it doesn't extend all of the children, only the returned array itself. The local version on my machine now does a partial extend on the first child to proxy those tag names to the array returned by find(). I'll be sure to run the jsperf with that (and child elements) to see if the extra dot-sugar it affords is worth it.

jkroso commented 11 years ago

@nbubna You're always going to be trading off performance for features. The advantage of the proxy approach is that you just take one small hit up front and from there on out can just go nuts trying different features. (jQuery really did go nuts though. Haha, don't do that.)

nbubna commented 11 years ago

@jkroso Yeah, it's always a balancing act. But performance isn't the only thing you can sacrifice for features. You can often get both by taking a hit on code complexity or simply sacrificing less valuable features (IE8 support). :)

tomByrer commented 11 years ago

proxy approach... take one small hit up front

@jkroso I think your idea is good, however what about dynamic DOMs? I'm thinking about Facebook/Twitter-like 'scroll for more', single-page applications (SPA), etc... I'm not saying do not try, just be ready for that issue. Could be beneficial to have both approaches ready, depending on project needs.

jkroso commented 11 years ago

@tomByrer I'm not sure exactly what problem you mean, but I think it's unlikely to be a problem specific to proxies. What I'm thinking of works just like jQuery, only with a nicer getter/setter API. So the up-front performance hit I'm talking about is creating the proxy object. Then each time you access child elements, etc., it would create another proxy object, just like jQuery.

tomByrer commented 11 years ago

Sorry I didn't explain well, @tomByrer here. Perhaps I misunderstood: I thought you'd make a proxy as the DOM or script first loads, not upon first access. Although perhaps the problem is still there either way: as new elements are added to the DOM tree, wouldn't the proxy objects have to be re-created for those new elements... or wouldn't the entire DOM tree have to be flagged 'impure' and the proxy setup started all over again? Then wouldn't the up-front savings be somewhat nullified when the DOM tree is dynamically changing?

(Sorry I don't mean to hammer you with questions; just trying to wrap my mind around all possibilities... cheers)

nbubna commented 11 years ago

No, we're talking about a jQuery plugin, where the jQuery object is the proxy and your plugin simply defines getters for all possible tag names on the $.fn (the prototype). That's the upfront work; no need to ever do any more work no matter how the DOM changes.

Same story for defining getters for all known tags on HTMLElement.prototype. Do it once, then you are done, no matter how the DOM changes. It's a totally solid technique, IMO. But has drawbacks and tradeoffs (as discussed).

nbubna commented 11 years ago

And just to be clear, in either situation the getters that are defined will lazily (at call-time) filter the child nodes of this for matching elements and return them. Yes, this has a cost, but it can be mitigated by a cache/clear-on-mutate approach like HTML uses. Like i said, i've considered running HTML this way (HTMLElement.prototype, not jQuery.fn), but at this point, i feel the upfront cost saving was not worth the higher invasiveness of extending the prototype. I also like that the "manual extension" approach i use can easily handle custom element types.
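The cache/clear-on-mutate idea can be sketched with plain objects (hypothetical names throughout; this is not HTML.js's actual implementation, which hooks real DOM mutation instead of a manual append):

```javascript
// Cache/clear-on-mutate sketch: byTag filters children lazily and
// memoizes per tag name; any mutation drops the whole cache.
function makeNode(tag) {
  var node = { tag: tag, children: [], _cache: {} };
  node.append = function (child) {
    node.children.push(child);
    node._cache = {}; // clear-on-mutate
    return child;
  };
  node.byTag = function (name) {
    if (!(name in node._cache)) { // lazy, call-time filtering
      node._cache[name] = node.children.filter(function (c) {
        return c.tag === name;
      });
    }
    return node._cache[name];
  };
  return node;
}

var ul = makeNode('ul');
ul.append(makeNode('li'));
console.log(ul.byTag('li').length); // 1
ul.append(makeNode('li'));          // append() invalidates the cache
console.log(ul.byTag('li').length); // 2
```

Repeated lookups between mutations hit the cache, so the per-access filtering cost only shows up when the tree actually changes.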

tomByrer commented 11 years ago

Thanks for taking the time to explain!

nbubna commented 11 years ago

Version 5 now breaks (stupid AMD in the page). Here's one that works regardless: http://jsperf.com/voyeur8000/11

tomByrer commented 11 years ago

Ty nbubna! I'm wondering if the jQuery dot-notation plugin would be even faster than HTML...