VM's reliance on closures makes it 3× slower

eevee commented 9 years ago

V8, and most likely other JavaScript VMs, won't optimize a function if it relies on reading variables from outer scopes. Virtually everything in the Quixe VM works this way. (Chrome's profiler will tell you when a function is unoptimized, and why.)

I have a travesty of a branch where I shoved most of the state into a vm object that gets passed around, and turned everything into a method. I tried loading Emily Short's Counterfeit Monkey, which does a lot of intensive work at startup. Load time dropped from 29.4 seconds to 10.4 seconds, as measured by the "done executing; path time = ..." console log.

erkyrath commented 9 years ago

Well that's good to know. I mean, not good, but important.

(I never did learn the JS scoping rules, because... do I need to finish that sentence?)

erkyrath commented 9 years ago

Fussing with a "noeval" branch now. It's not done.

erkyrath commented 9 years ago

Currently using this pattern: http://gameshelf.jmac.org/2015/05/javascript-wonkery/

Other patterns to test for speed:

new Function("self", ...) // and then invoke with func(self)
eval("(function(self) {...})") // and then invoke with func(self)
(0, eval)("(function(self) {...})") // http://perfectionkills.com/global-eval-what-are-the-options/

eevee commented 9 years ago

The big win I got was just eliminating the dynamic lookups. The use of eval does slow down the function that actually calls eval, but I don't think it affects the newly-created function. I actually still have it as eval; the only difference is that I changed it to give each function a name:

-    eval('function _func(callargs) {\n' + val + '\n}');                              
-    return _func;                                                                    
+    return eval('(function _glkote__' + func.name + '(callargs) {\n' + val + '\n})');

You're doing pretty much what I did, though, except I had the VM mega-object passed around as an explicit argument rather than as this. Not sure if that makes a difference.

I also got a decent win from splitting the try/catch block out of execute_loop; apparently that disables optimization too, and execute_loop does a lot of other work.

erkyrath commented 9 years ago

I've already made those two changes (giving the function a name, and changing execute_loop to not have try/catch). Haven't tested the performance yet, though.

curiousdannii commented 9 years ago

new Function will be at least as fast as eval. The difference is that eval keeps the scope it was created in and new Function does not, but if you never request a variable lookup I don't know if that would matter. (Also, using new is probably not needed I think.)

I don't know the performance of bind, but if it has to be shimmed then it will be slow of course. Passing in the scope object directly seems simplest to me.

erkyrath commented 9 years ago

Good news, good news, and bad news.

The noeval branch works. Under Safari, it runs about 60% faster. Under Chrome, it runs about 30% faster. Under Firefox, it runs about 40% slower.

(I am comparing the 2.1.0 release with noeval, on Counterfeit Monkey, comparing CPU-intensive moves like startup and look.)

eevee commented 9 years ago

Hm, I can't even load it:

Quixe init: RangeError: Maximum call stack size exceeded Maximum call stack size exceeded RangeError

eevee commented 9 years ago

Ah, there's a limit to how much you can jam into apply. I had a workaround for this in my graphics branch, but it didn't make it in. I'll file a PR shortly.

erkyrath commented 9 years ago

Oh, I remember seeing that error... on my old laptop, never on my desktop machine. The browser must configure its JS engine differently based on something. Thanks.

eevee commented 9 years ago

Filed pull #8.

I was really curious why my results were so different from yours (70% faster in Chrome, roughly the same in Firefox) so I went through my original mess of a branch a commit at a time, loading Counterfeit Monkey one time each:

32.014s — tip of my "modernize" branch
32.475s — use ArrayBuffer
19.373s — introduction of VM object, only a handful of functions on its prototype (of particular interest is that Firefox is still 16.137s here)
19.166s — move execute_loop and pop_callstub onto VM's prototype
16.784s — remove tempsearchkey (the second commit in the pull)
16.284s — change use of eval to give each generated function a distinct name
15.508s — make most of the gi_dispa stuff go through the vm object
16.622s — remove tempcallargs, instead just passing an array as the second argument to enter_function (apparently this one didn't go as well as I'd hoped)
11.327s — move memmap, pc, leave_function, store_operand, and some other common things onto VM

I have vague suspicions that JS VMs are just heavily biased towards traditional prototypes and uses of this, but I don't have a more specific explanation yet.

erkyrath commented 9 years ago

I checked in one more change (in noeval) which uses an explicit "self" argument instead of the .bind(self) call.

This seems to be the crucial bit. I now get 65% improvement in Safari, 55% in Chrome, 10% in Firefox. (Again comparing with the 2.1.0 release.) So we're all in the black now.

eevee commented 9 years ago

Nice! I think that'll put Counterfeit Monkey under the magical threshold separating "that took a long time" from "this is taking forever" :)

sukima commented 9 years ago

@eevee Using prototypes is more performant. According to this blog:

Create objects using a constructor function. This ensures that all objects created with it have the same hidden class and helps avoid changing these classes. As an added benefit, it’s also slightly faster than Object.create()

The following posts I found explained how JS is optimized under the hood:

erkyrath commented 9 years ago

I don't think those references are relevant. Quixe only constructs a few types of object; they are all simple, have fixed fields, lack function fields, and are built with a constructor function. They're not what we're talking about here in any case.

(The toptal.com reference is relevant, but it just explains the behavior of prototypes, which is the background to what we've been discussing.)

erkyrath / quixe

VM's reliance on closures makes it 3× slower #6