Closed jtiscione closed 6 years ago
Just ran it locally, really cool. Thank you so much, this is a cool demo to have!
It's a good demo of how memory access works with ArrayBuffers and WebAssembly.Memory. (Where the memory is exported- I couldn't figure out how to get importing memory to work. Maybe the docs for WebAssembly.Memory were last updated right before some massive security hole got patched.)
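For anyone else stuck on the import side, here is a minimal sketch of handing a WebAssembly.Memory to a module instead of reading one from its exports. The module bytes are hand-assembled for illustration and are not Walt output; the import names "env" and "memory" and the exported function peek are assumptions chosen for this example.

```javascript
// Hand-assembled module (illustration only). In WAT it would read roughly:
//   (module
//     (import "env" "memory" (memory 1))
//     (func (export "peek") (result i32)
//       i32.const 0
//       i32.load))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type section: () -> i32
  0x02, 0x0f, 0x01,                               // import section, 1 entry
  0x03, 0x65, 0x6e, 0x76,                         // module name "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,       // field name "memory"
  0x02, 0x00, 0x01,                               // kind: memory, no max, min 1 page
  0x03, 0x02, 0x01, 0x00,                         // function section: func 0 has type 0
  0x07, 0x08, 0x01, 0x04, 0x70, 0x65, 0x65, 0x6b, // export section: "peek"
  0x00, 0x00,                                     // kind: func, index 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                   // code section: 1 body, no locals
  0x41, 0x00, 0x28, 0x02, 0x00, 0x0b,             // i32.const 0; i32.load; end
]);

// The memory is created on the JS side (1 page = 64 KiB)...
const memory = new WebAssembly.Memory({ initial: 1 });
const heap = new Uint32Array(memory.buffer);
heap[0] = 42;

// ...and passed in under the same module/field names the import declares.
const wasmModule = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(wasmModule, { env: { memory } });

// The module reads the value that JS wrote into the shared buffer.
const peeked = instance.exports.peek();
console.log(peeked);
```

The import object has to match the module's import names exactly, and the supplied memory must be at least as large as the minimum the module declares.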
The performance needs a look though. It runs pure JS if you hold down a key while it's running, which you'd think would slow it down, but it doesn't, as you can see in the console output. (The AssemblyScript guys were having problems with performance too.)
Emscripten compiles the version I wrote in C down to WASM which somehow runs 50% faster than JS. This is just a bunch of loops over an array. I made sure they were the same across JS, Walt, and C. I'm squinting at them and trying to figure out how they could go faster. WASM isn't laced with sorcery like x86, so a lot of standard C compiler tricks aren't available to it. It's not using an autovectorizer for example, which would easily seize on this code and be the obvious explanation for a speedup. What is Emscripten doing differently? Is it possible to get that speed from modifying the Walt code?
Can any arbitrary WASM or WAT file be produced by feeding the appropriate source to the Walt compiler? Or is Walt capable of producing only a subset? If the reverse mapping from WAT files to Walt files is one-to-many (and not ever one-to-zero) then it should be possible to hand-optimize it.
Thanks for bringing this to my attention. Looking at the loops alone, Emscripten has a better way of encoding the iterations over i.
get_local $l10
i32.const -1
i32.add
tee_local $l10
br_if $L3
This bit is in just about every loop. From the looks of it, all the for-loops are more like "while-loops". First, it counts backwards, so the condition is always "is this zero" instead of less-than/greater-than. It also uses one less instruction on every loop thanks to the tee_local. tee sets and returns the local variable, whereas in the current compiler there is always the extra get_local.
You could mimic the count-down-to-zero trick with a while loop, while(value) { ... ; value -= 1; }, but the tee_local is not currently exposed in the syntax. It would need to be an assignment expression (not a statement), so something like this:
let value : i32 = width;
while((value -= 1)) {
  // logic
}
Not currently implemented, but trivial-ish.
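As a rough illustration in plain JS (not Walt, and not what either compiler emits), the decrement-then-branch pattern from the WAT snippet corresponds to a post-test loop. The function names below are made up for this sketch. One thing worth noting: a pre-test loop like while((value -= 1)) that starts from width runs its body width - 1 times, because the decrement happens before the first iteration, so the starting value or the loop form has to account for that.

```javascript
// Ordinary forward loop: compare against the length on every iteration.
function sumForward(values) {
  let total = 0;
  for (let i = 0; i < values.length; i += 1) {
    total += values[i];
  }
  return total;
}

// Count-down form: decrement, then test the decremented value against zero.
// The assignment expression plays the role of "i32.add; tee_local; br_if".
function sumCountdown(values) {
  let total = 0;
  let remaining = values.length;
  if (remaining === 0) return total; // a do/while body always runs once
  do {
    total += values[remaining - 1];
  } while ((remaining -= 1) !== 0);
  return total;
}

// Pre-test variant: the decrement runs before the first body execution.
function countPreTestIterations(start) {
  let value = start;
  let iterations = 0;
  while ((value -= 1)) {
    iterations += 1;
  }
  return iterations;
}

const sample = [3, 1, 4, 1, 5];
console.log(sumForward(sample), sumCountdown(sample));
console.log(countPreTestIterations(sample.length));
```

The count-down version visits the elements in reverse order, which only gives the same result when the loop body does not depend on iteration order.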
There are other differences visible in the wasm output, like the heavy use of selects instead of if/thens. Ternary expressions in Walt compile to selects though, so you could mimic that as well.
In general I would advise not calling into JS during step() at all (like getCanvasWidth() etc.). I would bet that gives you the most bang for your buck.
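A plain-JS sketch of that idea, with the boundary simulated by a call counter. getCanvasWidth and getCanvasHeight here are stand-ins that only count calls, and both step functions are hypothetical; the real demo's step() looks different.

```javascript
// Stand-ins for imported JS functions; each call represents one
// crossing of the wasm-to-JS boundary.
let boundaryCalls = 0;
const getCanvasWidth = () => { boundaryCalls += 1; return 8; };
const getCanvasHeight = () => { boundaryCalls += 1; return 4; };

// Queries the size inside the loop condition, so it crosses the
// boundary twice per condition check.
function stepQueryingInLoop(cells) {
  for (let i = 0; i < getCanvasWidth() * getCanvasHeight(); i += 1) {
    cells[i] += 1;
  }
}

// Takes the size as arguments, so the hot loop never leaves the module.
function stepWithCachedSize(cells, width, height) {
  const count = width * height;
  for (let i = 0; i < count; i += 1) {
    cells[i] += 1;
  }
}

const cellsA = new Int32Array(32);
boundaryCalls = 0;
stepQueryingInLoop(cellsA);
const callsInsideLoop = boundaryCalls;

const cellsB = new Int32Array(32);
boundaryCalls = 0;
const width = getCanvasWidth();
const height = getCanvasHeight();
stepWithCachedSize(cellsB, width, height);
const callsHoisted = boundaryCalls;

console.log(callsInsideLoop, callsHoisted);
```

The same effect could be had by passing the dimensions once at init time and storing them in globals or in linear memory.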
Thanks for bringing this up! There are sneaky improvements to be made here. The tee_local would definitely be useful, basically assignments as expressions. Too bad it's only available for locals though.
This is sort of intriguing... I'll have to try some of these.
I wonder how Emscripten determines whether or not the loop ordering is relevant. First of all it would have to recognize that the arrays (or the portions being iterated over) don't overlap. If the "vel" array partially overlapped the "u" array in the heap, then it couldn't make that assumption. But the arrays aren't even carrying their lengths around. It sounds like a lot of work to avoid emitting one instruction.
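To make the overlap concern concrete, here is a small JS sketch with two typed-array views over the same buffer. The names u and vel are borrowed from the demo, but the layout is invented for illustration. When the views overlap, walking the loop forwards or backwards gives different results, which is why reordering is only safe when the regions are known to be disjoint.

```javascript
// Copies vel into u, either front-to-back or back-to-front, where both
// arrays are overlapping views over one shared heap.
function copyOverlapping(direction) {
  const heap = new Int32Array([1, 2, 3, 4, 5]);
  const u = heap.subarray(0, 4);   // heap[0..3]
  const vel = heap.subarray(1, 5); // heap[1..4], overlaps u by three cells
  if (direction === "forward") {
    for (let i = 0; i < u.length; i += 1) u[i] = vel[i];
  } else {
    for (let i = u.length - 1; i >= 0; i -= 1) u[i] = vel[i];
  }
  return Array.from(heap);
}

const forwardResult = copyOverlapping("forward");
const backwardResult = copyOverlapping("backward");
console.log(forwardResult.join(","));  // 2,3,4,5,5
console.log(backwardResult.join(",")); // 5,5,5,5,5
```

With non-overlapping views both directions would leave the heap in the same state.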
C-style languages don't demand that you tell them this stuff. High Performance Fortran wants to see compiler directives above loops and functions, telling it exactly how to distribute the work, or it won't optimize anything. It also has a FORALL statement and a PURE attribute that tell the compiler a loop body or a function has no side effects. That's the only reason people are still using Fortran anywhere.
Maybe a goofy syntax like for(i in [0...32]) {...} could be introduced that specifically does not guarantee a loop ordering. But if it were me I'd probably just tell people to write loops that go backwards.
@jtiscione Hmm, in my tests the AssemblyScript version has similar performance to C. Approximately 2-3 ms per iteration.
AssemblyScript port: https://webassembly.studio/?f=6nwdazfd2le
EDIT wasm sizes: C (3 kB), AssemblyScript (~800 bytes)
The AssemblyScript guys wrote an article about their struggle to get it to go fast. That article is a year old now, but it looks like they were also reverse engineering Emscripten's output.
Actually, this article is not from the AssemblyScript core team; it was written by Apurva Raman. As I remember, he made just one PR for AS. Anyway, this article is pretty old =)
If the AssemblyScript version has similar performance to C and is smaller, it's probably better to reverse engineer that one instead.
In terms of WAT, AssemblyScript generated 352 lines, and Emscripten generated 445 lines.
By the way, if you want touchscreen support you can add these lines:
// fake mouse events when triggering touch events
canvas.addEventListener("touchstart", function (e) {
  e.preventDefault();
  var touch = e.touches[0];
  var mouseEvent = new MouseEvent("mousedown", {
    clientX: touch.clientX,
    clientY: touch.clientY
  });
  canvas.dispatchEvent(mouseEvent);
}, false);
canvas.addEventListener("touchend", function (e) {
  e.preventDefault();
  var mouseEvent = new MouseEvent("mouseup", {});
  canvas.dispatchEvent(mouseEvent);
}, false);
canvas.addEventListener("touchmove", function (e) {
  e.preventDefault();
  var touch = e.touches[0];
  var mouseEvent = new MouseEvent("mousemove", {
    clientX: touch.clientX,
    clientY: touch.clientY
  });
  canvas.dispatchEvent(mouseEvent);
}, false);
In case anyone is interested... @ballercat @JobLeonard @MaxGraey I have a version of this code at https://github.com/jtiscione/webassembly-wave that's comparing plain JS, Emscripten, Walt, and AssemblyScript (and has touchscreen support now) showing the frames per second. On this MacBook using Chrome, I'm seeing 115 FPS for JavaScript, 145 FPS for Emscripten-generated WebAssembly, and 100 FPS for Walt-generated WebAssembly. My AssemblyScript version is doing 50 FPS, which can't be right. I'm using code based on the code posted by @MaxGraey at https://webassembly.studio/?f=3v88jbgtnd2, and it's running at half the speed of Walt. The Pointer class used in that version appears to be hampering it: https://github.com/jtiscione/webassembly-wave/blob/master/as/assembly/index.ts. I'm still working on that one since I'm not as familiar with AssemblyScript.
@jtiscione I forked your repo and just changed the build script to:
"asbuild:optimized": "asc assembly/index.ts -b build/optimized.wasm -t build/optimized.wat --sourceMap --validate --noAssert -O3"
This changed the FPS from 55 to 140, and in Firefox AS is even faster than Emscripten.
By the way, in the next release of AS we won't need to use the builtin select anymore. The compiler will properly optimize ternary operators to select by itself.
Ah, that must be what WebAssembly Studio is doing. I used the scaffold generated by asinit and wasn't seeing a difference between untouched and optimized WebAssembly IIRC.
After merging your PR I'm getting 135 FPS from AssemblyScript now. So on Chrome, it's still not as fast as Emscripten (135 vs 145 FPS) but yeah in Firefox it's actually beating Emscripten (165 to 155) which is pretty amazing.
I'd use AS for anything "serious" but I like Walt because it's such a little project and looks easy to fork. Still, its optimizer needs a little attention. How does the optimizer in AS work: does it transform precompiled WASM, or is it in the guts of the compiler?
What does "a little project and looks easy to fork" mean?) If you compare the installed node_modules, AS is approx. 7.7 MB and has fewer dependencies. But currently it installs from GitHub, which is always slower than installing from npm (if you're not using yarn, of course).
It's "little" in terms of a feature set. If I wanted to make a special domain-specific language, I'd expect the AssemblyScript project to have a lot of infrastructure related to the full TypeScript specification that wouldn't be relevant for my brainfuck compiler, or whatever it is. (I never look in node_modules- I just watch for strange warnings about deprecated ciphers.)
I added an example in Walt Explorer that shows how array access works. You click and drag with the mouse, and it draws waves across the canvas. It animates with ordinary JS while you're holding down a key, so you see the (small) speed difference. I also fixed issue #127 with the canvas tab possibly not being mounted.