Open awtcode opened 7 years ago
Very interesting, the rust issue appears to be this line: thread 'main' has overflowed its stack
.
I'm 99% sure I know what the issue is (a very deeply nested set of asm.js statements means that the naive freeing mechanism overflows the stack when attempting to recurse and free everything) so I'll fix that and get back to you.
Thanks, do you know if there is a quick way for me to identify the particular set of asm.js statements that is causing the problem? The reason why I would like to do this is because I see the 'callstack size exceeded' problem on Chrome and I would like to know the code path that triggers this problem.
Interesting! The ayzim problem will be because of an incredibly deeply nested expression/set of conditionals in a function, and I assumed that a 'callstack size exceeded' would be due to recursion. But I've just checked, and a deeply nested expression in a single function with no recursion can also cause it (code example at bottom) so they probably are the same problem!
I think omitting the eliminate pass would help (comment out JSOptimizer.queue += [get_eliminate()]
in emcc.py - the EMCC_DEBUG=1
line starting with DEBUG:root:applying js optimization passes:
should no longer contain eliminate
) as that's the thing that creates incredibly deeply nested expressions. Unfortunately it'll make your code quite a bit bigger.
I'll have a think about a way to identify a deep nesting. I think it should be relatively easy to add a small trace in ayzim to give you some information once I've fixed the recursive destructor.
Generate nonrecursive js to cause callstack size exceeded:
n = 6000
print 'function xx(x) {'
for i in range(n):
print 'if (x === 6000) { return x } else { x = x*2;'
for i in range(n):
print '}'
print 'return false; }'
print 'console.log(xx(4))'
I'm curious, why is deep nesting worse here (rust/ayzim vs c++/old optimizer)?
The old optimizer doesn't actually free memory, it just leaks it. So given something like x+y
which becomes something like BinaryOp("+", Name("x"), Name("y"))
, the old optimizer will do nothing, whereas ayzim will clear up three AST nodes, recursing to get down to the Name
nodes.
In general there isn't huge amounts of freeing...except at program exit, when everything must go! So I could probably fix this the lazy way by adding a process exit just before the end of main.
But the old optimizer will still recurse those nodes, not to free them, but to do all the other stuff it does? Why is freeing special here?
There's very little recursion in the old optimizer, courtesy of the traverse*
functions in simpleast which keeps function node traversal to a fixed shallow depth. The one place that I can think of recursion actually happening when traversing nodes is traverseInOrder
, a function inside the eliminate pass which calls itself. However, because it's executed early on inside the eliminate pass (for information gathering), it runs before eliminate has created the extremely deeply nested expressions so gets away with it.
Oh right, thanks, I forgot we de-recursed traversal in C++. I had the earlier JS version in mind which is recursive.
Ok, I found a couple more places that recurse (parsing and emitting output) both in ayzim and the emscripten optimizer. But it appears the emscripten optimizer has a workaround for stack size problems on Windows!
@awtcode I'd like to get to the bottom of this and waste as little of your time as possible, so I've built a new binary: https://aidanhs.com/ayzim-opt-awtcode.exe
EMCC_CORES=1
(to avoid interleaved output) and paste the last 20 lines or so of the debugging output - this will hopefully tell me what operation crashedaidanhs@a-3 MINGW64 ~/Desktop/rust/ayzim (aphs-stack-debug)
$ ./target/release/ayzim-opt.exe ../../sqlite.js asm eliminate >/dev/null
starting reading and parsing
reading and parsing took 641 milliseconds
starting asm
asm took 0 milliseconds
starting eliminate
EL stages: a:292196 fe:442499 vc:38026 se:202936 (ss:465066) cv:370165 r:292639
eliminate took 2342 milliseconds
==== DEPTH 69 FOR _sqlite3VdbeHalt
starting final emit
final emit took 372 milliseconds
The line saying ==== DEPTH 69 FOR _sqlite3VdbeHalt
reports that the _sqlite3VdbeHalt
function was the 'deepest' function after optimization, with a depth of 69. You'll see 124 printouts like these (one per chunk) - I struggle to imagine anything above 1000, but my experiments suggest that something was blowing past 1500 to cause a stack overflow the originally released ayzim! (the new one can handle over 10x that it appears)
If anything jumps out, you can use DEAD_FUNCTIONS to kill suspiciously large functions and see if that helps Chrome at least parse your JS.
If you get a chance at some point to take a look at any of the above I'd be very interested to hear back. It'd be easy to extend the reporting to be on every function (rather than one per chunk, which I just did to stop it being overwhelming), so let me know if you have any thoughts on something that you might find helpful.
Thanks @aidanhs , will try it out and let you know the results asap :)
Hi, I reran my code with the new binary and this is what I saw:
starting reading and parsing
thread 'worker' panicked at 'called Result::unwrap()
on an Err
value: Error { repr: xxx(xxx { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }) }', C:\bot\slave\nightly-dist-rustc-win-msvc-64\build\src\libcore\result.rs:837
note: Run with RUST_BACKTRACE=1
for a backtrace.
Traceback (most recent call last):
This seems to be a new error though.
Hmm, that's unexpected. Your file clearly has something non-utf8 in it. I didn't think non-ascii asm.js would be generated, let alone non-utf8, so clearly I need to examine these assumptions. It could be something withing the generated asm.js itself, or something to do with encoding on windows. Do you have anything non-ascii in your program at all? Function names, variables, strings? Alternatively, do you use non-default-US encoding on Windows?
As a compounding bit of confusion, it's actually not even getting as far as it was last time! The place this error comes from is before anywhere the stack could have possibly overflowed, so either something changed on the machine ayzim is being run on, me rebuilding ayzim on a different machine had adverse effects, or the Rust updates and dependency updates had adverse effects.
Just to double check I'm correct in my assumption in where this is coming from, could you run with RUST_BACKTRACE=1
as an environment variable if you get a chance?
Hi,
I tried using the optimizer and saw a crash in Emscripten 1.36.3. Here's part of the error log:
thread 'main' has overflowed its stack DEBUG:root:EMCC_WASM_BACKEND tells us to use asm.js backend
splitting up js optimization into 124 chunks, using 40 cores (total: 301.94 MB)
Traceback (most recent call last):
File "E:\emsdk\emscripten\1.36.3\em++", line 13, in
File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1731, in run
File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1643, in flush
File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1613, in run_passes
File "E:\emsdk\emscripten\1.36.3\tools\shared.py", line 1741, in js_optimizer
File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 544, in run
File "E:\emsdk\emscripten\1.36.3\tools\tempfiles.py", line 64, in run_and_clean
File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 544, in
File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 446, in run_on_js
File "E:\emsdk\python\2.7.5.3_64bit\lib\multiprocessing\pool.py", line 250, in map
File "E:\emsdk\python\2.7.5.3_64bit\lib\multiprocessing\pool.py", line 557, in get
AssertionError: Error in optimizer (return code 255):
ninja: build stopped: subcommand failed.