emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.63k stars 3.29k forks source link

entire source of module printed when nodejs exception thrown #17228

Closed dvtate closed 1 year ago

dvtate commented 2 years ago

This has been an issue that I've experienced with various packages built with emscripten since 2019 and honestly surprised it's still there. It's one of the biggest painpoints of working with emscripten + node.js for me personally

import binaryen from 'binaryen';

let s = "testing";

// Desparately trying to debug to figure out what problem is
// But this message gets burried in undesired garbage
console.log("s = ", s); 

// Fatal error which makes emscripten print massive unreadable textwall for no discernable reason
// filling console and making logs unreadable
s(444);

running the above code via node index.js results in the following message (which takes several seconds to print) and appears to be generated by emscripten image

I've experienced this exact problem with every emscripten package I've used including internal ones at previous company so I don't think this is the fault of the package maintainers.

kripken commented 2 years ago

Trying to reproduce this, the following works:

I have no idea why node would print the contents of an imported package like that... the error isn't even in the same file! The stack trace even shows that. How bizarre... removing the module import line, the behavior is as expected. Just importing that module, not even using it, is enough to make node print it on an unrelated exception.

I'm not sure how to even debug that. Advice from any node experts reading this would be very welcome...

kripken commented 2 years ago

I can't reproduce this when generating the imported module. For example,

emcc tests/hello_world.c -s EXPORT_ES6 -o index.js -O3 --closure 1 -s SINGLE_FILE -s MODULARIZE

That emits an index.js that, when imported here, does not cause the bug. I tried lots of variations on those flags (which emit something in the same general shape as the binaryen build from npm).

If so then perhaps this is specific to binaryen.js somehow? cc @dcodeIO

dvtate commented 2 years ago

Did you try calling the default export/making it actually load the wasm? I think it getting interrupted while instantiating the wasm module could be the problem. For whatever reason I'm getting an error with the __dirname variable (maybe Node v18 problem idk). But after hardcoding that to the correct value I was able to reproduce the issue with the hello_world.c test file.

image image

 // required due to bug?
global.__dirname = '/tmp';

// /tmp/lib.mjs was generated via the following:
// emcc tests/hello_world.c -s EXPORT_ES6 -o /tmp/lib.mjs -O3 --closure 1 -s SINGLE_FILE -s MODULARIZE
import test from '/tmp/lib.mjs';

// Load module
console.log(test); // [Function (anonymous)]
test();

// Generate fatal error
let s = 'aaaaaa';
s(111111);

Versions and system info is as follows:

[tate@archbook emscr-bug]$ node --version
v18.3.0
[tate@archbook emscr-bug]$ emcc --version
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.9-git (54675bac246e84ca024c182e477665d21fe2e2f7)
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[tate@archbook emscr-bug]$ uname -a
Linux archbook 5.18.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 09 Jun 2022 16:14:10 +0000 x86_64 GNU/Linux

The problem stemming from loading wasm would make sense as the index.js file in binaryen.js ends with ,LB=qg;var tg=await LB(),rg=tg;export{rg as default}; which suggests it's loading a module in the import declaration. I tested a few other emscripten-compiled pacakages and thankfully most of them didn't suffer from this issue but wabt (only if you call it's default export) and talib-wasm (same way as binaryen; has a simple emcc call in build.sh).

It's honestly quite likely that this is a bug with node where it was loading things concurrently and highlights the line of code where both were at before the main thread crashed... I'm gonna have to disect the output js image

dvtate commented 2 years ago

the deeper I go the weirder this gets... still, I'm pretty sure this is a nodejs problem image

dcodeIO commented 2 years ago

I have also seen this issue when using Binaryen.js specifically. What appears to happen is that recent versions of Node.js try to print the source code location of where the Error was generated, in turn printing the entire minified JS produced by Emscripten - I guess because it's all a single line. As a workaround, I have found that the issue can be mitigated by running Node.js with --enable-source-maps, which instead prints a proper code location instead of a wall of text.

kripken commented 2 years ago

Interesting. Ok, given that, it would be great if we could make as small a testcase as possible and file a node.js issue with it for them to look at. If we can't reduce it, all of binaryen.js might be ok too as the bug seems really obvious.

kleisauke commented 2 years ago

This is probably a duplicate of #11532 and #14270, could you try to add -sNODEJS_CATCH_EXIT=0 -sNODEJS_CATCH_REJECTION=0 to the linker invocation? Note that this requires at least Node v15 to work properly. https://github.com/emscripten-core/emscripten/blob/854adf6f7fdc44e433ac9c13335c2a224635a42d/src/settings.js#L733-L751

matpen commented 1 year ago

I had the same problem using embind 3.1.23 on NodeJs 18.12: the solution suggested above works for me.

sbc100 commented 1 year ago

I confirmed that this is no longer and issue after #18743 since we no longer enable NODEJS_CATCH_EXIT in MODULARIZE mode. This change was released in 3.1.34.

@decodIO, I'm a little confused by the fact that you say this problem was mitigated by passing --enable-source-maps. I think perhaps that is really happening is that passing -gsource-map to emcc at link time disables the minification. So I don't think its the node flag that makes a difference here. For one thing, when we minify the JS library code I don't think we generate a source map for that. IIRC our source map support only applies to the native code, no the JS code. So I think maybe --enable-source-maps is a red herring here and that your output JS was not actually minified at all?

Regardless, I think this issue can now be closed. For folks that don't use -sMODULARZE, you can work around this issue by passing -sNODEJS_CATCH_EXIT=0 explicitly.

dcodeIO commented 1 year ago

I can tell for certain that the Node.js flag makes a difference, but I also don't know why exactly. Perhaps it is merely a side-effect of making Node look for a source map in the first place, then suppressing the longish source output, or it only applies to cases where the application that imports Binaryen utilizes source maps itself. Happy if there is a better way, of course :)

sbc100 commented 1 year ago

I think this can now be closed

lastmjs commented 2 months ago

Hi everyone, I am really struggling with this issue. I am not using emscripten directly, but I am using binaryen.js with tsx (instead of ts-node) and Node.js v20.11.0.

This issue keeps happening and I've forked tsx to try to get to the bottom of it, but it seems like this is a Node.js issue.

In fact I just tested this without tsx, in Node.js directly, and it's still a problem. --enable-source-maps makes no difference. It's extremely simple to reproduce:

Here's my code index.js:


import "binaryen";

throw new Error("Should not see all of the code above");

package.json:

{
  "type": "module",
  "dependencies": {
    "binaryen": "^116.0.0"
  }
}

Run node index.js or node --enable-source-maps index.js and you will see an absolutely huge wall of text before the final error, I've only included the end of the wall of text below:

String("utf-8"):g=null}else typeof window!="undefined"&&typeof window.prompt=="function"?(g=window.prompt("Input: "),g!==null&&(g+=`

Error: Should not see all of the code above
    at file:///home/lastmjs/development/test/test_tsx/index.js:3:7

Node.js v20.11.0
sbc100 commented 2 months ago

Presumably this is because the published version of binaryen.js is minified JS, and we don't produce any kind of source map when we do that minification. Is it normal for folks to provide source maps when then ship minified JS in npm?

I'm not very familiar with npm best practices but perhaps the JavaScript in npm itself should not be minified and it should be up to users/blunders of binaryen.js to minify the JS? Is it normal to ship minified JS in npm?

kripken commented 2 months ago

Hmm, I can't reproduce this when I build locally. But I do see it with the npm version.

So as @sbc100 said above,

I confirmed that this is no longer and issue after https://github.com/emscripten-core/emscripten/pull/18743 since we no longer enable NODEJS_CATCH_EXIT in MODULARIZE mode. This change was released in 3.1.34.

That should have fixed it. And it looks fixed when I build locally in the normal way (as Binaryen does on CI):

$ emcmake cmake -B emcc-build -DCMAKE_BUILD_TYPE=Release -G Ninja
$ ninja -C emcc-build binaryen_wasm

The resulting file is 1.65% smaller than the npm build, and it doesn't have the problem.

To make sure I am comparing them in the most similar way, I copied the build's JS over the index.js of the npm build in its directory (so I am not importing them in a different manner or anything like that).

@dcodeIO is the npm build made with different flags?

CountBleck commented 2 months ago

@kripken the npm version runs esbuild to bundle a small wrapper file that instantiates and re-exports Binaryen using top-level-await.

kripken commented 2 months ago

Thanks @CountBleck, that explains what I was seeing - without that await, my version wasn't actually being run.

Now that I have that working, I do see that some of the binaryen.js code is quoted in the stack trace even in my build. I see actually a lot less code quoted, because the minified npm build has 9 newlines but the raw emcc optimized output has 477, so it is slightly less minified. (I believe that is intentional in Emscripten, we leave newlines as separators where possible instead of ;, for this exact reason, and I guess whatever bundler is used here does not.) But, anyhow, that is a minor issue.

I then confirmed @kleisauke 's suggestion that those two flags -sNODEJS_CATCH_EXIT=0 -sNODEJS_CATCH_REJECTION=0 fix this. As binaryen.js is meant to be used as a library I think that's the right default here, I'll open a PR for that.

@sbc100 you say those flags should not be needed in MODULARIZE mode since some point in time, but we don't seem to use that flag in binaryen.js - I suppose the modularization is done by the bundler or something else, so those flags are needed.

kripken commented 2 months ago

With https://github.com/WebAssembly/binaryen/pull/6764 landed, this should be fixed in the next build of binaryen.js.

CountBleck commented 2 months ago

Hmm, is this correct Node.js behavior though? NODEJS_CATCH_EXIT only causes errors to be rethrown, and that shouldn't cause the printed line to change...

sbc100 commented 2 months ago

Hmm, is this correct Node.js behavior though? NODEJS_CATCH_EXIT only causes errors to be rethrown, and that shouldn't cause the printed line to change...

While the stacktrace does remain intact on rethrow (only you throw something that is subclass of Error though), the line that gets printed before the stacktrace is thrown seems to be the rethrow line, and not the original line:

$ cat test.js
function foo() {
  console.log('foo');
  throw Error("bar");
}

process.on('uncaughtException', (ex) => {
  console.log('rethrowing');
  throw ex
});

foo();
$ node test.js 
foo
rethrowing
/usr/local/google/home/sbc/dev/wasm/emscripten/test.js:8
  throw ex
  ^

Error: bar
    at foo (/usr/local/google/home/sbc/dev/wasm/emscripten/test.js:3:9)
    at Object.<anonymous> (/usr/local/google/home/sbc/dev/wasm/emscripten/test.js:12:1)
    at Module._compile (node:internal/modules/cjs/loader:1364:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1422:10)
    at Module.load (node:internal/modules/cjs/loader:1203:32)
    at Module._load (node:internal/modules/cjs/loader:1019:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:128:12)
    at node:internal/main/run_main_module:28:49

Node.js v18.20.3

Note that line 8 of the file (the rethrow) is printed and not line 3 (the original throw).

If you throw something that isn't an Error subclass all traces of the original throw location seem to be lost in a rethrow.

sbc100 commented 2 months ago

I'm uploaded a PR to disable this catching by default: https://github.com/emscripten-core/emscripten/pull/22257

We could followup by completely removing it I think.

CountBleck commented 2 months ago

@sbc100 yeah, I'm observing the same thing.