bytecodealliance / ComponentizeJS

JS -> WebAssembly Component
Apache License 2.0

Repeated calls to an instance start failing #80

Open pvlugter opened 9 months ago

pvlugter commented 9 months ago

We're testing long-lived component instances created with ComponentizeJS. After some number of calls to an instance we start getting failures: wasm traps such as unreachable or uninitialized value. The number of calls before a failure varies with the functions called and the data returned, but a given pattern of calls fails consistently.
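For illustration only, a repeated-call check of this kind might look roughly like the sketch below. It is not the linked test case: `callRepeatedly` is a hypothetical helper, and it assumes the export under test is synchronous and reachable as `instance.hello(name)`.

```javascript
// Hypothetical harness, not the actual ComponentizeJS test.
// Assumption: `instance` exposes a synchronous `hello(name)` export.
function callRepeatedly(instance, attempts) {
  for (let i = 1; i <= attempts; i++) {
    try {
      instance.hello('world');
    } catch (err) {
      // Re-throw with the attempt number so the failure point is visible.
      const wrapped = new Error(`failed on attempt [${i}]: ${err.message}`);
      wrapped.cause = err;
      throw wrapped;
    }
  }
  // Every call succeeded.
  return attempts;
}
```

Calling `callRepeatedly(instance, 200000)` against one long-lived instance would then either return the attempt count or throw with the number of the first failing call.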

I've created a test case using the ComponentizeJS testing here: https://github.com/bytecodealliance/ComponentizeJS/compare/main...pvlugter:ComponentizeJS:repeated-calls

Which fails after 1805 calls with:

RuntimeError: failed on attempt [1806]: null function or function signature mismatch
 at js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (wasm://wasm/029a3d3a:wasm-function[337]:0x193642)
 at js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) (wasm://wasm/029a3d3a:wasm-function[6458]:0x58cc39)
 at JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) (wasm://wasm/029a3d3a:wasm-function[1448]:0x31b2b4)
 at call(unsigned int, void*) (wasm://wasm/029a3d3a:wasm-function[500]:0x1e76dc)
 at exports#hello (wasm://wasm/029a3d3a:wasm-function[14509]:0x67e160)
 at Object.hello (file:///.../ComponentizeJS/test/output/repeated-calls/repeated-calls.js:1430:40)
 at Module.test (file:///.../ComponentizeJS/test/cases/repeated-calls/test.js:6:36)
 at Context.<anonymous> (file:///.../ComponentizeJS/test/test.js:138:18)

guybedford commented 9 months ago

Thanks for sharing a replication here, this seems like an overflow case. Will aim to look into it further soon.

guybedford commented 8 months ago

With the upgrade to the new StarlingMonkey engine, this test is no longer failing. I've still merged it in to ensure there are no future regressions though - https://github.com/bytecodealliance/ComponentizeJS/commit/a9e071dab4168f5f1458d8dc71f41fe5f3c30bc8.

pvlugter commented 8 months ago

Thanks. I tried a local version of this on our larger tests and it still fails. Trying the simple repeated-calls test here again, it also eventually fails, just after many more calls:

RuntimeError: failed on attempt [172515]: unreachable
 at wasm://wasm/030be13a:wasm-function[8641]:0x73f1a3
 at wasm://wasm/030be13a:wasm-function[5761]:0x6c51cd
 at wasm://wasm/030be13a:wasm-function[622]:0x334e66
 at wasm://wasm/030be13a:wasm-function[4755]:0x67eb46
 at wasm://wasm/030be13a:wasm-function[7539]:0x71c98d
 at wasm://wasm/030be13a:wasm-function[688]:0x35f6d0
 at wasm://wasm/030be13a:wasm-function[1599]:0x4bdb90
 at wasm://wasm/030be13a:wasm-function[212]:0xc4ea0
 at wasm://wasm/030be13a:wasm-function[623]:0x335d87
 at wasm://wasm/030be13a:wasm-function[622]:0x334cc1
 at wasm://wasm/030be13a:wasm-function[4755]:0x67eb46
 at wasm://wasm/030be13a:wasm-function[1729]:0x4de08d
 at wasm://wasm/030be13a:wasm-function[868]:0x3c05cc
 at exports#hello (wasm://wasm/030be13a:wasm-function[12402]:0x775082)
 at Object.hello (file:///.../ComponentizeJS/test/output/repeated-calls/repeated-calls.js:17544:40)
 at Module.test (file:///.../ComponentizeJS/test/cases/repeated-calls/test.js:6:34)
 at Context.<anonymous> (file:///.../ComponentizeJS/test/test.js:182:18)

In our use case, with more complex data structures, it fails in under 1,000 calls. We're running stateless components, so to keep performance reasonable our current workaround is to run an instance until it fails and then recreate it.
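A minimal sketch of that recreate-on-failure workaround, under stated assumptions: `resilientCaller` and `createInstance` are hypothetical names, `createInstance` is assumed to return a fresh component instance, and traps are assumed to surface in JavaScript as `WebAssembly.RuntimeError`. Retrying is only reasonable here because the component is stateless.

```javascript
// Hypothetical wrapper around a stateless component instance.
// Assumption: `createInstance()` returns a fresh instance on every call.
function resilientCaller(createInstance) {
  let instance = createInstance();
  let recreations = 0;
  return {
    call(name, ...args) {
      try {
        return instance[name](...args);
      } catch (err) {
        // Let ordinary errors through; only wasm traps trigger a rebuild.
        if (!(err instanceof WebAssembly.RuntimeError)) throw err;
        // Discard the broken instance, create a new one, retry once.
        instance = createInstance();
        recreations++;
        return instance[name](...args);
      }
    },
    // Number of times the instance has been replaced so far.
    get recreations() {
      return recreations;
    },
  };
}
```

Usage would be along the lines of `const caller = resilientCaller(makeInstance); caller.call('hello', 'world');`, where `makeInstance` stands for whatever instantiation code the host already has. If the retry on the fresh instance also traps, the error propagates to the caller.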

guybedford commented 8 months ago

Are you sure you're building the local version correctly? I'm publishing a release shortly, perhaps test on that? I did try your repeated-calls test case with 20,000 calls and it still works fine.

pvlugter commented 8 months ago

I think the local version is correctly built. Ran repeated-calls with 200,000. Didn't fail until 172,515.

pvlugter commented 8 months ago

I'll try our own tests with the published release once available.

pvlugter commented 8 months ago

Tested 0.8.0 with one of our own tests. Fails after 1164 calls. For 0.7.1 it was just 92 calls before failure. With the local version I tried, on ebeb262, it was 973 calls.

guybedford commented 8 months ago

Thanks for the report, at least the numbers are getting bigger. Reopening.

guybedford commented 7 months ago

One first step here might be to try and debug if this is a GC issue or a bigger allocation issue in ComponentizeJS. Another useful isolation could be to independently test StarlingMonkey with some GC objects in an exported interface to see if it's definitely happening on the ComponentizeJS side.
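One possible first pass at that isolation, sketched under loud assumptions: `callsUntilFailure`, `probe`, `createInstance`, and an `echo(bytes)` export are all hypothetical stand-ins, not part of ComponentizeJS or StarlingMonkey. The idea is to record how many calls succeed for different payload sizes. If the total bytes handled before failure stays roughly constant across sizes, that would hint at memory not being reclaimed, though it would not by itself say whether the GC or the allocator is responsible.

```javascript
// Hypothetical probe. Assumptions: `createInstance()` returns a fresh
// instance, and `instance.echo(bytes)` is a stand-in export that takes
// and returns a byte array.
function callsUntilFailure(createInstance, payloadSize, maxCalls) {
  const instance = createInstance();
  const payload = new Uint8Array(payloadSize);
  for (let i = 1; i <= maxCalls; i++) {
    try {
      instance.echo(payload);
    } catch (err) {
      return i; // attempt number on which the call first failed
    }
  }
  return null; // no failure within maxCalls
}

// Run the probe for several payload sizes, each on a fresh instance.
function probe(createInstance, sizes, maxCalls) {
  return sizes.map((size) => {
    const failedAt = callsUntilFailure(createInstance, size, maxCalls);
    return {
      size,
      failedAt,
      // Bytes passed through successful calls before the first failure.
      bytesBeforeFailure: failedAt === null ? null : (failedAt - 1) * size,
    };
  });
}
```

Running the same probe against a plain StarlingMonkey build and a ComponentizeJS build would be one way to compare the two sides.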

Cahu commented 3 months ago

Hi, we're seeing something similar in a plugin system we are developing. The host is in Rust and uses wasmtime v23.0.2.

The host repeatedly calls a function from the wasm component to apply a transformation to a stream of byte chunks. It works for a few iterations but crashes after a while.

The error occurs reliably after the same number of calls for a particular input, but that number changes with different inputs.

Our setup has been working fine with other languages for the guest plugin (Go, Python, Rust), so we think we might be facing the issue described here.

The backtrace doesn't seem particularly helpful but I'll provide it anyway:

0: 0x772dbc - <unknown>!<wasm function 8792>
1: 0x714957 - <unknown>!<wasm function 6347>
2: 0x350f50 - <unknown>!<wasm function 450>
3: 0x6abb33 - <unknown>!<wasm function 4725>
4: 0x74cfe9 - <unknown>!<wasm function 7590>
5: 0x36a6df - <unknown>!<wasm function 490>
6: 0x4dc0d1 - <unknown>!<wasm function 1456>
7: 0xa86f3 - <unknown>!<wasm function 8>
8: 0x34109d - <unknown>!<wasm function 426>
9: 0x350e15 - <unknown>!<wasm function 450>
10: 0x6abb33 - <unknown>!<wasm function 4725>
11: 0x518f19 - <unknown>!<wasm function 1706>
12: 0x3c2a0e - <unknown>!<wasm function 652>
13: 0x764085 - <unknown>!transform