ceifa / wasmoon

A real lua 5.4 VM with JS bindings made with webassembly
MIT License
449 stars 29 forks source link

Fix #54 misbehaving integers #97

Closed tims-bsquare closed 7 months ago

tims-bsquare commented 7 months ago
Sapu94 commented 7 months ago

I was just testing wasmoon out today and hit some unexpected behavior caused by 32-bit ints, so I am excited to see this issue being worked on. Unfortunately, I'm hitting a segfault on your branch (which doesn't occur on main). I'm still working on trying to find a minimal script that reproduces this, but let me know if you have any ideas or suggestions on how to get a more useful stack trace. I have bisected it down to being caused by the LUA_INT_DEFAULT change (and not the thread.ts change or other build.sh changes) at least.

Aborted(segmentation fault)
/home/wasmoon/dist/index.js:1663
     /** @suppress {checkTypes} */ var e = new WebAssembly.RuntimeError(what);
                                           ^
RuntimeError: Aborted(segmentation fault)
    at abort (/home/wasmoon/dist/index.js:1663:44)
    at segfault (/home/wasmoon/dist/index.js:1474:6)
    at glue.wasm.SAFE_HEAP_STORE_i32_4_4 (wasm://wasm/glue.wasm-006c350e:wasm-function[1373]:0xc2972)
    at glue.wasm.emmalloc_memalign (wasm://wasm/glue.wasm-006c350e:wasm-function[1282]:0xbf8cd)
    at glue.wasm.emmalloc_aligned_realloc (wasm://wasm/glue.wasm-006c350e:wasm-function[1286]:0xbfddc)
    at glue.wasm.emmalloc_realloc (wasm://wasm/glue.wasm-006c350e:wasm-function[1288]:0xc0155)
    at glue.wasm.l_alloc (wasm://wasm/glue.wasm-006c350e:wasm-function[185]:0x13d7d)
    at glue.wasm.luaM_malloc_ (wasm://wasm/glue.wasm-006c350e:wasm-function[667]:0x55311)
    at glue.wasm.luaC_newobjdt (wasm://wasm/glue.wasm-006c350e:wasm-function[490]:0x3b43a)
    at glue.wasm.luaC_newobj (wasm://wasm/glue.wasm-006c350e:wasm-function[491]:0x3b5b6)
Sapu94 commented 7 months ago

Turns out that crash is triggered by me wrapping require in a JS function. Feel free to ignore that as it's probably not a recommended thing to do, and without this hook, everything is working perfectly for me, but here's a simplified version of what I was doing for reference (along with some reasonably-complex lua code that does a bunch of require's):

const origRequire = lua.global.get('require');
lua.global.set('require', path => origRequire(path));
tims-bsquare commented 7 months ago

That's a funky error, I'll take a look

tims-bsquare commented 7 months ago

I've tried to write a test to reproduce it but it passes, could you try and tweak it @Sapu94 until it fails please?

    it('wrap require function', async () => {
      const factory = getFactory()
      await factory.mountFile('test.lua', 'return 10')
      const engine = await factory.createEngine()

      const origRequire = engine.global.get('require');
      engine.global.set('require', path => origRequire(path));

      await engine.doString(`require('test')`)
    })
tims-bsquare commented 7 months ago

It could also be related to this https://github.com/emscripten-core/emscripten/issues/13131

Sapu94 commented 7 months ago

Just getting back to this after being away for the past week. It seems like this can only be reproduced by getting the Lua VM / wasm stack into some specific state. Sorry in advance for the huge can of worms this seems to be 😅. Here's as far as I've gotten in coming up with a minimal set of reproduction steps (I'm doing all this in WSL for what it's worth):

  1. Clone your fork into ~/wasmoon and switch to the branch for this PR
  2. Build wasmoon with npm run build:wasm:docker:dev && npm run build
  3. Clone https://github.com/Sapu94/wasmoon-test into ~/wasmoon-test and install dependencies with npm install (the wasmoon dependency is set as file:../wasmoon so it uses the local clone)
  4. Run main.ts: npx ts-node main.ts

Here's the latest error:

$ npx ts-node main.ts
Aborted(segmentation fault)
/home/sapu/wasmoon/dist/index.js:1663
     /** @suppress {checkTypes} */ var e = new WebAssembly.RuntimeError(what);
                                           ^
RuntimeError: Aborted(segmentation fault)
    at abort (/home/sapu/wasmoon/dist/index.js:1663:44)
    at segfault (/home/sapu/wasmoon/dist/index.js:1474:6)
    at glue.wasm.SAFE_HEAP_STORE_i32_1_1 (wasm://wasm/glue.wasm-006bc9de:wasm-function[1358]:0xc15d3)
    at glue.wasm.luaV_execute (wasm://wasm/glue.wasm-006bc9de:wasm-function[1070]:0xaedc4)
    at glue.wasm.ccall (wasm://wasm/glue.wasm-006bc9de:wasm-function[435]:0x33fe4)
    at glue.wasm.luaD_callnoyield (wasm://wasm/glue.wasm-006bc9de:wasm-function[436]:0x340be)
    at glue.wasm.f_call (wasm://wasm/glue.wasm-006bc9de:wasm-function[105]:0xa4b5)
    at invoke_vii (/home/sapu/wasmoon/dist/index.js:6285:31)
    at glue.wasm.luaD_rawrunprotected (wasm://wasm/glue.wasm-006bc9de:wasm-function[416]:0x2fc40)
    at glue.wasm.luaD_pcall (wasm://wasm/glue.wasm-006bc9de:wasm-function[447]:0x35745)

Let me know if you're able to reproduce the issue with these steps or not.

Here are a few of the things I've tried in my test repo that make me the crash go away:

I don't actually think any of the above are super meaningful, other than to point to the issue being caused by things getting into a specific (corrupted) state as opposed to just one faulty function / line of code somewhere.

tims-bsquare commented 7 months ago

That's a great repro repo, thank you for putting that together! :) I think what you encountered was an actual proper segfault where the library was trying to reference a free'd value and I suspect sometimes the memory may not yet have been corrupted depending on those small changes you listed.

I've pushed another change to this branch, unrelated to the misbehaving integers hopefully fixing this for you (it does for me locally).

Something to be aware of is that there's a potential memory leak with doFile/callByteCode that's necessary to avoid segfaults, which was and is done by moving the result value from the execution thread to the global. The code was returning a reference to the value in the thread which had been cleaned up rather than the copy in the global state. Anyway that means if you intend to call doFile regularly with results being returned then it's important to either reset your global state or pop the result from the top of the stack with lua.global.pop()

Sapu94 commented 7 months ago

That fix works great, thanks! Good to know about the memory leak.

ceifa commented 7 months ago

Great job @tims-bsquare! Can I merge it or are you still working on this PR?

tims-bsquare commented 7 months ago

No problem, should be ready to merge @ceifa :) Let me add that TODO first though