Kong / ngx_wasm_module

Nginx + WebAssembly
Apache License 2.0
79 stars 7 forks source link

feat(*) Lua bridge refactor and dispatch fixes #546

Closed thibaultcha closed 3 months ago

thibaultcha commented 4 months ago

Major refactor of the Lua bridge to support multiple concurrent yielding Lua threads, and refactor of dispatch calls failures to continue executing the request. Replaces #539, #545, #523.

The new implementation "tricks" OpenResty by scheduling uthreads via C and passing these threads to the OpenResty runloop as if they were created from Lua (via ngx.thread). Because all uthreads must resume their "parent thread" when finished (as per OpenResty's implementation), we schedule a stub "entry thread" whenever we are trying to use the Lua bridge. This entry thread itself does nothing and is collected at request pool cleanup.

List of significant changes for this refactor:

Fix #524 Fix #528

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 88.99614% with 57 lines in your changes missing coverage. Please review.

Project coverage is 90.36951%. Comparing base (277fac6) to head (b9bb644).

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546/graphs/tree.svg?width=650&height=150&src=pr&token=T0PT2Q9IAN&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong)](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong) ```diff @@ Coverage Diff @@ ## main #546 +/- ## =================================================== + Coverage 90.05848% 90.36951% +0.31103% =================================================== Files 47 47 Lines 10089 10311 +222 =================================================== + Hits 9086 9318 +232 + Misses 1003 993 -10 ``` | [Files](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong) | Coverage Δ | | |---|---|---| | [src/common/lua/ngx\_wasm\_lua\_ffi.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fcommon%2Flua%2Fngx_wasm_lua_ffi.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2NvbW1vbi9sdWEvbmd4X3dhc21fbHVhX2ZmaS5j) | `92.92035% <100.00000%> (-0.67965%)` | :arrow_down: | | [src/common/ngx\_wasm\_socket\_tcp\_readers.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fcommon%2Fngx_wasm_socket_tcp_readers.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2NvbW1vbi9uZ3hfd2FzbV9zb2NrZXRfdGNwX3JlYWRlcnMuYw==) | `87.81513% <100.00000%> (ø)` | | | [src/common/ngx\_wasm\_subsystem.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fcommon%2Fngx_wasm_subsystem.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2NvbW1vbi9uZ3hfd2FzbV9zdWJzeXN0ZW0uYw==) | `100.00000% <100.00000%> (ø)` | | | [src/common/proxy\_wasm/ngx\_proxy\_wasm.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fcommon%2Fproxy_wasm%2Fngx_proxy_wasm.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2NvbW1vbi9wcm94eV93YXNtL25neF9wcm94eV93YXNtLmM=) | `92.58850% <100.00000%> (+0.04952%)` | :arrow_up: | | [src/common/proxy\_wasm/ngx\_proxy\_wasm.h](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fcommon%2Fproxy_wasm%2Fngx_proxy_wasm.h&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2NvbW1vbi9wcm94eV93YXNtL25neF9wcm94eV93YXNtLmg=) | `91.89189% <ø> (ø)` | | | [src/common/proxy\_wasm/ngx\_proxy\_wasm\_host.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fcommon%2Fproxy_wasm%2Fngx_proxy_wasm_host.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2NvbW1vbi9wcm94eV93YXNtL25neF9wcm94eV93YXNtX2hvc3QuYw==) | `94.37751% <100.00000%> (-0.02250%)` | :arrow_down: | | [src/common/proxy\_wasm/ngx\_proxy\_wasm\_util.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fcommon%2Fproxy_wasm%2Fngx_proxy_wasm_util.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2NvbW1vbi9wcm94eV93YXNtL25neF9wcm94eV93YXNtX3V0aWwuYw==) | `94.14062% <ø> (ø)` | | | [src/http/ngx\_http\_wasm\_util.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fhttp%2Fngx_http_wasm_util.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2h0dHAvbmd4X2h0dHBfd2FzbV91dGlsLmM=) | `87.06199% <100.00000%> (+0.07011%)` | :arrow_up: | | [src/http/proxy\_wasm/ngx\_http\_proxy\_wasm.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fhttp%2Fproxy_wasm%2Fngx_http_proxy_wasm.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2h0dHAvcHJveHlfd2FzbS9uZ3hfaHR0cF9wcm94eV93YXNtLmM=) | `93.71728% <100.00000%> (+0.20377%)` | :arrow_up: | | [src/http/proxy\_wasm/ngx\_http\_proxy\_wasm\_dispatch.c](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree&filepath=src%2Fhttp%2Fproxy_wasm%2Fngx_http_proxy_wasm_dispatch.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#diff-c3JjL2h0dHAvcHJveHlfd2FzbS9uZ3hfaHR0cF9wcm94eV93YXNtX2Rpc3BhdGNoLmM=) | `91.62562% <100.00000%> (+0.39754%)` | :arrow_up: | | ... and [8 more](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong) | | | [Flag](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong) | Coverage Δ | | |---|---|---| | [unit](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong) | `90.13820% <88.56502%> (-0.01297%)` | :arrow_down: | | [valgrind](https://app.codecov.io/gh/Kong/ngx_wasm_module/pull/546/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong) | `81.21406% <85.62092%> (+3.19863%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Kong#carryforward-flags-in-the-pull-request-comment) to find out more.
hishamhm commented 3 months ago

@hishamhm I would like for the datakit filter to be tested with https://github.com/Kong/ngx_wasm_module/pull/546 before merging it

@thibaultcha I've just tested it and it's still crashing. Running it enough times I do get segfaults, but every run is reporting memory access errors consistently.

These are the logs from a Gateway run with Valgrind. This particular run didn't crash, but Valgrind reported reads on data that was freed by ngx_proxy_wasm_dispatch_calls_cancel: pr-546-kong-error.log pr-546-kong-valgrind.log

This was running a datakit configuration with 2 parallel calls that fail (I didn't spin up the other localhost upstream).

thibaultcha commented 3 months ago

@hishamhm What is the specific reproducible example for this? It does not trigger in the test given with this PR, and seems to trigger when send_local_response is used at some point (as per the Valgrind report). I tried updating that test with:

    location /t {
        proxy_wasm hostcalls 'on=request_body \
                              test=/t/dispatch_http_call \
                              host=127.0.0.1:1 \
                              ncalls=2';
        proxy_wasm hostcalls 'on=request_body \
                              test=/t/echo/body';
        echo ok;
    }

But it does not trigger any memory issues.

hishamhm commented 3 months ago

@thibaultcha Sure, let me try to get the same filter crash without the gateway.

hishamhm commented 3 months ago

Took me a while but I got a consistent repro running without the gateway: https://github.com/Kong/ngx_wasm_module/actions/runs/9133572211/job/25117356096

This branch/commit was intended only for sharing the test case: https://github.com/Kong/ngx_wasm_module/commit/495c13dcd2aaf5273d9e75a121cfec4d8c7793bd (Loading Datakit on Wasmtime with Valgrind takes a couple of minutes, so I enabled the cache in the testcase as a quality-of-life tweak for speeding up debugging runs...)

I think we can eventually isolate the offending proxy-wasm calls and produce a hostcalls-based testcase, but to go one step at a time I wanted to get the error reproduced without the Gateway on Datakit first.

Looking at the logs, I think the triggering condition is to dispatch a call, then in the same handler (in this case, on_request_headers), trigger a local response, before the dispatch callback gets a chance to run.

hishamhm commented 3 months ago

I think I got a minimized test case. Branch tests/interrupted-dispatch on top of this PR — this test: https://github.com/Kong/ngx_wasm_module/commit/6cb6899c75b70370fac89f5dc47e246ef5c19ab7

Still waiting for the CI outcome here, but it produces the Valgrind reports locally.

thibaultcha commented 3 months ago

@hishamhm Thanks for the test! This turned out to be yet another thing!

thibaultcha commented 3 months ago

@hishamhm Would you give it another try with the current state of this branch? Except for a small failure in dynamic builds I'm still investigating it all runs green, so I hope it fixes everything now.

thibaultcha commented 3 months ago

Ok I fixed the last problem but @flrgh found another problem in the Kong PR which I also am looking at now.

thibaultcha commented 3 months ago

@flrgh Ok, the latest state of this branch should also take care of the Gateway issue. I have the spec/02-integration/20-wasm/ suite passing locally.

flrgh commented 3 months ago

@flrgh Ok, the latest state of this branch should also take care of the Gateway issue. I have the spec/02-integration/20-wasm/ suite passing locally.

Nice! Looking good with 75e101780f09ade9a4bada0aa8340c0f6eb74f97 at my end too.

hishamhm commented 3 months ago

I'll give this branch a spin later today!

thibaultcha commented 3 months ago

I have found one more bug that I am trying to get rid of before merging this.

thibaultcha commented 3 months ago

@flrgh @hishamhm Merged! I think in time for the Gateway patch release, hopefully.