Kong / ngx_wasm_module

Nginx + WebAssembly

segfault during ngx_proxy_wasm_ctx_lookup #453

Closed hishamhm closed 9 months ago

hishamhm commented 9 months ago

Running the kong/kong:nightly container (SHA kong/kong@sha256:efaa8af02211956a69fc180f6cffa1fcf64f45a44297f21c192a077087030fb9) with the coraza-proxy-wasm filter, I am getting a consistent segfault by triggering two proxied requests: the first one fails with cannot pause in "body_filter" phase ("on_response_body" step), and the second one segfaults.

I haven't tried it with the latest ngx_wasm_module code yet; I will add more information to this issue as I gather it. No detailed repro steps for now, since my coraza-proxy-wasm testing environment is heavily customized at the moment.

docker-compose logs:

WARNING: Found orphan containers (kong-service-provisioner-1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
Starting kong_httpbin_1 ... done
Recreating kong_kong_1  ... done
Attaching to kong_httpbin_1, kong_kong_1
httpbin_1  | time="2023-11-30T18:10:13.5321" msg="go-httpbin listening on http://0.0.0.0:8080"
kong_1     | 2023/11/30 18:10:14 [warn] 1#0: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /var/run/kong/nginx.conf:8
kong_1     | nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /var/run/kong/nginx.conf:8
kong_1     | 2023/11/30 18:10:14 [notice] 1#0: [wasm] swapping modules: "ngx_http_headers_more_filter_module" (index: 61) and "ngx_http_wasm_filter_module" (index: 90)
kong_1     | 2023/11/30 18:10:14 [notice] 1#0: [wasm] swapping modules: "ngx_http_lua_module" (index: 60) and "ngx_wasm_core_module" (index: 88)
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: using the "epoll" event method
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: openresty/1.21.4.2
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: OS: Linux 6.2.0-37-generic
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: getrlimit(RLIMIT_NOFILE): 1048576:1048576
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker processes
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1256
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1257
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1258
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1259
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1260
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1261
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1262
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1263
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1264
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1265
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1266
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1267
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1268
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1269
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1270
kong_1     | 2023/11/30 18:10:20 [notice] 1#0: start worker process 1271
kong_1     | 2023/11/30 18:10:26 [notice] 1263#0: *1 [lua] globalpatches.lua:73: sleep(): executing a blocking 'sleep' (0.001 seconds), context: init_worker_by_lua*
kong_1     | 2023/11/30 18:10:26 [notice] 1263#0: *1 [lua] globalpatches.lua:73: sleep(): executing a blocking 'sleep' (0.002 seconds), context: init_worker_by_lua*
kong_1     | 2023/11/30 18:10:26 [notice] 1265#0: *2 [lua] init.lua:259: purge(): [DB cache] purging (local) cache, context: init_worker_by_lua*
kong_1     | 2023/11/30 18:10:26 [notice] 1265#0: *2 [lua] init.lua:259: purge(): [DB cache] purging (local) cache, context: init_worker_by_lua*
kong_1     | 2023/11/30 18:10:26 [notice] 1265#0: *2 [kong] init.lua:522 declarative config loaded from /opt/kong/kong.yaml, context: init_worker_by_lua*
kong_1     | 2023/11/30 18:10:26 [notice] 1260#0: *5 [lua] globalpatches.lua:73: sleep(): executing a blocking 'sleep' (0.001 seconds), context: init_worker_by_lua*
kong_1     | 2023/11/30 18:14:32 [crit] 1256#0: *2369 [proxy-wasm]["main" #1] /%!(EXTRA T=GET, T=HTTP/1.1), client: 172.18.0.1, server: kong, request: "GET / HTTP/1.1", host: "localhost:8000"
httpbin_1  | time="2023-11-30T18:14:33.5486" status=200 method="GET" uri="/" size_bytes=11133 duration_ms=0.31 user_agent="HTTPie/2.6.0" client_ip=172.18.0.1
kong_1     | 2023/11/30 18:14:33 [error] 1256#0: *2369 [proxy-wasm]["main" #1] filter 1/1 cannot pause in "body_filter" phase ("on_response_body" step) while sending to client, client: 172.18.0.1, server: kong, request: "GET / HTTP/1.1", upstream: "http://172.18.0.2:8080/", host: "localhost:8000"
kong_1     | 2023/11/30 18:14:33 [warn] 1256#0: *2369 [proxy-wasm]["main" #1] filter 1/1 failed resuming "on_response_body" step in "body_filter" phase (not yieldable) while sending to client, client: 172.18.0.1, server: kong, request: "GET / HTTP/1.1", upstream: "http://172.18.0.2:8080/", host: "localhost:8000"
kong_1     | 172.18.0.1 - - [30/Nov/2023:18:14:33 +0000] "GET / HTTP/1.1" 200 11153 "-" "HTTPie/2.6.0"
kong_1     | 2023/11/30 18:14:35 [error] 1269#0: *310 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1259#0: *1332 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1268#0: *756 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1270#0: *1482 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1266#0: *2223 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1262#0: *1791 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1267#0: *456 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1258#0: *748 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1261#0: *1770 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1271#0: *758 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1264#0: *1478 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1257#0: *752 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1260#0: *460 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1263#0: *164 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [error] 1265#0: *18 [lua] worker.lua:266: communicate(): event worker failed: failed to receive the header bytes: closed, context: ngx.timer
kong_1     | 2023/11/30 18:14:35 [notice] 1#0: signal 17 (SIGCHLD) received from 1256
kong_1     | 2023/11/30 18:14:35 [alert] 1#0: worker process 1256 exited on signal 11 (core dumped)
kong_1     | 2023/11/30 18:14:35 [notice] 1#0: start worker process 2004
kong_1     | 2023/11/30 18:14:35 [notice] 1#0: signal 29 (SIGIO) received
kong_1     | 2023/11/30 18:14:35 [warn] 2004#0: found and cleared 1 stale readers from LMDB

Backtrace from within the container:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  ngx_proxy_wasm_ctx_lookup (id=2, ictx=0x56255ad16db8)
    at /home/runner/.cache/bazel/_bazel_runner/5f5cb99aeb7e2b4db19484ea5022e949/execroot/kong/external/ngx_wasm_module/src/common/proxy_wasm/ngx_proxy_wasm.c:95
95  /home/runner/.cache/bazel/_bazel_runner/5f5cb99aeb7e2b4db19484ea5022e949/execroot/kong/external/ngx_wasm_module/src/common/proxy_wasm/ngx_proxy_wasm.c: No such file or directory.
(gdb) bt
#0  ngx_proxy_wasm_ctx_lookup (id=2, ictx=0x56255ad16db8)
    at /home/runner/.cache/bazel/_bazel_runner/5f5cb99aeb7e2b4db19484ea5022e949/execroot/kong/external/ngx_wasm_module/src/common/proxy_wasm/ngx_proxy_wasm.c:95
#1  ngx_proxy_wasm_on_start (ictx=ictx@entry=0x56255ad16db8, filter=filter@entry=0x56255ad16800, start=start@entry=0)
    at /home/runner/.cache/bazel/_bazel_runner/5f5cb99aeb7e2b4db19484ea5022e949/execroot/kong/external/ngx_wasm_module/src/common/proxy_wasm/ngx_proxy_wasm.c:1507
#2  0x00007f09b78d8356 in ngx_proxy_wasm_run_step (pwexec=pwexec@entry=0x56255ad13fe8, 
    ictx=ictx@entry=0x56255ad16db8, step=step@entry=NGX_PROXY_WASM_STEP_REQ_HEADERS)
    at /home/runner/.cache/bazel/_bazel_runner/5f5cb99aeb7e2b4db19484ea5022e949/execroot/kong/external/ngx_wasm_module/src/common/proxy_wasm/ngx_proxy_wasm.c:660
#3  0x00007f09b78d8a86 in ngx_proxy_wasm_resume (pwctx=0x56255b4b7500, phase=<optimized out>, 
    step=NGX_PROXY_WASM_STEP_REQ_HEADERS)
    at /home/runner/.cache/bazel/_bazel_runner/5f5cb99aeb7e2b4db19484ea5022e949/execroot/kong/external/ngx_wasm_module/src/common/proxy_wasm/ngx_proxy_wasm.c:785
#4  0x00007f09b78d035b in ngx_wasm_ops_resume (ctx=0x56255b1a9448, phaseidx=<optimized out>)
    at /home/runner/.cache/bazel/_bazel_runner/5f5cb99aeb7e2b4db19484ea5022e949/execroot/kong/external/ngx_wasm_module/src/wasm/ngx_wasm_ops.c:293
#5  0x00007f09b78e1d02 in ngx_http_wasm_ffi_start (r=0x56255b763020)
    at /home/runner/.cache/bazel/_bazel_runner/5f5cb99aeb7e2b4db19484ea5022e949/execroot/kong/external/ngx_wasm_module/src/common/lua/ngx_wasm_lua_ffi.c:164
#6  0x00007f09b92d1f89 in lj_vm_ffi_call () at buildvm_x86.dasc:2704
#7  0x00007f09b931fbe4 in lj_ccall_func (L=<optimized out>, cd=<optimized out>) at lj_ccall.c:1382
#8  0x00007f09b9337435 in lj_cf_ffi_meta___call (L=0x7f09a1b69290) at lib_ffi.c:230
#9  0x00007f09b92cfb3b in lj_BC_FUNCC () at buildvm_x86.dasc:859
#10 0x0000562558276146 in ngx_http_lua_run_thread (L=L@entry=0x7f09b7597380, r=r@entry=0x56255b763020, 
    ctx=ctx@entry=0x56255b763d78, nrets=<optimized out>, nrets@entry=0)
    at ../ngx_lua-0.10.25/src/ngx_http_lua_util.c:1184
#11 0x000056255827c2bb in ngx_http_lua_access_by_chunk (L=0x7f09b7597380, r=0x56255b763020)
    at ../ngx_lua-0.10.25/src/ngx_http_lua_accessby.c:337
#12 0x00005625581e08d1 in ngx_http_core_access_phase (r=0x56255b763020, ph=0x5625598bf678)
    at src/http/ngx_http_core_module.c:1110
#13 0x00005625581dbfcd in ngx_http_core_run_phases (r=0x56255b763020) at src/http/ngx_http_core_module.c:885
#14 0x00005625581e7917 in ngx_http_process_request_headers (rev=rev@entry=0x56255a8fb230)
    at src/http/ngx_http_request.c:1498
#15 0x00005625581e7d39 in ngx_http_process_request_line (rev=0x56255a8fb230) at src/http/ngx_http_request.c:1165
#16 0x00005625581cdf0f in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, 
    flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#17 0x00005625581c3d0a in ngx_process_events_and_timers (cycle=cycle@entry=0x56255895bac0)
    at src/event/ngx_event.c:257
#18 0x00005625581cbcfe in ngx_worker_process_cycle (cycle=cycle@entry=0x56255895bac0, data=data@entry=0x0)
    at src/os/unix/ngx_process_cycle.c:793
#19 0x00005625581ca08f in ngx_spawn_process (cycle=cycle@entry=0x56255895bac0, 
    proc=proc@entry=0x5625581cbc80 <ngx_worker_process_cycle>, data=data@entry=0x0, 
    name=name@entry=0x5625582f7d5c "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#20 0x00005625581cb390 in ngx_start_worker_processes (cycle=cycle@entry=0x56255895bac0, n=16, type=type@entry=-3)
    at src/os/unix/ngx_process_cycle.c:382
#21 0x00005625581cc754 in ngx_master_process_cycle (cycle=0x56255895bac0) at src/os/unix/ngx_process_cycle.c:135
#22 0x00005625581a15ee in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:386
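
For reference, frame #0 above is a proxy-wasm context lookup by id (id=2 here). The following is a minimal, hypothetical sketch of how such a lookup-by-id typically walks an nginx red-black tree; it is an assumption about the general shape of the code, not the actual ngx_proxy_wasm_ctx_lookup implementation, and it only illustrates where a stale or already-freed context structure would fault on the first dereference:

/* Hypothetical sketch only, NOT the actual ngx_proxy_wasm_ctx_lookup()
 * implementation. It shows the usual shape of a lookup-by-id over an
 * nginx red-black tree: if the tree (or the structure holding it) has
 * already been freed when a later request resumes the filter chain,
 * the very first dereference below reads freed memory, which would be
 * consistent with the SIGSEGV in frame #0. */
#include <ngx_config.h>
#include <ngx_core.h>

static ngx_rbtree_node_t *
ctx_lookup_sketch(ngx_rbtree_t *tree, ngx_rbtree_key_t id)
{
    ngx_rbtree_node_t  *node = tree->root;        /* faults here if tree is dangling */
    ngx_rbtree_node_t  *sentinel = tree->sentinel;

    while (node != sentinel) {
        if (id < node->key) {
            node = node->left;

        } else if (id > node->key) {
            node = node->right;

        } else {
            return node;                          /* context registered under this id */
        }
    }

    return NULL;                                  /* no context found for this id */
}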
thibaultcha commented 9 months ago

It is highly likely that this is one of the segfaults fixed by ecd78968469ed5fa40d81a26600964535d3e6b00 or 34c23c604198725fed022fa7c5db023dcee89f2c (I did not actually check whether they are included in the nightly).