Closed diefans closed 1 year ago
It seems I am able to "fix" the big 404 issue by resetting the plugin instance:
diff --git a/kong/runloop/plugin_servers/pb_rpc.lua b/kong/runloop/plugin_servers/pb_rpc.lua
index 38f3aafe7..ae4531cdd 100644
--- a/kong/runloop/plugin_servers/pb_rpc.lua
+++ b/kong/runloop/plugin_servers/pb_rpc.lua
@@ -396,7 +396,7 @@ function Rpc:handle_event(plugin_name, conf, phase)
if not res then
kong.log.err(err)
- if string.match(err:lower(), "no plugin instance") then
+ if string.match(err:lower(), "no plugin instance") or err:lower() == "closed" then
self.reset_instance(plugin_name, conf)
return self:handle_event(plugin_name, conf, phase)
end
But still the reload is not graceful - at least only 3 requests fail due to connection closed
, probably related to those 3 workers affected:
20002:kong_1 | 2022/03/14 16:36:24 [error] 1195#0: *21628 [kong] pb_rpc.lua:397 [bm-authx] closed, client: 192.168.33.1, server: kong, request: "GET /auth/version HTTP/1.1",host: "example.com"
20003:kong_1 | 2022/03/14 16:36:24 [error] 1195#0: *21630 [kong] pb_rpc.lua:397 [bm-authx] closed, client: 192.168.33.1, server: kong, request: "GET /auth/version HTTP/1.1",host: "example.com"
20005:kong_1 | 2022/03/14 16:36:24 [error] 1195#0: *21624 [kong] pb_rpc.lua:397 [bm-authx] closed, client: 192.168.33.1, server: kong, request: "GET /auth/version HTTP/1.1",host: "example.com"
❯ cargo run --release --example goose -- --host https://example.com/auth/version -t 10 --no-reset-metrics
Finished release [optimized] target(s) in 0.14s
Running `target/release/examples/goose --host 'https://example.com/auth/version' -t 10 --no-reset-metrics`
16:36:23 [WARN] "/auth/version": error sending request for url (https://example.com/auth/version): connection closed before message completed
16:36:23 [WARN] "/auth/version": error sending request for url (https://example.com/auth/version): connection closed before message completed
16:36:23 [WARN] "/auth/version": error sending request for url (https://example.com/auth/version): connection closed before message completed
All 8 users hatched.
=== PER TASK METRICS ===
------------------------------------------------------------------------------
Name | # times run | # fails | task/s | fail/s
------------------------------------------------------------------------------
1: LoadtestTasks |
1: | 8 | 0 (0%) | 0.80 | 0.00
2: | 68,858 | 0 (0%) | 6886 | 0.00
-------------------------+---------------+----------------+----------+--------
Aggregated | 68,866 | 0 (0%) | 6887 | 0.00
------------------------------------------------------------------------------
Name | Avg (ms) | Min | Max | Median
------------------------------------------------------------------------------
1: LoadtestTasks |
1: | 11.12 | 7 | 14 | 10
2: | 1.13 | 1 | 105 | 1
-------------------------+-------------+------------+-------------+-----------
Aggregated | 1.13 | 1 | 105 | 1
=== PER REQUEST METRICS ===
------------------------------------------------------------------------------
Name | # reqs | # fails | req/s | fail/s
------------------------------------------------------------------------------
GET /auth/version | 68,858 | 3 (0%) | 6886 | 0.30
------------------------------------------------------------------------------
Name | Avg (ms) | Min | Max | Median
------------------------------------------------------------------------------
GET /auth/version | 1.11 | 1 | 105 | 1
------------------------------------------------------------------------------
Slowest page load within specified percentile of requests (in ms):
------------------------------------------------------------------------------
Name | 50% | 75% | 98% | 99% | 99.9% | 99.99%
------------------------------------------------------------------------------
GET /auth/version | 1 | 1 | 4 | 5 | 11 | 34
=== ERRORS ===
------------------------------------------------------------------------------
Count | Error
------------------------------------------------------------------------------
3 GET /auth/version: error sending request /auth/version: connection closed before message completed
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Users: 8
Target host: https://example.com/auth/version
Starting: 2022-03-14 17:36:17 - 2022-03-14 17:36:25 (duration: 00:00:08)
Running: 2022-03-14 17:36:25 - 2022-03-14 17:36:35 (duration: 00:00:10)
Stopping: 2022-03-14 17:36:35 - 2022-03-14 17:36:35 (duration: 00:00:00)
goose v0.15.2
------------------------------------------------------------------------------
I think we're hitting this on kong version 2.8.0. Slightly different line number:
2022/03/30 01:04:30 [error] 26036#0: *1644465 [kong] pb_rpc.lua:397 [rtoken] closed, client: 172.16.80.250, server: kong, request: "GET ***redacted*** HTTP/1.1", host: "20.89.27.110"
https://github.com/Kong/kong/blob/master/kong/runloop/plugin_servers/pb_rpc.lua#L397
...still testing....
kong reached the following state, after continuous reloading under load, with the above patch
Mem: 30873812K used, 1917048K free, 222636K shrd, 2792924K buff, 11316512K cached
CPU: 5% usr 1% sys 0% nic 93% idle 0% io 0% irq 0% sirq
Load average: 0.65 2.36 3.38 5/2240 1959
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
1477 1 kong S 711m 2% 4 0% /go-plugins/bm-authx -kong-prefix /usr/local/ko
1504 1 kong S 419m 1% 2 0% nginx: worker process
1370 1 kong S 711m 2% 5 0% /go-plugins/bm-authx -kong-prefix /usr/local/ko
1231 1 kong S 711m 2% 2 0% /go-plugins/bm-authx -kong-prefix /usr/local/ko
1421 1 kong S 711m 2% 3 0% /go-plugins/bm-authx -kong-prefix /usr/local/ko
1509 1499 kong S 711m 2% 5 0% /go-plugins/bm-authx -kong-prefix /usr/local/ko
1499 1 kong S 450m 1% 6 0% nginx: worker process
1501 1 kong S 438m 1% 1 0% nginx: worker process
1505 1 kong S 419m 1% 4 0% nginx: worker process
1502 1 kong S 419m 1% 6 0% nginx: worker process
1500 1 kong S 419m 1% 2 0% nginx: worker process
1506 1 kong S 419m 1% 4 0% nginx: worker process
1503 1 kong S 419m 1% 2 0% nginx: worker process
1 0 kong S 381m 1% 1 0% nginx: master process /usr/local/openresty/ngin
1189 0 kong S 2592 0% 5 0% bash
1520 1189 kong R 1600 0% 0 0% top
So there are now a couple of plugin processes hanging around... all but one lost their worker process.
I created https://github.com/bettermarks/kong_8531 to easy reproduce this issue.
Also interesting - JavaScript plugins fail even worse, while Python plugins seem to repair itself and nearly do not mess up.
The 404
from the issue title is a by-product of the plugin being skipped/omitted due to its failure, where the request reached the upstream in a unintended way. So this issue is related also to https://github.com/Kong/kong/issues/8546.
I think we're hitting this on kong version 2.8.0. Slightly different line number:
2022/03/30 01:04:30 [error] 26036#0: *1644465 [kong] pb_rpc.lua:397 [rtoken] closed, client: 172.16.80.250, server: kong, request: "GET ***redacted*** HTTP/1.1", host: "20.89.27.110"
https://github.com/Kong/kong/blob/master/kong/runloop/plugin_servers/pb_rpc.lua#L397
I got the same issue but don't know how to fixed it. I use the latest version of go pdk. How did you solve it?
Is there an existing issue for this?
Kong version (
$ kong version
)2.7.1
Current Behavior
When just reloading kong while the host receives reasonable traffic passed through a go plugin, all responses after reload have a
status==404
rendering the service useless.The error log shows:
Expected Behavior
kong reloads gracefully without notice to a client.
Steps To Reproduce
Anything else?
We believe, this is related to how kong is managing recreating its workers and the go plugin process, e.g. binding new workers to the old plugin process and stopping it the latter.