Kong / kong

🦍 The Cloud-Native API Gateway and AI Gateway.
https://konghq.com/install/#kong-community
Apache License 2.0

Passive health checks not behaving as expected. #13452

Open puneethps opened 1 month ago

puneethps commented 1 month ago

Is there an existing issue for this?

Kong version ($ kong version)

3.2.2

Current Behavior

When I enable passive health checks for an upstream, the target is not marked as unhealthy and requests are still proxied to it, even though the number of failures exceeds the configured value. In the logs we see "unhealthy TIMEOUT increment (215/1)".

We also see worker.lua:164: failed to store event: queue overflow

Expected Behavior

The target should be marked as unhealthy, and requests should no longer be proxied to that instance.
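The expected behaviour amounts to a per-target failure counter compared against the configured threshold. The Python sketch below models that rule in a single process, under two stated assumptions: a success resets the failure streak (and vice versa), and there is one shared view of target health. Kong's real checker keeps this state across several Nginx workers, which is exactly the part this model leaves out. `PassiveHealth`, `report_timeout`, `report_success`, and `eligible` are hypothetical names, not Kong's or lua-resty-healthcheck's API.

```python
from dataclasses import dataclass, field


@dataclass
class PassiveHealth:
    """Hypothetical single-process model of passive health checking.

    Assumption: consecutive failures are counted per target and compared
    with the configured threshold; a success resets the failure streak.
    """

    timeouts_threshold: int = 1   # mirrors "unhealthy": {"timeouts": 1}
    successes_threshold: int = 1  # mirrors "healthy": {"successes": 1}
    timeouts: dict = field(default_factory=dict)
    successes: dict = field(default_factory=dict)
    unhealthy: set = field(default_factory=set)

    def report_timeout(self, target: str) -> None:
        # A timeout breaks the success streak and extends the failure streak.
        self.successes[target] = 0
        self.timeouts[target] = self.timeouts.get(target, 0) + 1
        if self.timeouts[target] >= self.timeouts_threshold:
            self.unhealthy.add(target)

    def report_success(self, target: str) -> None:
        # A success breaks the failure streak and may restore the target.
        self.timeouts[target] = 0
        self.successes[target] = self.successes.get(target, 0) + 1
        if self.successes[target] >= self.successes_threshold:
            self.unhealthy.discard(target)

    def eligible(self, targets: list) -> list:
        """Targets the balancer may still pick: everything not marked unhealthy."""
        return [t for t in targets if t not in self.unhealthy]
```

With the thresholds from this issue ("timeouts": 1), a single `report_timeout("10.160.167.169:32768")` removes that target from `eligible(...)`, so in this model the target stops receiving traffic once the counter reaches 1/1. A reading such as 215/1 implies that requests kept reaching the target well after the threshold was crossed.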

Steps To Reproduce

No response

Anything else?

No response

ProBrian commented 1 month ago

Hi @puneethps, can you provide the configuration of the upstream (the health-check-related fields, the upstream's targets, etc.) and the steps you used to reproduce this (how the target was disabled, where the failure count came from, etc.)?

puneethps commented 1 month ago

Hi @ProBrian, the upstream's passive health-check config is as follows:

"passive": {
  "healthy": {
    "http_statuses": [200, 201, 202, 203, 204, 205, 206, 207, 208, 226, 300, 301, 302, 303, 304, 305, 306, 307, 308],
    "successes": 1
  },
  "unhealthy": {
    "http_failures": 1,
    "http_statuses": [400, 403, 429, 404, 500, 501, 502, 503, 504, 505],
    "tcp_failures": 3,
    "timeouts": 1
  }
}

This is not consistently reproducible and usually happens under heavy load on the system. My question is whether the following log line is expected:

2024/08/06 03:57:31 [warn] 28747#0: *200527150 [lua] healthcheck.lua:1330: log(): [healthcheck] (64ff7302-bd80-4a03-b95e-4dd8e6eab4b7:printing-upgraded.v002) unhealthy TIMEOUT increment (241/1) for '10.160.167.169(10.160.167.169:32768)' while logging request, client: 10.161.195.224, server: kong, request: "GET /printing-upgraded/v1/ping?UNIQUEID=ZrGfEO3a5pA8PofJTjdz@QAAAAU HTTP/1.1", upstream: "http://10.160.167.169:32768/v1/ping

What concerns me is that, if I am interpreting the log line above correctly, 240 requests were proxied to the target even after it had reached the unhealthy timeout threshold.
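That reading of the counter can be checked mechanically. Assuming the parenthesised pair in the healthcheck.lua message is (current count/threshold), as interpreted above, this Python sketch extracts both numbers and the number of increments recorded after the threshold was reached. The regular expression and `parse_increment` are illustrative and fitted only to the log format quoted in this thread.

```python
import re

# Fitted to lines such as:
#   ... unhealthy TIMEOUT increment (241/1) for '10.160.167.169(10.160.167.169:32768)' ...
INCREMENT_RE = re.compile(
    r"\b(?P<kind>unhealthy|healthy) (?P<counter>\w+) increment "
    r"\((?P<count>\d+)/(?P<threshold>\d+)\) for '(?P<target>[^']+)'"
)


def parse_increment(line: str):
    """Return (kind, counter, count, threshold, target, excess), or None."""
    m = INCREMENT_RE.search(line)
    if m is None:
        return None
    count, threshold = int(m["count"]), int(m["threshold"])
    # Increments beyond the threshold: failures recorded while the target
    # should already have been considered unhealthy.
    excess = max(count - threshold, 0)
    return m["kind"], m["counter"], count, threshold, m["target"], excess
```

Applied to the log line above, this yields a count of 241 against a threshold of 1, i.e. 240 increments past the threshold, which matches the figure in the comment. The same function gives 214 for the "(215/1)" reading in the original report.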

ProBrian commented 1 month ago

Hi @oowl, do you have any idea about this issue? Or is there anyone else we can ask for further discussion?

puneethps commented 1 month ago

@oowl Please let us know as soon as you have an update; our production system is impacted by this.

oowl commented 1 month ago
> We also see worker.lua:164: failed to store event: queue overflow

That means some events were lost in some workers, so the unhealthy events could not notify the other workers in time. Can you help me reproduce it with a single Kong config? I know what happens, but I am not sure why.
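To illustrate why one lost event matters: if the worker that observes the timeouts marks the target unhealthy locally and then broadcasts that decision, a copy that is dropped on a full queue never reaches some other worker, and that worker keeps proxying to the target. The Python sketch below is a toy model of that failure mode only; `Worker`, `broadcast`, `simulate_overload`, and the inbox capacity of 3 are hypothetical and do not reproduce Kong's worker-events implementation.

```python
from collections import deque


class Worker:
    """Toy worker: a bounded inbox plus its own view of unhealthy targets."""

    def __init__(self, capacity: int):
        self.inbox = deque()
        self.capacity = capacity
        self.unhealthy = set()

    def deliver(self, event: tuple) -> bool:
        # A full inbox drops the event; this plays the role of
        # "failed to store event: queue overflow" in the model.
        if len(self.inbox) >= self.capacity:
            return False
        self.inbox.append(event)
        return True

    def drain(self) -> None:
        # Apply queued events to this worker's local health view.
        while self.inbox:
            kind, target = self.inbox.popleft()
            if kind == "unhealthy":
                self.unhealthy.add(target)
            else:
                self.unhealthy.discard(target)


def broadcast(workers: list, event: tuple) -> int:
    """Offer an event to every worker; return how many copies were dropped."""
    return sum(0 if w.deliver(event) else 1 for w in workers)


def simulate_overload():
    """Two workers; worker 1 is too busy to drain its inbox in time."""
    target = "10.160.167.169:32768"
    workers = [Worker(capacity=3), Worker(capacity=3)]
    for i in range(3):  # unrelated events fill both inboxes under load
        broadcast(workers, ("healthy", f"10.0.0.{i}:80"))
    workers[0].drain()  # worker 0 keeps up with its inbox
    dropped = broadcast(workers, ("unhealthy", target))
    for w in workers:
        w.drain()
    # Which workers now consider the target unhealthy?
    return dropped, [target in w.unhealthy for w in workers]
```

`simulate_overload()` returns one dropped copy, with worker 0 treating the target as unhealthy and worker 1 still treating it as healthy. Requests handled by worker 1 would keep reaching the target, which is consistent with the counter climbing far past its threshold.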

puneethps commented 1 month ago

@oowl This is not easily reproducible in our environment; it is an intermittent failure, so we can't provide definite steps. We sometimes see worker.lua:164: failed to store event: queue overflow and sometimes don't. Are there any metrics we should be looking at, or would changing the log level to debug give us more information?

puneethps commented 1 month ago

@oowl We are also sometimes seeing the following error. Is this related?

2024/08/12 13:21:37 [error] 5315#0: *32717527 [lua] worker.lua:248: communicate(): event worker failed: failed to receive the first 2 bytes: closed, context: ngx.timer
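On the question of which signals to watch: one low-cost check is how often these worker-event errors occur per minute, and whether their bursts line up with the periods in which unhealthy targets kept receiving traffic. A minimal Python sketch, assuming the standard Nginx error-log prefix (`YYYY/MM/DD HH:MM:SS [level]`); the patterns match only the message text quoted in this thread, and `count_event_errors` is a hypothetical helper.

```python
import re
from collections import Counter

# Message fragments quoted in this issue; extend as needed.
PATTERNS = {
    "queue_overflow": re.compile(r"failed to store event: queue overflow"),
    "worker_disconnect": re.compile(
        r"event worker failed: failed to receive the first 2 bytes: closed"
    ),
}
# Nginx error-log timestamp, truncated to the minute for bucketing.
TS_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2})")


def count_event_errors(lines) -> Counter:
    """Count matching lines, keyed by (minute, pattern_name)."""
    counts = Counter()
    for line in lines:
        ts = TS_RE.match(line)
        minute = ts.group(1) if ts else "unknown"
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(minute, name)] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle works directly, e.g. the error log under Kong's prefix directory (logs/error.log in a default install; adjust for your deployment).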

puneethps commented 3 weeks ago

@oowl when we called curl localhost:8001/upstreams/printing-upgraded.v002/health?balancer_health=1, the response is:

{"data":{"details":{"hosts":[
{"host":"10.160.167.185","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.185","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.165","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.165","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.31","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.31","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.200","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.200","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.237","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.237","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.89","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.89","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.41","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.41","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.211","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.211","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.105","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.105","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.203","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.203","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.21","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.21","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.81","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.81","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.122","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.122","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.195","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.195","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.227","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.227","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.53","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.53","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.38","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.38","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.146","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.146","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.33","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.33","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.73","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.73","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.79","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.79","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.61","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.61","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.98","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.98","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.50","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.50","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.163","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.163","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.153","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.153","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.189","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.189","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.247","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.247","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.120","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.120","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.129","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.129","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.72","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.72","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.24","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.24","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.61","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.61","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.36","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.36","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.206","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.206","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.84","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.84","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.184","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.184","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.75","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.75","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.166.198","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.166.198","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.31","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.31","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.63","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.63","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.164.45","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.164.45","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.167.58","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.167.58","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768},
{"host":"10.160.165.253","weight":{"available":100,"unavailable":0,"total":100},"addresses":[{"weight":100,"ip":"10.160.165.253","healthy":true,"port":32768}],"nodeWeight":100,"dns":"A","port":32768}
],"healthy":true,"weight":{"available":4400,"unavailable":0,"total":4400}},"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3","health":"HEALTHY"},"node_id":"0dbceb58-8f05-496c-9c09-48fa16947ea3","next":null}

And when we call curl localhost:8001/upstreams/printing-upgraded.v002/targets, we see:

{"data":[
{"id":"f466b583-e7ff-445e-8dea-083c79ba813d","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724218836.114,"target":"10.160.165.140:32770"},
{"id":"0f1393e2-c4a7-4410-8da1-12956db77561","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724191836.659,"target":"10.160.164.105:32769"},
{"id":"b7cad805-fc27-4055-afc0-0597f1c0189c","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724191236.387,"target":"10.160.166.35:32769"},
{"id":"7706c4c7-42d9-44e0-addc-c38757461c87","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724190738.579,"target":"10.160.165.173:32769"},
{"id":"84f9bf1f-7dea-4b77-9984-a0c0d40da96e","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724190216.961,"target":"10.160.164.136:32770"},
{"id":"7db70aa4-8eaa-485d-abd1-fa48a3bd38f9","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724190216.785,"target":"10.160.167.80:32769"},
{"id":"18c819ff-6158-4610-b817-7db69b932cef","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724189676.335,"target":"10.160.165.19:32770"},
{"id":"999d013f-6e3e-4980-87bd-3fac91bb0efa","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724189676.157,"target":"10.160.167.107:32770"},
{"id":"a71630cf-abfd-4c3a-a7e3-067628e03af2","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724189016.282,"target":"10.160.166.20:32769"},
{"id":"5a7abdea-7a0b-4a5a-84a9-3cd339e6221e","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724188465.105,"target":"10.160.166.137:32770"},
{"id":"4e2c1776-8618-42cd-8e5b-0d5953ea1908","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724187877.271,"target":"10.160.166.38:32768"},
{"id":"78ce4630-175b-4d66-82e7-bec92c261404","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724187757.106,"target":"10.160.166.207:32769"},
{"id":"54d6f98d-3d61-4396-aada-c484ff78967f","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724186607.07,"target":"10.160.164.9:32770"},
{"id":"035c1cea-ccdf-4b04-8ec1-2f8deee02e9f","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724186076.654,"target":"10.160.164.72:32769"},
{"id":"c7fcf2c2-4e85-479a-9b0c-eba1e504de57","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724181504.92,"target":"10.160.164.175:32768"},
{"id":"92d760cd-a075-47c5-b306-7f1faca246a2","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724177363.394,"target":"10.160.165.5:32768"},
{"id":"c776334b-f5d3-4810-a808-67c094560ed0","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724175697.106,"target":"10.160.165.66:32768"},
{"id":"e8a977d3-f594-43cf-bace-deb7881c36fe","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724175637.074,"target":"10.160.167.66:32768"},
{"id":"430475cc-5556-4356-934c-761b95810a28","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724175145.092,"target":"10.160.166.99:32768"},
{"id":"ef1a251e-9026-46ce-a96a-df78dac81169","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724174557.164,"target":"10.160.167.40:32768"},
{"id":"3c81a320-3bdd-4ffc-8859-83611edd3fa3","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724174078.544,"target":"10.160.166.30:32768"},
{"id":"9b30953e-1ddb-4ec9-a635-57cb3ec66de4","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724173296.626,"target":"10.160.164.128:32768"},
{"id":"65335009-67d5-4cc8-804b-2e59ce219e73","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724171660.221,"target":"10.160.164.58:32768"},
{"id":"703ad0ae-6e59-4a04-a068-95077f6cd1a8","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724169396.924,"target":"10.160.164.236:32768"},
{"id":"a9db7594-7003-4a5f-92b2-c0dc9a7c4381","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724168841.165,"target":"10.160.167.151:32768"},
{"id":"758e8f7c-3d0c-409d-bc06-051caf249327","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724166636.474,"target":"10.160.166.138:32768"},
{"id":"901b1dea-d8b9-4778-bae5-aa0e03b01b8f","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724164177.436,"target":"10.160.167.198:32768"},
{"id":"713d2a46-8146-4266-85c7-557343236b87","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724159556.269,"target":"10.160.165.11:32768"},
{"id":"7993ce39-15f0-432d-88bb-cd174b36a23f","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724155776.632,"target":"10.160.166.156:32769"},
{"id":"70293a68-e568-4033-b4d5-0d878b59d4cc","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724105376.921,"target":"10.160.167.168:32769"},
{"id":"19f56183-921f-40a3-b024-85d6a818076a","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724100277.11,"target":"10.160.165.184:32768"},
{"id":"5b3d42d8-d835-4457-b194-bf4da7ab73c7","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724098416.459,"target":"10.160.165.215:32768"},
{"id":"c565ab99-2edc-451b-b9e2-61d9d42ee79e","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724097876.364,"target":"10.160.164.182:32768"},
{"id":"8363ed5a-cbdc-4cac-8ad3-f091760772f5","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724088877.176,"target":"10.160.167.215:32769"}]

The IP 10.160.164.33 is present in the balancer but not in the targets. We tried deleting a valid target to check whether the balancer would be recreated, but it wasn't.
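The comparison described here can be scripted. A minimal Python sketch, assuming the Admin API is reachable at localhost:8001 as in the curl calls above and that the targets list fits on a single page (the Admin API paginates list endpoints). It compares ip:port pairs rather than bare IPs, because the two lists can share an IP on different ports: 10.160.164.72 appears on port 32768 in the balancer but as 10.160.164.72:32769 in the targets. `fetch_json` and `stale_balancer_entries` are hypothetical helper names.

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:8001"  # assumption: Admin API address used in this thread


def fetch_json(path: str) -> dict:
    """GET an Admin API path and decode the JSON body."""
    with urllib.request.urlopen(ADMIN_URL + path) as resp:
        return json.load(resp)


def stale_balancer_entries(health: dict, targets: dict):
    """Return (balancer_only, targets_only) as sorted 'ip:port' lists.

    `health` is the ?balancer_health=1 response and `targets` the
    /targets response; only the first page of targets is considered.
    """
    in_balancer = {
        f"{addr['ip']}:{addr['port']}"
        for host in health["data"]["details"]["hosts"]
        for addr in host["addresses"]
    }
    in_targets = {t["target"] for t in targets["data"]}
    return sorted(in_balancer - in_targets), sorted(in_targets - in_balancer)
```

Usage against a live node would be `stale_balancer_entries(fetch_json("/upstreams/printing-upgraded.v002/health?balancer_health=1"), fetch_json("/upstreams/printing-upgraded.v002/targets"))`. Any entry in the first list is an address the balancer can still pick even though no target backs it, such as 10.160.164.33:32768 here.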

puneethps commented 3 weeks ago

@bungle Any thoughts on this? The balancer should be recreated for any change to the targets, and it looks like that isn't happening, so requests are being proxied to the wrong instances.

oowl commented 3 weeks ago

> @oowl we are also sometimes seeing 2024/08/12 13:21:37 [error] 5315#0: *32717527 [lua] worker.lua:248: communicate(): event worker failed: failed to receive the first 2 bytes: closed, context: ngx.timer is this related?

It's related: it means the consumer of worker events was broken in some way, but it is still hard for us to know what we can do about it. Have you edited any Kong code or config to inject your own custom logic? Please give me the detailed errors and config; that will be very helpful in understanding your problem. It would also help to know in which version this problem started. If you can help me reproduce the problem in my environment, I can promise a reasonable analysis of the problem, and we will try to fix it.
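Since a reproducible config is what is being asked for here, one option is to apply the passive settings quoted earlier in this thread to a throwaway upstream through the Admin API and then drive load against it. A minimal Python sketch, assuming the Admin API listens on localhost:8001 (as in the curl calls above) and that PATCH /upstreams/{name} merges a partial healthchecks object; if that assumption does not hold on 3.2.2, send the complete healthchecks record instead. `build_passive_payload` and `apply_passive_config` are illustrative helpers, not part of Kong.

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:8001"  # assumption: default Admin API address


def build_passive_payload() -> dict:
    """Passive health-check settings reported in this issue, as a PATCH body."""
    return {
        "healthchecks": {
            "passive": {
                "healthy": {
                    "http_statuses": [200, 201, 202, 203, 204, 205, 206, 207,
                                      208, 226, 300, 301, 302, 303, 304, 305,
                                      306, 307, 308],
                    "successes": 1,
                },
                "unhealthy": {
                    "http_failures": 1,
                    "http_statuses": [400, 403, 429, 404, 500, 501, 502,
                                      503, 504, 505],
                    "tcp_failures": 3,
                    "timeouts": 1,
                },
            }
        }
    }


def apply_passive_config(upstream: str) -> int:
    """PATCH the upstream and return the HTTP status (needs a running Kong)."""
    req = urllib.request.Request(
        f"{ADMIN_URL}/upstreams/{upstream}",
        data=json.dumps(build_passive_payload()).encode(),
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `apply_passive_config("repro-upstream")`, where "repro-upstream" is a hypothetical test upstream. Keeping the payload in one function makes it easy to attach the exact settings to the issue alongside the steps used to generate load.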

oowl commented 3 weeks ago

> @oowl when we called curl localhost:8001/upstreams/printing-upgraded.v002/health?balancer_health=1, the response is … and when we call curl localhost:8001/upstreams/printing-upgraded.v002/targets we see …
ae2-41e5-bad8-55671fb2f9a3"},"created_at":1724098416.459,"target":"10.160.165.215:32768"},{"id":"c565ab99-2edc-451b-b9e2-61d9d42ee79e","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724097876.364,"target":"10.160.164.182:32768"},{"id":"8363ed5a-cbdc-4cac-8ad3-f091760772f5","weight":100,"tags":null,"upstream":{"id":"0031d52b-eae2-41e5-bad8-55671fb2f9a3"},"created_at":1724088877.176,"target":"10.160.167.215:32769"}] The IP is 10.160.164.33 is found in the balancer but not in the targets, we tried deleting a valid target to check if the balancer is recreated, but it didn't happen.
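The mismatch described above (an address the balancer still holds that is no longer a configured target) can be checked mechanically by diffing the two Admin API responses. This is a minimal sketch, assuming the full JSON bodies of `GET /upstreams/{upstream}/health?balancer_health=1` and `GET /upstreams/{upstream}/targets` have been saved, that targets are written as `ip:port`, and that the target list fits on one page (the `next` pagination field is ignored). The function names are hypothetical; the field names (`data`, `details`, `hosts`, `host`, `port`, `target`) come from the responses pasted in this thread, and the sample values are a small excerpt of them.

```python
import json


def balancer_addresses(health_payload: dict) -> set[str]:
    """Collect "ip:port" strings from a .../health?balancer_health=1 response."""
    hosts = health_payload["data"]["details"]["hosts"]
    return {f'{h["host"]}:{h["port"]}' for h in hosts}


def configured_targets(targets_payload: dict) -> set[str]:
    """Collect "ip:port" strings from a .../targets response."""
    return {t["target"] for t in targets_payload["data"]}


def stale_balancer_entries(health_payload: dict, targets_payload: dict) -> set[str]:
    """Addresses the balancer still holds but that are not configured targets."""
    return balancer_addresses(health_payload) - configured_targets(targets_payload)


# Excerpt of the payloads from this thread, reduced to the relevant fields.
health = json.loads(
    '{"data":{"details":{"hosts":['
    '{"host":"10.160.164.33","port":32768},'
    '{"host":"10.160.165.184","port":32768}]}}}'
)
targets = json.loads('{"data":[{"target":"10.160.165.184:32768"}]}')

print(sorted(stale_balancer_entries(health, targets)))
```

A non-empty result means the balancer view and the configured targets have diverged. Because the balancer state is held in memory per node (and possibly per worker), repeating the check several times against each Kong node may show whether only some of them are stale.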

That's the expected behavior. Because the worker events system is broken in your deployment, the Kong balancer cannot refresh the data structures in every worker that hold the target and upstream statuses, so the API output shows a stale ("dirty") status.
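To make this explanation concrete, here is a deliberately simplified toy model of how a lost status-change event could leave every worker with a stale "healthy" view while a shared failure counter keeps climbing, which would be consistent with log lines such as `unhealthy TIMEOUT increment (241/1)`. It is not Kong's or lua-resty-healthcheck's actual code: the classes, the four workers, and the `events_delivered` switch (standing in for `queue overflow`) are all hypothetical. Only the target address and the threshold of 1 come from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One worker's local view of target status (hypothetical toy model)."""
    healthy: dict[str, bool] = field(default_factory=dict)

    def on_status_event(self, target: str, is_healthy: bool) -> None:
        self.healthy[target] = is_healthy


@dataclass
class ToyHealthchecker:
    """Shared failure counters plus an event bus that may drop events."""
    workers: list[Worker]
    threshold: int
    events_delivered: bool = True  # False simulates a "queue overflow"
    timeouts: dict[str, int] = field(default_factory=dict)

    def report_timeout(self, target: str) -> int:
        # The counter is shared, so failures seen by any worker add to it.
        count = self.timeouts.get(target, 0) + 1
        self.timeouts[target] = count
        if count >= self.threshold:
            self.broadcast(target, is_healthy=False)
        return count

    def broadcast(self, target: str, is_healthy: bool) -> None:
        if not self.events_delivered:
            return  # event lost: every worker keeps its stale status
        for w in self.workers:
            w.on_status_event(target, is_healthy)


def simulate(events_delivered: bool, timeouts: int) -> tuple[int, bool]:
    """Return (final counter value, whether any worker still routes to the target)."""
    target = "10.160.167.169:32768"
    workers = [Worker({target: True}) for _ in range(4)]
    hc = ToyHealthchecker(workers, threshold=1, events_delivered=events_delivered)
    count = 0
    for _ in range(timeouts):
        # A worker only proxies (and can time out) while it thinks the target is healthy.
        if any(w.healthy[target] for w in workers):
            count = hc.report_timeout(target)
    return count, any(w.healthy[target] for w in workers)


print(simulate(events_delivered=True, timeouts=241))
print(simulate(events_delivered=False, timeouts=241))
```

With events delivered, the first timeout marks the target unhealthy everywhere and the counter stops at 1. With events dropped, traffic keeps flowing and the counter reaches 241 while the threshold is still 1.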

puneethps commented 3 weeks ago

@oowl we are also sometimes seeing the following error. Is this related?

2024/08/12 13:21:37 [error] 5315#0: *32717527 [lua] worker.lua:248: communicate(): event worker failed: failed to receive the first 2 bytes: closed, context: ngx.timer

It's related: it means the worker events consumer broke in some way, but it is still hard for us to know what to do about it. Have you edited any related code or configuration to inject your own logic? Please share the detailed errors and configuration; that will be very helpful for understanding your problem. Also, please tell us in which version this problem started happening.

We don't have any custom logic. The only thing we do is run a Lambda every 15 minutes that adds or removes targets on the upstream based on ECS service scaling activity.
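A Lambda like this is essentially reconciling two sets of `ip:port` strings: what ECS currently runs and what Kong currently has as targets. The following is a minimal sketch of that planning step, assuming the desired set has already been read from ECS (the thread does not say how) and the current set from `GET /upstreams/{upstream}/targets`. `plan_target_changes` is a hypothetical helper that only builds the request list and sends nothing. `POST /upstreams/{upstream}/targets` and `DELETE /upstreams/{upstream}/targets/{host:port}` are Kong Admin API endpoints, but their exact form should be checked against the 3.2 Admin API docs. The default weight of 100 matches the targets shown above.

```python
def plan_target_changes(upstream, desired, current, weight=100):
    """Return (method, path, body) tuples that reconcile Kong targets with `desired`.

    `desired` and `current` are sets of "ip:port" strings. Nothing is sent here;
    a caller would issue these requests against the Kong Admin API.
    """
    ops = []
    # Register tasks that ECS reports but Kong does not know about yet.
    for target in sorted(desired - current):
        ops.append(("POST", f"/upstreams/{upstream}/targets",
                    {"target": target, "weight": weight}))
    # Remove targets whose tasks are gone; unchanged targets are left alone.
    for target in sorted(current - desired):
        ops.append(("DELETE", f"/upstreams/{upstream}/targets/{target}", None))
    return ops


current = {"10.160.165.184:32768", "10.160.167.215:32769"}  # from GET .../targets
desired = {"10.160.165.184:32768", "10.160.165.140:32770"}  # from ECS (assumed)
for op in plan_target_changes("printing-upgraded.v002", desired, current):
    print(op)
```

Only applying the difference means unchanged targets are never re-posted, which presumably keeps balancer rebuilds, and the worker events they generate, to a minimum during each 15-minute run.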

puneethps commented 3 weeks ago

When does the worker events system break, and what is it constrained by: network, CPU, or memory? We can look at our container metrics based on your recommendation.
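One way to line the failures up against container CPU, memory, and network metrics is to bucket the relevant error-log lines by minute. This is a minimal sketch, assuming the default nginx error-log prefix `YYYY/MM/DD HH:MM:SS`; the three substrings are taken from the log lines quoted in this thread, while the per-minute granularity and the `count_per_minute` name are assumptions chosen for illustration. The second sample line is the healthcheck warning from this thread, shortened after the counter.

```python
import re
from collections import Counter

# Substrings taken from the log lines quoted in this thread.
SIGNATURES = {
    "queue_overflow": "failed to store event: queue overflow",
    "receive_closed": "failed to receive the first 2 bytes: closed",
    "timeout_increment": "unhealthy TIMEOUT increment",
}

# Assumes the default nginx error-log prefix "YYYY/MM/DD HH:MM:SS".
TIMESTAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}")


def count_per_minute(lines):
    """Count each signature per minute, keyed by (minute, signature name)."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip continuation lines or other formats
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[(match.group(1), name)] += 1
    return counts


sample = [
    "2024/08/12 13:21:37 [error] 5315#0: *32717527 [lua] worker.lua:248: communicate(): "
    "event worker failed: failed to receive the first 2 bytes: closed, context: ngx.timer",
    "2024/08/06 03:57:31 [warn] 28747#0: *200527150 [lua] healthcheck.lua:1330: log(): "
    "[healthcheck] (64ff7302-bd80-4a03-b95e-4dd8e6eab4b7:printing-upgraded.v002) "
    "unhealthy TIMEOUT increment (241/1)",
]
for (minute, name), n in sorted(count_per_minute(sample).items()):
    print(minute, name, n)
```

Against a real log, the same function can consume an open file object directly (for example `count_per_minute(open("error.log"))`, with a hypothetical path), and minutes with spikes can then be compared with the container metrics for the same window.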

oowl commented 3 weeks ago

That's what I want to find out too. It is supposed to work fine, but it doesn't, and I want to know why. That will depend on the information you give us.

puneethps commented 3 weeks ago

OK, what next steps do you suggest for debugging this issue? Or is there anyone at Kong with more experience handling these kinds of worker events problems?

chronolaw commented 3 weeks ago

Could you try the latest (3.7) or LTS (3.4) Kong Gateway? There have been many updates since 3.2.

puneethps commented 3 weeks ago

@chronolaw are you aware of any fixes relevant to the issue we are facing? I had a cursory look at the changelog and didn't find anything. Upgrading would be a big effort for us, since we would have to run all kinds of functional and performance tests afterwards.

dndx commented 2 weeks ago

Hello @puneethps,

Thank you for reporting this issue. We appreciate your feedback, but please note that for open source releases, we only accept issue reports for the latest minor release series.

Currently, we're on version 3.7, which includes numerous improvements and bug fixes since version 3.2. There's a good chance that upgrading might resolve your issue.

If you're using Kong Gateway in an enterprise setting, our support policy differs:

  1. For enterprise support details, please refer to: https://docs.konghq.com/gateway/3.7.x/support-policy/
  2. If you're an enterprise customer, please raise a ticket through our Support Portal to connect with our support engineers.

Let us know if you have any questions about upgrading or if you need further assistance.

github-actions[bot] commented 4 days ago

This issue is marked as stale because it has been open for 14 days with no activity.