Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0

TCP health checks of upstream targets not working #1562

Closed: willquill closed this issue 2 years ago

willquill commented 3 years ago

Is there an existing issue for this?

Current Behavior

We're running kong:2.3.3 in Kubernetes, and I haven't seen our issue fixed in the release notes for any subsequent version. We plan to upgrade soon, but in the meantime, we've been having a health check issue.

We routinely have 3/3 unhealthy TCP increments, but Kong does not set the upstream target to unhealthy.

[lua] healthcheck.lua:1123: log(): [healthcheck] (c332292b-cb03-42eb-8c06-8ebd3946199b:ingress-json-svc.streamsets.10099.svc) unhealthy TCP increment (3/3) for '192.168.201.7(192.168.201.7:10099)', context: ngx.timer

And here is our K8s configuration showing that unhealthy should be set after 3 failures:

apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: tcp-upstream
  namespace: streamsets
upstream:
  slots: 100
  hash_on: none
  algorithm: round-robin
  hash_fallback: none
  healthchecks:
    threshold: 0
    active:
      concurrency: 10
      type: tcp
      https_verify_certificate: false
      healthy:
        interval: 5
        successes: 2
      timeout: 2
      unhealthy:
        interval: 3
        tcp_failures: 3
        timeouts: 1

Yet the target is not set to unhealthy:

(screenshot: the target is still shown as healthy in the admin GUI)

Similarly, I can manually set healthy targets to unhealthy, and Kong will not set them to healthy after 2 successful checks.

Expected Behavior

After three failed TCP health checks, an upstream target should become unhealthy.

Steps To Reproduce

1. In the Upstream configuration, configure targets to be marked unhealthy after three failed TCP checks.
2. Bring down upstream node.
3. Kong will log the 3/3 unhealthy TCP checks.
4. But Kong will not set the target to unhealthy.

Kong Ingress Controller version

0.9.1

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.16-eks-7737de", GitCommit:"7737de131e58a68dda49cdd0ad821b4cb3665ae8", GitTreeState:"clean", BuildDate:"2021-03-10T21:33:25Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Anything else?

rainest commented 3 years ago

This is looking at a hybrid mode CP node, correct? That wasn't stated explicitly, but CPs were mentioned, and DPs don't have a full admin API you can connect a GUI to.

That's expected as of 2.3.3. CP nodes shouldn't do anything with healthchecks because they don't route traffic upstream: https://github.com/Kong/kong/pull/6805

DPs expose read-only health information on their status endpoint; it should show targets going unhealthy as expected and having health restored after successful probes: https://docs.konghq.com/gateway-oss/2.5.x/hybrid-mode/#readonly-status-api-endpoints-on-data-plane

Note that the status endpoint displays pod-specific information (two Kong Pods can have different health statuses for the same upstream endpoint) and as such we don't use a Service for it--you'd need to either use the Pod IP/hostname or port-forwards instead.
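
For example, something like this against one of the DP pods (a rough sketch; it assumes the status listener is enabled on the chart's default port 8100, and it reuses the upstream name from your log):

# forward the status listener from a single data plane pod
kubectl port-forward pod/<kong-dp-pod> 8100:8100 -n <kong-namespace>
# in another shell, query the read-only per-node health view
curl -s localhost:8100/upstreams/ingress-json-svc.streamsets.10099.svc/health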

willquill commented 3 years ago

Negative, not hybrid mode. In fact, we are still facing this issue nearly 4 months later, so I am in the middle of testing out a complete replacement for our entire Kong Gateway OSS infrastructure with a new hybrid-mode deployment.

Right now, in the environments facing this issue, we use a single deployment that runs both the proxy and the ingress controller. The proxy pod connects to the database and is used for proxying. It is both the control plane and data plane.

I'm actually having a lot of trouble figuring out how to deploy in hybrid mode in Kubernetes, as the documentation is lacking (there are no YAML manifests for hybrid mode, and the helm chart documentation leaves out a lot of information about what to include and what not to include in values.yaml for each deployment), but that's unrelated to this issue.

rainest commented 3 years ago

Can you share the related Service and Ingress as well, the admin API output for GET /upstreams/UPSTREAM/health and GET /upstreams/UPSTREAM/health, and debug-level proxy logs for healthcheck.lua? The UPSTREAM name uses the pattern <servicename>.<namespace>.<port>.svc.

What additional information were you looking for in https://github.com/Kong/charts/blob/main/charts/kong/README.md#hybrid-mode ? There are example manifests at https://github.com/Kong/charts/blob/main/charts/kong/example-values/minimal-kong-hybrid-control.yaml and https://github.com/Kong/charts/blob/main/charts/kong/example-values/minimal-kong-hybrid-data.yaml.
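
Roughly, the install from those files looks like this (release names and namespace are just examples, and it assumes the cluster certificate secret the example values reference has already been created):

helm repo add kong https://charts.konghq.com
helm repo update
# control plane (admin API + ingress controller), using the downloaded example values as a starting point
helm install kong-cp kong/kong -n kong --create-namespace -f minimal-kong-hybrid-control.yaml
# data plane (proxy only), pointed at the control plane
helm install kong-dp kong/kong -n kong -f minimal-kong-hybrid-data.yaml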

willquill commented 3 years ago

Thank you for those links, rainest! I didn't know about those example values before! It would be great if you could add links to them in the hybrid-mode section in the README.

This roughly shows our odd deployment: https://imgur.com/a/ehsuflx

Since we're not really using a recommended deployment, I'd prefer to take the time to continue pursuing the new hybrid-mode deployment instead of continuing to debug this issue. If I struggle further with hybrid mode, I'll revisit this and share the requested output.

I should be good to go now with those samples. My problem was that I downloaded the full values.yaml and was going section by section to see whether I needed to modify any settings, and quite a few of the sections in values.yaml are not discussed in the hybrid-mode section of the README. I should have inferred that if the hybrid-mode documentation does not reference a particular section of values.yaml, I should simply remove that section from values.yaml as no customization is needed.

I appreciate your assistance in getting me back on track!

rainest commented 2 years ago

Your current topology is fine. Hybrid mode essentially exists for cases where you need to place the database in a network segment the data planes don't have access to; running instances that handle both control plane and data plane duties is a normal and supported configuration. Switching to hybrid mode shouldn't make any difference in what you're seeing.

I wasn't able to reproduce this locally, but it'd ultimately end up being an issue for https://github.com/Kong/kong unless the upstream configuration you're seeing via the admin API doesn't match what you'd expect based on the KongIngress. I didn't see that locally, and suspect it's not happening here. The most common problems I can think of would result in you seeing none of the KongIngress configuration at all--losing only part of the configuration there shouldn't happen--and you are seeing some healthcheck activity. There are a few fields that are missing or incorrectly named in the current CRD, and those are the only things I think would result in (effectively) partial configuration, but timeouts and TLS parameters shouldn't have any bearing on this.

Upstream will probably want you to go to the most recent version to investigate a bug. For this specifically, having reviewed the healthcheck code, you may want to try making a custom image that includes additional log lines in the code that handles state transitions after updating counters and/or the code that calls the counter update after a check. Within the stock image that's at /usr/local/share/lua/5.1/resty/healthcheck.lua.

willquill commented 2 years ago

Can you share the related Service and Ingress as well, the admin API output for GET /upstreams/UPSTREAM/health and GET /upstreams/UPSTREAM/health, and debug-level proxy logs for healthcheck.lua? The UPSTREAM name uses the pattern <servicename>.<namespace>.<port>.svc

Here's the output for getting the health of the upstream in question: https://pastebin.com/v3Vicvny

You sent me the same GET command twice.

In this case, we had been running 8 ingest endpoints upstream. I changed it to 3 ingest endpoints upstream about an hour ago, and it's still showing all 8 as healthy.

Here are the service and ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-mything
  namespace: streamsets
  annotations: 
    kubernetes.io/ingress.class: "kong"
spec:
  rules:
  - http:
      paths:
      - path: /mything/
        backend: 
          serviceName: ingress-mything-svc
          servicePort: 10090
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-mything-svc
  namespace: streamsets
  labels:
    app: ingress-mything-svc
  annotations: 
    configuration.konghq.com: https-upstream
    konghq.com/plugins: customer-key-auth
spec:
  type: ClusterIP
  ports:
  - name: mything-10090
    port: 10090
    targetPort: 10090
    protocol: TCP
  selector:
    environment: staging
---

I modified the deployment to include KONG_DEBUG_LEVEL="debug" and applied our deployment YAML, which recreated the Kong pods. When the new ones came up, the upstreams for this service correctly showed 3/8 healthy, as would be expected if everything were working correctly. So if we were to do scheduled rolling restarts of the deployment, we know that would kick accurate health checking back in.
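
If we go that route, the workaround would basically be a scheduled run of something like this (deployment name and namespace are placeholders for ours):

# recreate the Kong pods so health checking starts from a clean state
kubectl rollout restart deployment/<kong-deployment> -n <namespace>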

Anyway, I continued my tests by doing the following:

I grabbed another GET of the upstream and you can see that 192.168.222.9 is unhealthy in the check, despite it being one of our 4 valid nodes right now - here it is: https://pastebin.com/GXXHtfe1

While I've changed our log level to debug, I don't know how to get "debug-level proxy logs for healthcheck.lua" - can you tell me how to do this?

willquill commented 2 years ago

@rainest Okay I found something bizarre. We have two services whose upstreams use identical targets, yet one is accurately performing health checks and setting the upstream targets to healthy while the other upstream doesn't appear to be performing health checks at all.

In this case, both upstreams have 192.168.222.9 in the target list. This is the node I added when going from 3/8 active nodes to 4/8 active nodes.

In the logs, I'm seeing healthchecks for 192.168.222.9 for the first service/upstream and seeing something very different for the second service/upstream.

The upstreams have identical health check configurations (they reference the same KongIngress).

See the comparison of the two logs (ingress-postal-svc and ingress-json-svc) here: https://imgur.com/a/nWXU0k3

I took this screenshot at 13:59:18. For some reason, there have been no healthchecks performed for the 192.168.222.9 target since 13:32 for the ingress-json-svc, while the ingress-postal-svc is working fine.

My Kibana query is message:*192.168.222.9* and message:*json* and message:*healthcheck*

If I take out querying for just that IP, I find that the ingress-json-svc service has no healthchecks for ANY upstream target performed since 13:32.

Here's the Service and Ingress where the healthchecks are being performed correctly (ingress-postal-svc):

# Postal Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-postal
  namespace: streamsets
  annotations: 
    kubernetes.io/ingress.class: "kong"
    konghq.com/plugins: add-service-header
spec:
  rules:
  - http:
      paths:
      - path: /ingest_postal/
        backend: 
          serviceName: ingress-postal-svc
          servicePort: 10099
---
# Service for Postal Ingest
apiVersion: v1
kind: Service
metadata:
  name: ingress-postal-svc
  namespace: streamsets
  labels:
    app: ingress-postal-svc
  annotations: 
    konghq.com/override: https-upstream
spec:
  type: ClusterIP
  ports:
  - name: postal-10099
    port: 10099
    targetPort: 10099
    protocol: TCP
  selector:
    environment: staging
---

And here's the Ingress and Service for the one that is not performing healthchecks (ingress-json-svc):

# JSON Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-json
  namespace: streamsets
  annotations: 
    kubernetes.io/ingress.class: "kong"
spec:
  rules:
  - http:
      paths:
      - path: /json_test/
        backend: 
          serviceName: ingress-json-svc
          servicePort: 10099
---
# Service for JSON Ingest
apiVersion: v1
kind: Service
metadata:
  name: ingress-json-svc
  namespace: streamsets
  labels:
    app: ingress-json-svc
  annotations: 
    konghq.com/override: https-upstream
    konghq.com/plugins: customer-key-auth
spec:
  type: ClusterIP
  ports:
  - name: json-10099
    port: 10099
    targetPort: 10099
    protocol: TCP
  selector:
    environment: staging
---

Plugin difference: the failing ingress-json-svc Service carries the konghq.com/plugins: customer-key-auth annotation, while the working ingress-postal-svc does not (its Ingress uses the add-service-header plugin instead).

rainest commented 2 years ago

Apologies, the admin API calls should have been GET /upstreams/UPSTREAM/health and GET /upstreams/UPSTREAM.
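
Concretely, something like the following, assuming the admin API is reachable on its default port 8001 (e.g. via a port-forward) and reusing the upstream name from your earlier log:

kubectl port-forward deployment/<kong-deployment> 8001:8001 -n <namespace>
# the upstream object, including the healthchecks configuration Kong actually holds
curl -s localhost:8001/upstreams/ingress-json-svc.streamsets.10099.svc
# per-target health as seen by this node
curl -s localhost:8001/upstreams/ingress-json-svc.streamsets.10099.svc/health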

In this case, we had been running 8 ingest endpoints upstream. I changed it to 3 ingest endpoints upstream about an hour ago, and it's still showing all 8 as healthy.

What's actually happening here? I'm not sure if this is a change to the Deployment replica count or something else. Changes to the replica count should result in a different list in kubectl get endpoints SVCNAME, and the current endpoint list should appear in Kong's configuration soon after so long as the controller log indicates no errors; if the endpoint list in Kong is outdated, that would point to a problem on the controller side.

0.9.1 is pretty limited in logging the individual endpoints, but if you can use 2.0.5 at debug log level it will list every endpoint update as it sees it (you'd probably need to do so on a test cluster or with a test ingress class due to the significant breaking changes after 1.0).

We have two services whose upstreams use identical targets, yet one is accurately performing health checks and setting the upstream targets to healthy while the other upstream doesn't appear to be performing health checks at all.

That may explain it. Do you see this issue on any upstreams that don't have duplicate targets?

I'm not an expert on this section of the code, but as I read https://github.com/Kong/lua-resty-healthcheck/blob/1.4.1/lib/resty/healthcheck.lua#L293-L320 the healthchecker does not have information about the upstream using a target, and keys target status on the IP+port+hostname combination alone, and there's only a single target list per instance. I'm asking around to see if the library authors can confirm that and/or how the Kong-level code (which is aware of upstreams) should handle this.

While I've changed out log level to debug, I don't know how to "debug-level proxy logs for healthcheck.lua" - can you tell me how to do this?

You edit the source and place it in a custom image. Download a copy of https://github.com/Kong/lua-resty-healthcheck/blob/1.4.1/lib/resty/healthcheck.lua, add log lines wherever you want more information, build a new image with FROM kong:<version> and COPY /path/to/local/healthcheck.lua /usr/local/share/lua/5.1/resty/healthcheck.lua, and run a test instance with that image. Not the most elegant process, but necessary in the absence of existing log lines.
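
As a rough sketch of that build, with the patched healthcheck.lua sitting next to the Dockerfile (the Kong tag and image name are placeholders):

# Dockerfile
FROM kong:<version>
# patched copy with extra ngx.log() calls around the state-transition and counter-update code
COPY healthcheck.lua /usr/local/share/lua/5.1/resty/healthcheck.lua

Build it with docker build -t kong-healthcheck-debug . and point a test Deployment's image at the result.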

The additional info about the duplicate targets should remove the need for that, however; we can hopefully replicate with that alone.

willquill commented 2 years ago

@rainest Thank you - with the holidays and being on vacation, I've been unable to do this so far, but my plan is:

What's actually happening here? Not sure if this is a change to the Deployment replica count or something else. Changes to the replica count should result in a different list in kubectl get endpoints SVCNAME and the current endpoint list should appear in configuration soon after so long as the controller log indicates no errors, but if the endpoint list is outdated.

Our upstream targets for nearly every service are REST API stages at the beginning of StreamSets jobs running StreamSets pipelines. Due to the nature of a StreamSets Data Collector deployment, we have 20 deployments, one for each SDC. So we have 20 endpoints. When I say "bring some ingest nodes down" I mean to go into the SS job settings and turn it down from 20 instances to fewer. From the Kubernetes perspective, the endpoints still exist. From Kong's perspective, they are simply unhealthy.

For example, with health checks being unreliable, we have to run an Ingest job on all 20 SDCs - so every one of the 20 pods is listening on port 10099 (our ingest port) because Kong has 20 upstream targets. If we take the Ingest job down to 15 SDCs, we'd expect Kong to show 15/20 healthy and 5/20 unhealthy. Our workaround is to keep the REST API endpoint running on all 20 pods, so that all 20 are always healthy, but this forces us to run more processes than we should have to.

tl;dr The 20 upstream targets represent 20 Kubernetes pods (<ip>:<port>). There are always 20 pods running. But the number of pods listening on a specific port varies. When I run kubectl get endpoints ingress-json-svc -n streamsets I will always see 20 endpoints. The variable is the number of endpoints on which we have a REST API listening on a specific port, like 10099 for JSON.

I tested two scenarios:

  1. Two services, both send to the same 10099 upstream targets
  2. Three services, all send to the same 10099 upstream targets

In each scenario, I changed the jobs from 20/20 instances to 15/20 instances. In the two-service scenario, one upstream successfully marked the 5 targets as unhealthy and the other did not (all remained healthy). In the three-service scenario, two upstreams successfully marked the 5 targets as unhealthy and the other did not - and it was not the same service as in the first scenario.

Anyway, let me get more in depth logs and debugs and get back to you.

willquill commented 2 years ago

@rainest

I'm not an expert on this section of the code, but as I read https://github.com/Kong/lua-resty-healthcheck/blob/1.4.1/lib/resty/healthcheck.lua#L293-L320 the healthchecker does not have information about the upstream using a target, and keys target status on the IP+port+hostname combination alone, and there's only a single target list per instance. I'm asking around to see if the library authors can confirm that and/or how the Kong-level code (which is aware of upstreams) should handle this.

What did they say? If multiple services have the same upstream targets and Kong is accurately updating the endpoint (target) status of one but not all services, is it because it is only checking the endpoints once and then only updating the health of one service using those endpoints?

rainest commented 2 years ago

They weren't aware of anything, unfortunately.

Do you have additional healthchecks that overlap the same targets on other Services and/or TCPIngresses, beyond the specific KongIngress+Service combinations you've mentioned earlier? The log images show what looks like an HTTP healthcheck (it mentions a 200 response) and TCP services (there are some lines from the stream subsystem).

I was stuck trying to reproduce this for quite some time because I actually wasn't getting any healthchecks with the provided configuration alone. Digging through the code that sets them up I found https://github.com/Kong/kong/commit/b02f0a97b8969d4b67b0597376f4e789d54db6e5, which disables TCP healthchecks in the http subsystem. As far as I can tell, if you have an upstream that's only used by HTTP services, TCP healthchecks just don't run on it, though I'm waiting to hear back if there's some configuration I missed.

That doesn't appear to fully explain what's happening in your case, however, since you are seeing healthchecks, and even if there are interactions from other types of checks, I'd expect that you'd see targets that remain healthy forever because they have no overlapping checks, rather than targets that report health inconsistently. Furthermore, with that code removed on a test instance to re-enable the checks, I wasn't able to replicate the issue: with multiple upstreams that use the same targets I do see "unhealthy TCP increment" logs for both upstreams, and both targets are marked unhealthy.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.