camunda / camunda-platform-helm

Camunda Platform 8 Self-Managed Helm charts
https://docs.camunda.io/docs/self-managed/overview/
Apache License 2.0

[ENHANCEMENT] Identity Service should respond to health checks on port 80 as well #788

Open aflansburg opened 1 year ago

aflansburg commented 1 year ago

Describe the use case:

Use case: ingress-gce and container-native load balancing on Google Kubernetes Engine

When using ingress-gce on Google Kubernetes Engine for the Identity service, Google steers you toward container-native load balancing by automatically creating the required load balancing resources, network endpoint groups (NEGs), and even health checks.

However, the backends never become healthy because health checks are always sent to port 80. The backends, health checks, and NEGs all point to port 80. You can add annotations and a BackendConfig CRD, but the backend for port 80 will still probe port 80 for readiness.

Without a ✅ from the backend, the Load Balancer will only respond with 502s.

Referencing potentially related issues: https://github.com/camunda/camunda-platform-helm/issues/707 https://github.com/camunda/camunda-platform-helm/issues/442

Here is an example of the annotations that were used. Although an additional NEG was created for metrics and a health check was defined in a BackendConfig CRD, it did not work: port 80->8082 is still being probed for readiness by the LB backend.

# values yaml for combined ingress + separate identity ingress
identity:
  service:
    type: ClusterIP
    annotations: {
      cloud.google.com/app-protocols: '{"http": "HTTP", "metrics": "HTTP"}',
      cloud.google.com/backend-config: '{"ports": {
      "80": "camunda-backend-identity",
      "82": "camunda-backend-identity"
      }}',
      cloud.google.com/neg: '{"exposed_ports": {"80":{},"82":{}}}',
      controller.autoneg.dev/neg: '{"backend_services":{"80":[{"name":"camunda-platform-identity","region":"us-east4","max_rate_per_endpoint":100}],"82":[{"name":"camunda-platform-identity","region":"us-east4","max_connections_per_endpoint":1000}]}}'
    }
  fullURL: "https://identity.camunda-dev.XXXXXXX.com"
  ingress:
    host: "identity.camunda-dev.XXXXXXX.com"
    # Ingress.enabled if true, an ingress resource is deployed with the identity deployment. Only useful if an ingress controller is available, like nginx.
    enabled: true
    # Ingress.className defines the class or configuration of ingress which should be used by the controller
    className: null
    # Ingress.annotations defines the ingress related annotations, consumed mostly by the ingress controller
    annotations:
      # ingress.kubernetes.io/rewrite-target: "/"
      kubernetes.io/ingress.global-static-ip-name: "camunda-dev-identity-addr"
      networking.gke.io/managed-certificates: identity-managed-cert
      kubernetes.io/ingress.class: "gce"
      ingress.kubernetes.io/ssl-redirect: "true"

# BackendConfig resources: camunda-backend-identity and camunda-backend-zeebe-gw
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: camunda-backend-identity
spec:
  healthCheck:
    checkIntervalSec: 30
    timeoutSec: 30
    healthyThreshold: 2
    unhealthyThreshold: 4
    type: HTTP
    requestPath: /actuator/health
    port: 82
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: camunda-backend-zeebe-gw
spec:
  healthCheck:
    checkIntervalSec: 30
    timeoutSec: 30
    healthyThreshold: 2
    unhealthyThreshold: 4
    type: HTTP
    requestPath: /actuator/health
    port: 9600

Sidenote: the camunda-backend-zeebe-gw BackendConfig above DOES actually mitigate this issue for the Zeebe gateway.
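
One possible explanation, offered as an untested sketch rather than a confirmed fix: with container-native load balancing, GKE sends health check probes directly to the Pod IP, so `healthCheck.port` in a BackendConfig may need to be a container port rather than a Service port. For the Zeebe gateway, 9600 is presumably both, which could be why that config works. The variant below assumes the Identity container serves `/actuator/health` on container port 8082 (inferred from the 80->8082 remark above, not verified against the chart).

```yaml
# Hypothetical variant of camunda-backend-identity; untested.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: camunda-backend-identity
spec:
  healthCheck:
    checkIntervalSec: 30
    timeoutSec: 30
    healthyThreshold: 2
    unhealthyThreshold: 4
    type: HTTP
    requestPath: /actuator/health
    # Assumption: 8082 is the container port behind Service port 82.
    port: 8082
```

If the assumption about the container port is wrong, the health check would keep failing just as before, so the Pod spec should be checked first.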

Describe the enhancement/feature:

The Identity service responds to health checks on port 80 at some path.

Desired outcome and acceptance tests:

GKE container-native load balancing configurations receive a 200 response from port 80, so the backend is marked healthy and the LB can function.

aabouzaid commented 1 year ago

@aflansburg thanks for reporting this.

Which chart version is that? Is the readiness probe enabled?

I remember a similar issue existed before the C8 chart supported the readiness probe. The readiness probe has been enabled by default since 8.2.0.
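
For reference, a values fragment that turns the probe on explicitly might look like the sketch below. The key name `identity.readinessProbe.enabled` is an assumption based on the chart's usual layout, so it should be checked against the values.yaml of the chart version in use.

```yaml
# Sketch only: key names are assumed, verify against the chart's values.yaml.
identity:
  readinessProbe:
    enabled: true
```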