TwiN / gatus

Support Kubernetes liveness/readiness probes as condition #5

r-chris commented 3 years ago

Hi - this looks very promising. What is your view on supporting additional monitoring options besides sending HTTP requests? In particular, it would be great to also monitor service health for backend services with gatus. I read that you are deploying this in Kubernetes, and I would want to do the same. It would be great if we could also probe the Kubernetes health status of backend services that do not provide HTTP endpoints. Anyway, would this be out of scope for your project?

TwiN commented 3 years ago

Hello @r-chris, supporting additional protocols is definitely not out of scope, though the configuration syntax is built around HTTP at the moment.

In the future, one of the ideas I had in mind was to add support for mainstream infrastructure components, like databases (e.g. mysql, postgres) and caches (e.g. redis, memcached).

The configuration would also be very simple, something like:

services:
  - type: redis     # Is this field really necessary if the protocol is in the uri anyways?
    url: "redis://somehost:6379"
    command: GET some_key
    conditions:
      - (TBD)
  - type: mysql     # Is this field really necessary if the protocol is in the uri anyways?
    url: "mysql://somehost:3306/your_db_here"
    command: SELECT id FROM users WHERE name = 'John Doe'
    conditions:
      - (TBD)

The pros would mainly revolve around readability and ease of use, which is one of the most important points IMHO.

The cons, however, revolve around the fact that each new technology to be supported would need its own implementation. Likewise, this could grow the dependency tree quite a lot in the long run, so perhaps a more generic solution would be preferable - granted this can be done without adding too much complexity.

To be frank, I haven't put too much thought into this yet, so if you have any suggestions, these would be greatly appreciated.
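
For illustration, here is a minimal sketch of what evaluating the redis example above could boil down to, assuming the go-redis client; the checkRedis helper and its wiring into conditions are hypothetical, not an existing part of gatus:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// checkRedis runs the configured command (here, GET some_key) against the
// target; the raw result and any error are what a condition would match on.
func checkRedis(addr, key string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	client := redis.NewClient(&redis.Options{Addr: addr})
	defer client.Close()

	// An unreachable host or a missing key (redis.Nil) surfaces as an error,
	// which would mark the service as unhealthy.
	return client.Get(ctx, key).Result()
}

func main() {
	value, err := checkRedis("somehost:6379", "some_key")
	if err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	fmt.Println("healthy, value:", value)
}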

cmhrpr commented 3 years ago

@TwinProduction Why not leave the dependencies to the container owner? Allow configuration of tools (types in this case) to point to the individual binary/package to be used to run the healthcheck. That way it would be infinitely extendable without relying on packaging up dependencies.
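
As a rough sketch of that idea, the check could shell out to whatever binary the container owner ships and treat the exit code as the health signal; everything below is illustrative, not an existing gatus feature:

package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// execCheck runs a user-configured binary with a timeout and reports health
// based solely on its exit code, leaving dependencies to the image itself.
func execCheck(name string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return exec.CommandContext(ctx, name, args...).Run()
}

func main() {
	// Hypothetical example: redis-cli must be present in the container image.
	if err := execCheck("redis-cli", "-h", "somehost", "ping"); err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	fmt.Println("healthy")
}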

r-chris commented 3 years ago

Sounds good. I was thinking of a way to utilize the status monitoring already built into Kubernetes / Docker, but I suppose those are already exposed through HTTP anyway?

Of course you could add this per container, but then you have to find a way to run it reliably alongside your existing processes, which I think tends to be painful.

TwiN commented 3 years ago

@r-chris It's certainly possible to do that, but the thing is, Kubernetes' health is determined by the probes, and the probe configuration offers things that aren't really possible to do from an external pod.

For instance, in Kubernetes, you can run a command within the container to determine whether a container is healthy or not:

    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy

This isn't something that an external pod can replicate. However, what gatus could do is directly communicate with the Kubernetes API to get the probe status from there.

Once again, though, this would severely limit what the conditions could be for such a service, as the only data you'd get would be whether the container (and by association, the pod) is healthy or not.
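
For reference, a sketch of what reading that probe-driven status from the Kubernetes API could look like with client-go; the podsReady function and the label selector are made up for the example:

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// podsReady reports whether every pod matching the selector has passed its
// readiness probe, i.e. its PodReady condition is True.
func podsReady(client kubernetes.Interface, namespace, selector string) (bool, error) {
	pods, err := client.CoreV1().Pods(namespace).List(context.Background(), metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return false, err
	}
	for _, pod := range pods.Items {
		ready := false
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				ready = true
			}
		}
		if !ready {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	config, err := rest.InClusterConfig() // assumes gatus runs inside the cluster
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	healthy, err := podsReady(client, "default", "app=twinnation") // hypothetical selector
	if err != nil {
		panic(err)
	}
	fmt.Println("all pods ready:", healthy)
}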

mapl commented 3 years ago

I would love to see DNS-based health checks.

Something like:

services:
  - type: dns
    url: "udp://127.0.0.1:53"
    queryname: "host.example.org"
    querytype: "A"
    conditions:
      - "[STATUS] == NOERROR"
TwiN commented 3 years ago

@mapl please create a separate feature request for this.

This issue is more targeted at adding support for leveraging Kubernetes' probes to determine whether a service is healthy or not.

TwiN commented 3 years ago

Somewhat related: as of v1.2.0, monitoring TCP services is now supported, and Kubernetes auto discovery was added in v1.4.0.
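
For context, a TCP check essentially reduces to a connection attempt - a minimal sketch, not gatus's actual implementation:

package main

import (
	"fmt"
	"net"
	"time"
)

// tcpCheck considers the service healthy if a TCP connection to the
// address can be established within the timeout.
func tcpCheck(address string) bool {
	conn, err := net.DialTimeout("tcp", address, 5*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	fmt.Println("reachable:", tcpCheck("somehost:6379"))
}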

Unfortunately, I haven't yet tackled leveraging the readiness/liveness probe results to determine whether a service is healthy, but since the dependencies are now in place, the odds of this being implemented in the future have certainly increased.

After reflecting on it a bit more, I've come up with a decent implementation that should be acceptable:

  1. The placeholders [KUBERNETES_LIVENESS_HEALTHY] and [KUBERNETES_READINESS_HEALTHY] would be added.
  2. A new service parameter kubernetes would be required for services that wish to monitor liveness/readiness health. This parameter would contain service-name and namespace, which is necessary to resolve the aforementioned placeholders.
  3. The new kubernetes service parameter must be automatically populated if the service was generated from auto discovery.

Example without using auto discovery:

services:
  - name: twinnation
    url: "https://twinnation.org/health"
    interval: 30s
    kubernetes:
      service-name: "twinnation"
      namespace: "default"
    conditions:
      - "[KUBERNETES_LIVENESS_HEALTHY] == true"
      - "[KUBERNETES_READINESS_HEALTHY] == true"

Example using auto discovery:

kubernetes:
  auto-discover: true
  cluster-mode: "out"
  service-template:
    interval: 30s
    conditions:
      - "[KUBERNETES_LIVENESS_HEALTHY] == true"
      - "[KUBERNETES_READINESS_HEALTHY] == true"
  namespaces:
    - name: default
      hostname-suffix: ".default.svc.cluster.local" # Might be unnecessary? See note below
      target-path: "/health" # Might be unnecessary? See note below
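
For illustration, resolving these placeholders from the service-name and namespace parameters could go through the Endpoints API, which already distinguishes ready from not-ready addresses; this is a hypothetical sketch, not the final design:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// readinessHealthy could back the [KUBERNETES_READINESS_HEALTHY] placeholder:
// true when the service has at least one ready endpoint and no endpoints
// that are failing their readiness probe.
func readinessHealthy(client kubernetes.Interface, namespace, serviceName string) (bool, error) {
	endpoints, err := client.CoreV1().Endpoints(namespace).Get(context.Background(), serviceName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	ready, notReady := 0, 0
	for _, subset := range endpoints.Subsets {
		ready += len(subset.Addresses)
		notReady += len(subset.NotReadyAddresses)
	}
	return ready > 0 && notReady == 0, nil
}

func main() {
	config, err := rest.InClusterConfig() // assumes in-cluster mode
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	healthy, err := readinessHealthy(client, "default", "twinnation")
	if err != nil {
		panic(err)
	}
	fmt.Println("readiness healthy:", healthy)
}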

That said, some adjustments might be needed. By default, calls are made to the required url field for all monitored services; however, this implementation may not need to actually send any request, outside of calling the Kubernetes API to get the pod status.

I do have a new concern though: