KusionStack / karpor

Intelligence for Kubernetes. World's most promising Kubernetes Visualization Tool for Developer and Platform Engineering teams.
https://karpor-demo.kusionstack.io
Apache License 2.0

Governance: Add /livez, /readyz for service status check #462

Closed by elliotxx 1 month ago

elliotxx commented 4 months ago

What would you like to be added?

Add /livez and /readyz endpoints for service status checks.

Expected effect:

$ curl https://karpor-demo.kusionstack.io/livez
OK
$ curl https://karpor-demo.kusionstack.io/readyz
[+] Ping ok
[+] Server ok
[+] Syncer ok
[+] Storage ok
health check passed

Expected livenessProbe and readinessProbe in Deployment YAML:

livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /livez
    port: 80
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 10
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /readyz
    port: 80
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 2
  timeoutSeconds: 10

That is to say, /readyz checks the health status of each core component of Karpor and returns status code 200 only if every check passes.
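The behaviour described above could be sketched with only the standard library, as follows. This is a minimal illustration, not Karpor's implementation: the `Check` interface, the `funcCheck` adapter, the `[-] ... failed` line format, and the 503 status on failure are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Check is a hypothetical interface mirroring the Name/Pass pair
// used by the pseudocode in this issue.
type Check interface {
	Name() string
	Pass() bool
}

// funcCheck adapts a plain function to the Check interface.
type funcCheck struct {
	name string
	fn   func() bool
}

func (c funcCheck) Name() string { return c.name }
func (c funcCheck) Pass() bool   { return c.fn() }

// NewReadyzHandler runs every check and answers 200 only if all pass.
func NewReadyzHandler(checks ...Check) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		var b strings.Builder
		allOK := true
		for _, c := range checks {
			if c.Pass() {
				fmt.Fprintf(&b, "[+] %s ok\n", c.Name())
			} else {
				allOK = false
				// Failure line format is an assumption.
				fmt.Fprintf(&b, "[-] %s failed\n", c.Name())
			}
		}
		if allOK {
			b.WriteString("health check passed\n")
			w.WriteHeader(http.StatusOK)
		} else {
			b.WriteString("health check failed\n")
			// 503 on failure is an assumption, not stated in the issue.
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		fmt.Fprint(w, b.String())
	}
}

// probe sends one in-memory GET /readyz request to the handler.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	up := func() bool { return true }
	down := func() bool { return false }

	code, body := probe(NewReadyzHandler(funcCheck{"Ping", up}, funcCheck{"Storage", up}))
	fmt.Printf("%d\n%s", code, body)

	code, body = probe(NewReadyzHandler(funcCheck{"Ping", up}, funcCheck{"Storage", down}))
	fmt.Printf("%d\n%s", code, body)
}
```

Because the handler is a plain `http.HandlerFunc`, it can be mounted on any router that accepts standard `net/http` handlers.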

All core components of Karpor: Architecture

Ready Conditions:

Pseudocode reference:

// Register registers the livez and readyz handlers on the specified
// router.
func Register(r chi.Router, serv server.Server, sync syncer.Syncer, sg storage.Storage) {
    r.Get("/livez", NewLivezHandler())
    r.Get("/readyz", NewReadyzHandler(serv, sync, sg))
}

// NewLivezHandler creates a new liveness check handler that can be
// used to check if the application is running.
func NewLivezHandler() http.HandlerFunc {
    conf := healthcheck.HandlerConfig{
        Verbose: false,
        // Checks is the list of health checks to run.
        Checks: []checks.Check{
            checks.NewPingCheck(),
        },
        FailureNotification: healthcheck.FailureNotification{Threshold: 1},
    }

    return healthcheck.NewHandler(conf)
}

// NewReadyzHandler creates a new readiness check handler that can be
// used to check if the application is ready to serve traffic.
func NewReadyzHandler(serv server.Server, sync syncer.Syncer, sg storage.Storage) http.HandlerFunc {
    conf := healthcheck.HandlerConfig{
        Verbose: true,
        // Checks is the list of health checks to run.
        Checks: []checks.Check{
            checks.NewPingCheck(),
            NewServerCheck(serv),
            NewSyncerCheck(sync),
            NewStorageCheck(sg),
        },
        FailureNotification: healthcheck.FailureNotification{Threshold: 1},
    }

    return healthcheck.NewHandler(conf)
}

// Custom component checks 👇 (note that this is just pseudocode)
//
// etcdCheck is a check that returns true if etcd is
// available.
type etcdCheck struct {
    etcd *etcd.ETCD
}

func NewETCDCheck(etcd *etcd.ETCD) checks.Check {
    return &etcdCheck{
        etcd: etcd,
    }
}

func (c *etcdCheck) Name() string {
    return "ETCD"
}

func (c *etcdCheck) Pass() bool {
    // Only the error matters here; the returned handle is discarded.
    _, err := c.etcd.DB()
    return err == nil
}

// storageCheck is a check that returns true if the storage
// is available.
type storageCheck struct {
    sg storage.Storage
}

func NewStorageCheck(sg storage.Storage) checks.Check {
    return &storageCheck{
        sg: sg,
    }
}

func (c *storageCheck) Name() string {
    return "Storage"
}

func (c *storageCheck) Pass() bool {
    return c.sg.Ping()
}

You can refer to the healthcheck package.

Why is this needed?

In order to better monitor the health of services and components.

JasonHe-WQ commented 3 months ago

Could you please define the readiness condition? Once the readiness probe passes, it indicates that the router is ready to serve.

ruquanzhao commented 3 months ago

@JasonHe-WQ Welcome! 👋👋👋 IMO, for the Karpor syncer, we should check the status of Elasticsearch (for example, the status should be green or yellow). And for the Karpor server, we should check the availability of etcd. (Correct me if I'm wrong @elliotxx)
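As a rough sketch of the Elasticsearch condition, a check could query the cluster health endpoint (`_cluster/health`) and pass only when the reported `status` is green or yellow. The `esCheck` type, its fields, and the stand-in test server below are assumptions for illustration; a real implementation would probably reuse whatever Elasticsearch client Karpor already has rather than a bare HTTP client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// esCheck is a hypothetical check that passes when the cluster
// health status is green or yellow.
type esCheck struct {
	baseURL string
	client  *http.Client
}

func (c *esCheck) Name() string { return "Elasticsearch" }

func (c *esCheck) Pass() bool {
	resp, err := c.client.Get(c.baseURL + "/_cluster/health")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false
	}
	// Only the "status" field of the response is needed.
	var health struct {
		Status string `json:"status"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&health); err != nil {
		return false
	}
	return health.Status == "green" || health.Status == "yellow"
}

// fakeES starts a local stand-in server that reports the given status.
func fakeES(status string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintf(w, `{"status":%q}`, status)
	}))
}

// checkStatus runs the check against a stand-in server with that status.
func checkStatus(status string) bool {
	srv := fakeES(status)
	defer srv.Close()
	c := &esCheck{baseURL: srv.URL, client: &http.Client{Timeout: 2 * time.Second}}
	return c.Pass()
}

func main() {
	for _, s := range []string{"green", "yellow", "red"} {
		fmt.Printf("%s: %v\n", s, checkStatus(s))
	}
}
```

The client timeout matters here: a readiness probe that hangs on a slow dependency would itself time out, so the check should fail fast instead.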

It would be ideal if you could design the health checks to be extensible, since we may introduce additional dependencies or support other storage backends in the future.
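One possible way to leave room for future dependencies is a small registry that components add their checks to, so the readiness handler never needs to know the concrete list. The sketch below is only an illustration; `Registry`, `CheckFunc`, and the component names are hypothetical and not part of Karpor.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// CheckFunc reports whether one dependency is healthy.
type CheckFunc func() bool

// Registry holds named checks so new dependencies can be added
// without touching the code that runs them.
type Registry struct {
	mu     sync.RWMutex
	checks map[string]CheckFunc
}

func NewRegistry() *Registry {
	return &Registry{checks: map[string]CheckFunc{}}
}

// Register adds or replaces the check stored under name.
func (r *Registry) Register(name string, fn CheckFunc) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.checks[name] = fn
}

// Failed runs every registered check and returns the sorted names
// of those that did not pass.
func (r *Registry) Failed() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var failed []string
	for name, fn := range r.checks {
		if !fn() {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed)
	return failed
}

func main() {
	reg := NewRegistry()
	reg.Register("etcd", func() bool { return true })
	reg.Register("elasticsearch", func() bool { return true })
	fmt.Println("ready:", len(reg.Failed()) == 0)

	// A storage backend added later only needs one more Register call.
	reg.Register("new-backend", func() bool { return false })
	fmt.Println("failed:", reg.Failed())
}
```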

If you need any help or have any questions, please feel free to ping @elliotxx or @ruquanzhao (me).

elliotxx commented 2 months ago

@JasonHe-WQ I have added more details, you can refer to the description above 👆

JasonHe-WQ commented 2 months ago

You can reference healthcheck package.

That package is based on the Gin framework 🤣 so I would adapt the code to the Chi framework first. LOL

elliotxx commented 2 months ago

@JasonHe-WQ Yes, I mean that if you don't know how to implement it, you can refer to how this package does it (Karpor really cannot use it directly); it is just a reference. You can definitely write your own implementation~ Chi is a great framework, and we certainly don't want to use Gin anymore.

JasonHe-WQ commented 2 months ago

However, I still have some questions. I have sent an email to the address in your git commits; could you please get in touch with me?

elliotxx commented 2 months ago

@JasonHe-WQ I have sent a friend request. We can discuss in detail in IM.

elliotxx commented 2 months ago

@JasonHe-WQ I have added pseudocode for a custom component check to the issue description for reference.