Wiredcraft / loopback-healthcheck-middleware

MIT License
1 stars 2 forks source link

Improved health-check ? #1

Open zbal opened 5 years ago

zbal commented 5 years ago

I’d like to extend loopback.status() to include more useful info - any suggestion ?

Effectively - I’d like to include:

Currently we use /health that return the content of app.loopback.status() — if we don’t want to change that, could we get a /status that returns similar info + extended details ? (feels like duplicate though)…

Ultimately - this will allow us to expose via API the status of a backend component - and render it in an Admin/Status page on admin-ui, offering visibility.

/cc @xavierchow - please help evaluate and re-assign to whomever could help.

zbal commented 5 years ago

Following discussion in Slack...

From @ChopperLee2011 :

add version number + commit hash is doable. /health and /status seems duplicate, we should use one, since we already use /health, so I prefer to add all the info to /health.

From @zbal :

agreed - also food-for-thoughts - what about having an arg like /health?full=true or /health?extended=true so we can support extending the /health response with extra info that we may not want to normally fetch (because of misc reason; e.g. high IO, need to connect to the DB, need to …) /cc @xavierchow @ChopperLee2011 this way we keep /health (default) small, minimal and common across all components

xavierchow commented 5 years ago

@woodpig07 pls weigh in.

woodpig07 commented 5 years ago

I vote for one single route /health to display all the info.

And if we add more and more stuff to it later on, then we can break them down to sections like

woodpig07 commented 5 years ago

I've created PR so that we can configure the env variables we want to see.

// middleware.json
{
  "routes:before": {
    "loopback-healthcheck-middleware": {
      "params": {
        "env": {
          "component": "OMNI_COMPONENT",
          "ver": "OMNI_COMPONENT_VERSION",
          "commit": "OMNI_COMPONENT_COMMIT"
        }
      }
    }
  }
}

GET /health will return as

{
  started: "2019-08-01T16:24:34.013Z",
  uptime: 63430.004,
  version: "0.0.0",
  env: {
     component: "{OMNI_COMPONENT}",
     ver: "{OMNI_COMPONENT_VERSION}",
     commit: "{OMNI_COMPONENT_COMMIT}"
  }
}

@xavierchow @zbal @ChopperLee2011 let me know if it's good to merge for release

xavierchow commented 5 years ago

I'm confused, the DevOps is expecting an API to get the info about a service but the API is relying on the env var set by DevOps when deploying? If so why don't check the ansible script directly?

zbal commented 3 years ago

@xavierchow - the reason is simply so we can expose those variables for misc status pages - those status pages are loading the data from the /health check

e.g.

Screen Shot 2020-09-23 at 3 57 04 PM
MiffyLiye commented 3 years ago

we have another design of system status page, do we need to combine the designs?

A public system status page about important SLI/SLO Enhanced health check check for OMNI microservices

System Status Page Health Check Extension

cc @ilyamochalov @xuqingfeng

zbal commented 3 years ago

@MiffyLiye @xavierchow - I understand there are other projects planning for status page - what are the ETA for those. This PR and the #2 have been opened for more than a year with no progress. I'm not so keen on having yet another year of discussion - weeks and months are also not "acceptable".

From what I've seen - the health check library you've worked on is in a separate repo, there is no conflicts with this one.
As for the design & the dashboard - the implementation of the base status page was 2h of work to get something out - which we did achieve. Whenever we fill like removing / replacing it with something more advanced - there won't be any problem, the wastage is minimal.

Can we move forward - close this issue and move onto other issues after ?

Note; I'm not saying we should not use that other library @MiffyLiye worked on - only that I need closure and get those tasks completed. Especially when all that is needed is an hour of fix and a merge.

xavierchow commented 3 years ago
  1. it's opened for more than a year because we are not aligned and no one answered my question here: https://github.com/Wiredcraft/loopback-healthcheck-middleware/issues/1#issuecomment-518099975, I'm not quite aware of the priority of this task as well, I don't think we have to build a random request from you just because of it's small size.

  2. this middleware is widely used by many components, merging the PR won't make you status page working, as it requires changes for many components, which context(project) would you like to integrate with the new /health API?

  3. besides the three fields, what else you need to expose to the health API? do you really need the alias for env var? see: https://github.com/Wiredcraft/loopback-healthcheck-middleware/pull/2#discussion_r493465380, to me it's overdesigned and confusing.

          "component": "OMNI_COMPONENT",
          "ver": "OMNI_COMPONENT_VERSION",
          "commit": "OMNI_COMPONENT_COMMIT"
bsdelf commented 3 years ago

K8s has liveness and readiness probes for different purposes: https://k8smeetup.github.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

However we only expose a single /health API, here is a typical usage of omni component:

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 50
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: hype
        ports:
        - containerPort: 3000
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1

So I have a few confusions: