Open zbal opened 5 years ago
Following discussion in Slack...
From @ChopperLee2011 :
add version number + commit hash is doable.
/health
and/status
seems duplicate, we should use one, since we already use/health
, so I prefer to add all the info to/health
.
From @zbal :
agreed - also food-for-thoughts - what about having an arg like
/health?full=true
or/health?extended=true
so we can support extending the/health
response with extra info that we may not want to normally fetch (because of misc reason; e.g. high IO, need to connect to the DB, need to …) /cc @xavierchow @ChopperLee2011 this way we keep/health
(default) small, minimal and common across all components
@woodpig07 pls weigh in.
I vote for one single route /health
to display all the info.
And if we add more and more stuff to it later on, then we can break them down to sections like
/health
to display basic info/health?stats=true
to display basic info and server statistics/health?config=true
to display basic info and service configuration
....I've created PR so that we can configure the env variables we want to see.
// middleware.json
{
"routes:before": {
"loopback-healthcheck-middleware": {
"params": {
"env": {
"component": "OMNI_COMPONENT",
"ver": "OMNI_COMPONENT_VERSION",
"commit": "OMNI_COMPONENT_COMMIT"
}
}
}
}
}
GET /health will return as
{
started: "2019-08-01T16:24:34.013Z",
uptime: 63430.004,
version: "0.0.0",
env: {
component: "{OMNI_COMPONENT}",
ver: "{OMNI_COMPONENT_VERSION}",
commit: "{OMNI_COMPONENT_COMMIT}"
}
}
@xavierchow @zbal @ChopperLee2011 let me know if it's good to merge for release
I'm confused, the DevOps is expecting an API to get the info about a service but the API is relying on the env var set by DevOps when deploying? If so why don't check the ansible script directly?
@xavierchow - the reason is simply so we can expose those variables for misc status pages - those status pages are loading the data from the /health
check
e.g.
we have another design of system status page, do we need to combine the designs?
A public system status page about important SLI/SLO Enhanced health check check for OMNI microservices
System Status Page Health Check Extension
cc @ilyamochalov @xuqingfeng
@MiffyLiye @xavierchow - I understand there are other projects planning for status page - what are the ETA for those. This PR and the #2 have been opened for more than a year with no progress. I'm not so keen on having yet another year of discussion - weeks and months are also not "acceptable".
From what I've seen - the health check library you've worked on is in a separate repo, there is no conflicts with this one.
As for the design & the dashboard - the implementation of the base status page was 2h of work to get something out - which we did achieve. Whenever we fill like removing / replacing it with something more advanced - there won't be any problem, the wastage is minimal.
Can we move forward - close this issue and move onto other issues after ?
Note; I'm not saying we should not use that other library @MiffyLiye worked on - only that I need closure and get those tasks completed. Especially when all that is needed is an hour of fix and a merge.
it's opened for more than a year because we are not aligned and no one answered my question here: https://github.com/Wiredcraft/loopback-healthcheck-middleware/issues/1#issuecomment-518099975, I'm not quite aware of the priority of this task as well, I don't think we have to build a random request from you just because of it's small size.
this middleware is widely used by many components, merging the PR won't make you status page working, as it requires changes for many components, which context(project) would you like to integrate with the new /health
API?
besides the three fields, what else you need to expose to the health API? do you really need the alias for env var? see: https://github.com/Wiredcraft/loopback-healthcheck-middleware/pull/2#discussion_r493465380, to me it's overdesigned and confusing.
"component": "OMNI_COMPONENT",
"ver": "OMNI_COMPONENT_VERSION",
"commit": "OMNI_COMPONENT_COMMIT"
K8s has liveness and readiness probes for different purposes: https://k8smeetup.github.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
However we only expose a single /health
API, here is a typical usage of omni component:
livenessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 3000
scheme: HTTP
initialDelaySeconds: 50
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: hype
ports:
- containerPort: 3000
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 3000
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
So I have a few confusions:
/health
returns 200?
I’d like to extend loopback.status() to include more useful info - any suggestion ?
Effectively - I’d like to include:
Currently we use
/health
that return the content ofapp.loopback.status()
— if we don’t want to change that, could we get a/status
that returns similar info + extended details ? (feels like duplicate though)…Ultimately - this will allow us to expose via API the status of a backend component - and render it in an Admin/Status page on admin-ui, offering visibility.
/cc @xavierchow - please help evaluate and re-assign to whomever could help.