Closed seeker89 closed 3 years ago
Other than some minor nitpicking, this looks good - looking forward to taking it for a spin now :)
So I tried to take this for a spin and got an unhealthy cluster with no clue about what is broken (goldpinger and the cluster seem healthy otherwise):
$ http_proxy= curl -v http://goldpinger.sk1../cluster_health
* Trying 10.x.x.x...
* TCP_NODELAY set
* Connected to goldpinger.sk1... (10.x.x.x) port 80 (#0)
> GET /cluster_health HTTP/1.1
> Host: goldpinger.sk1...
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 418 I'm a teapot
< Server: nginx/1.17.10
< Date: Tue, 16 Mar 2021 17:13:08 GMT
< Content-Type: application/json
< Content-Length: 227
< Connection: keep-alive
<
{"OK":false,"duration-ns":22428592,"generated-at":"2021-03-16T17:13:08.479Z","nodesHealthy":["10.x.x.x","10.x.x.x","10.x.x.x","10.x.x.x","10.x.x.x","10.x.x.x"],"nodesTotal":6,"nodesUnhealthy":null}
* Connection #0 to host goldpinger.sk1... left intact
(Hosts and IP addresses masked)
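To make the mismatch in that response concrete, here is a minimal Go sketch that decodes a body of the same shape and flags the case where `OK` is false although every node is listed as healthy. The `ClusterHealth` struct and both helper functions are hypothetical stand-ins written for this comment, not goldpinger's generated models, and the node names in `main` are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterHealth mirrors some of the fields visible in the response body above.
// Hypothetical stand-in type, not goldpinger's swagger-generated model.
type ClusterHealth struct {
	OK             bool     `json:"OK"`
	NodesHealthy   []string `json:"nodesHealthy"`
	NodesUnhealthy []string `json:"nodesUnhealthy"`
	NodesTotal     int      `json:"nodesTotal"`
}

// ParseClusterHealth decodes a /cluster_health response body.
func ParseClusterHealth(body []byte) (ClusterHealth, error) {
	var h ClusterHealth
	err := json.Unmarshal(body, &h)
	return h, err
}

// Inconsistent reports whether the payload says "not OK" while listing
// no unhealthy nodes and accounting for every node as healthy.
func Inconsistent(h ClusterHealth) bool {
	return !h.OK && len(h.NodesUnhealthy) == 0 && len(h.NodesHealthy) == h.NodesTotal
}

func main() {
	// Same shape as the masked response above; node names are placeholders.
	body := []byte(`{"OK":false,"nodesHealthy":["a","b","c","d","e","f"],"nodesTotal":6,"nodesUnhealthy":null}`)
	h, err := ParseClusterHealth(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("inconsistent:", Inconsistent(h)) // prints "inconsistent: true"
}
```

A JSON `null` for `nodesUnhealthy` decodes to a nil slice, so the length check covers both `null` and `[]`.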
Thanks. I forgot to set the default to true 🤦
I simplified it a little bit too.
One thought before this is merged: can we also expose this as a Prometheus metric? This would make it really easy to hook up a simple alert where Goldpinger is telling us something is amiss with a cluster, and then we can jump over to the cluster in question and do a more in-depth analysis.
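As a rough illustration of what such a metric could look like, the sketch below renders a single gauge in the Prometheus text exposition format using only the standard library. The metric name `goldpinger_cluster_healthy` and the function `RenderHealthMetric` are assumptions made up for this example; an actual implementation would more likely register a gauge through the Prometheus Go client next to the existing metrics.

```go
package main

import "fmt"

// metricName is a hypothetical name chosen for this sketch only.
const metricName = "goldpinger_cluster_healthy"

// RenderHealthMetric returns the gauge in Prometheus text exposition format:
// 1 when the cluster health check passed, 0 otherwise.
func RenderHealthMetric(ok bool) string {
	value := 0
	if ok {
		value = 1
	}
	return fmt.Sprintf(
		"# HELP %s 1 if the cluster health check passed, 0 otherwise.\n# TYPE %s gauge\n%s %d\n",
		metricName, metricName, metricName, value,
	)
}

func main() {
	// Render the metric for an unhealthy cluster.
	fmt.Print(RenderHealthMetric(false))
}
```

With a gauge like this, the alert can be as simple as firing when the value stays at 0 for some period.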
@skamboj thanks, sorry I was trying to wing it from the UI :)
@erhudy any other wishes before this goes in?
ENGAGE
This adds a new endpoint that returns 200 OK if:
/check call
It also returns some basics to know where to start when OK is false.
The actual implementation is in ./pkg/goldpinger/client.go, the rest is due to updated swagger codegen.
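For anyone wiring this into tooling, here is a minimal sketch of a caller, assuming the endpoint path `/cluster_health` and the non-200 status (418) seen in the curl output earlier in this thread. `Interpret`, `CheckCluster`, and `healthBody` are hypothetical names for this example, and the fake server with its `10.0.0.7` address exists only so the snippet runs without a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// healthBody holds only the fields this sketch needs from the response.
type healthBody struct {
	OK             bool     `json:"OK"`
	NodesUnhealthy []string `json:"nodesUnhealthy"`
}

// Interpret treats the cluster as healthy only when the status is 200
// and the body agrees; it also returns the unhealthy nodes, if any.
func Interpret(status int, body []byte) (bool, []string, error) {
	var b healthBody
	if err := json.Unmarshal(body, &b); err != nil {
		return false, nil, err
	}
	return status == http.StatusOK && b.OK, b.NodesUnhealthy, nil
}

// CheckCluster fetches baseURL + "/cluster_health" and interprets the result.
func CheckCluster(client *http.Client, baseURL string) (bool, []string, error) {
	resp, err := client.Get(baseURL + "/cluster_health")
	if err != nil {
		return false, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, nil, err
	}
	return Interpret(resp.StatusCode, body)
}

func main() {
	// Fake server standing in for goldpinger; the node address is invented.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusTeapot)
		fmt.Fprint(w, `{"OK":false,"nodesUnhealthy":["10.0.0.7"]}`)
	}))
	defer srv.Close()

	healthy, unhealthy, err := CheckCluster(srv.Client(), srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(healthy, unhealthy) // prints "false [10.0.0.7]"
}
```

Checking both the status code and the `OK` field keeps the caller safe if a proxy in front of the service rewrites one of them.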