contribsys / faktory

Language-agnostic persistent background job server
https://contribsys.com/faktory/

Add a new /health endpoint to validate all components are ready #347

Closed lucasteligioridis closed 2 years ago

lucasteligioridis commented 3 years ago

We'd love a /health endpoint that validates that all the services for Faktory are working, including but not limited to:

This would mean we could use this endpoint to check the complete health of the faktory-server, where everything is required to be "working". For license availability, we'd still expect that to be checked once a day and only change then, to avoid overwhelming your DL server with external calls, since we might be hitting this endpoint once every 5 seconds.

This could then be added to the documentation. I'd prefer to make HTTP calls for health checks rather than use the TCP socket, since this would be checking the same thing anyway :)

mperham commented 3 years ago

There are two endpoints that might be useful:

They don't give any data about the license because they are both meant to be used by any Faktory instance.

lucasteligioridis commented 3 years ago

Happy to use the /stats endpoint, which I'm currently using as a health check.

Can you confirm that if the /stats endpoint returns at all, it is a good indication that everything has connected correctly, i.e. that Redis and the Faktory backend are up?

Any chance we could get some metadata about redis in there?

mperham commented 3 years ago

What redis info would you want to see?

lucasteligioridis commented 3 years ago

Maybe a validation that redis has connected.

Could it also include something about the Redis host name, i.e. to validate that it connected to either our own instance or the one baked into the Faktory server?

mperham commented 3 years ago

The /stats endpoint pulls the queue sizes, so each call implicitly hits Redis. It will return 500 if Redis is down:

❯ curl -i http://localhost:7420/stats
HTTP/1.1 500 Internal Server Error
Content-Language: en
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Thu, 15 Apr 2021 22:05:35 GMT
Content-Length: 53

dial tcp 127.0.0.1:6379: connect: connection refused
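
For anyone scripting this check rather than running curl by hand, here is a minimal sketch of the same idea in Python. It assumes the web UI is reachable at http://localhost:7420 as in the example above and that no password is required. The function name `faktory_is_healthy` and the 2-second timeout are illustrative choices, not part of Faktory.

```python
import urllib.error
import urllib.request


def faktory_is_healthy(base_url: str = "http://localhost:7420",
                       timeout: float = 2.0) -> bool:
    """Return True only if GET {base_url}/stats answers with HTTP 200.

    Hypothetical helper: any other status, a refused connection, or a
    timeout is treated as unhealthy.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/stats", timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # Non-2xx response, e.g. the 500 shown above when Redis is down.
        return False
    except OSError:
        # The web endpoint itself is unreachable (refused, DNS failure, timeout).
        # urllib.error.URLError is a subclass of OSError, so it is covered here.
        return False


if __name__ == "__main__":
    # Exit code 0 when healthy, 1 otherwise, so a probe can call this script directly.
    raise SystemExit(0 if faktory_is_healthy() else 1)
```

Because /stats reads the queue sizes on every call, a True result here implies the server could reach Redis at that moment. It says nothing about license state.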

The /debug page shows your Redis location or URL in Data Location:

[Screenshot of the /debug page]

lucasteligioridis commented 3 years ago

Ah great, so it's an implicit check of Redis as part of the /stats call 👍🏼 I'm happy with that result. I can always query /debug for more information as you suggested 👍🏼