archesproject / arches

Arches is a web platform for creating, managing, & visualizing geospatial data. Arches was inspired by the needs of the Cultural Heritage community, particularly the widespread need of organizations to build & manage cultural heritage inventories
GNU Affero General Public License v3.0

Health Check and Status Page #9418

Open aarongundel opened 1 year ago

aarongundel commented 1 year ago

It would be nice to have a health check and a status page for Arches. Ideally, a health check URL would report the health of the Arches application so that an automated process could poll the URL and, if necessary, remove unhealthy instances from a pool of Arches instances. A status page would be a user-friendly page showing the status of the system and its dependencies, for example the database, Elasticsearch, Cantaloupe, etc. This would help users with high-level diagnostics when there are problems with an Arches install.
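As a rough illustration of the automated process described above, the sketch below filters a pool of instances down to those whose health URL answers successfully. The `/health` path, the function names, and the injectable `probe` parameter are assumptions made for this example, not an existing Arches API.

```python
from typing import Callable, Iterable, List
from urllib.error import URLError
from urllib.request import urlopen


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError, ValueError):
        # HTTP errors, refused connections, timeouts, and malformed URLs
        # all count as "unhealthy" here.
        return False


def healthy_instances(
    base_urls: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
    path: str = "/health",  # assumed path; the real endpoint may differ
) -> List[str]:
    """Keep only the instances whose health check URL passes the probe."""
    return [base for base in base_urls if probe(base.rstrip("/") + path)]
```

A load balancer or orchestrator would normally do this polling itself; the point is only that a single, cheap URL is enough for it to decide which instances stay in the pool.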

aj-he commented 1 year ago

Endpoints for Kubernetes liveness, readiness, and startup probes should be provided (/ready, /health).

https://medium.com/devops-mojo/kubernetes-probes-liveness-readiness-startup-overview-introduction-to-probes-types-configure-health-checks-206ff7c24487

adamlodge commented 1 year ago

Something that indicates celery status, if it's running, and web-based access to the celery log so you can see what it's doing.

aarongundel commented 9 months ago

@aj-he I'm looking at working on this soon. I like the two-endpoint approach, especially since Arches depends on so many external services. The liveness endpoint will tell you if Arches is running, and the readiness endpoint will make sure Arches can connect to all of its dependent services.
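To make the two-endpoint idea concrete, here is a minimal sketch written as a plain WSGI app so it runs without any framework. The `/health` and `/ready` paths follow the suggestion earlier in the thread; `DEPENDENCY_CHECKS` and its placeholder lambdas are hypothetical stand-ins for real connectivity checks, and an actual Arches implementation would most likely be a pair of Django views instead.

```python
import json
from typing import Callable, Dict

# Hypothetical dependency checks: each returns True when the service is reachable.
# Real checks would query Postgres, Elasticsearch, and so on.
DEPENDENCY_CHECKS: Dict[str, Callable[[], bool]] = {
    "postgres": lambda: True,
    "elasticsearch": lambda: True,
}


def _run(check: Callable[[], bool]) -> bool:
    """Run one check, treating any exception as a failure."""
    try:
        return bool(check())
    except Exception:
        return False


def app(environ, start_response):
    """Minimal WSGI app exposing /health (liveness) and /ready (readiness)."""
    path = environ.get("PATH_INFO", "")
    if path == "/health":
        # Liveness: the process is up and able to answer requests.
        status, payload = "200 OK", {"status": "alive"}
    elif path == "/ready":
        # Readiness: every dependency must respond before traffic is accepted.
        results = {name: _run(check) for name, check in DEPENDENCY_CHECKS.items()}
        ok = all(results.values())
        status = "200 OK" if ok else "503 Service Unavailable"
        payload = {"status": "ready" if ok else "not ready", "checks": results}
    else:
        status, payload = "404 Not Found", {"status": "unknown path"}
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

Keeping liveness free of dependency checks means an Elasticsearch outage would take an instance out of rotation without causing the orchestrator to restart it.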

Here's a list of the external services that I'll check as part of this:

- Postgres
- Elasticsearch
- File Storage (including external file storage)
- Cantaloupe (if configured)
- Redis/Memcached (if configured)
- External Authentication (if configured)
- Celery Broker (if configured)
- Celery Workers (if configured on local machine)

If you can think of more things that might be checked as part of this, let me know. Perhaps some of these should be optional (as in, Arches still shows as ready if an optional component doesn't work).
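One way the "optional component" idea could look is sketched below: each check carries a `required` flag, and only failed required checks make the instance not ready. The `Check` class, the `evaluate` function, and the `ok`/`failed`/`degraded` labels are all invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str
    probe: Callable[[], bool]  # returns True when the service responds
    required: bool = True      # a failing optional check never blocks readiness


def evaluate(checks: List[Check]) -> Dict[str, object]:
    """Run every check and summarise whether the instance is ready."""
    results: Dict[str, str] = {}
    ready = True
    for check in checks:
        try:
            ok = bool(check.probe())
        except Exception:
            ok = False  # an exception in a probe is treated as a failure
        if ok:
            results[check.name] = "ok"
        elif check.required:
            results[check.name] = "failed"
            ready = False
        else:
            results[check.name] = "degraded"
    return {"ready": ready, "checks": results}
```

For example, `evaluate([Check("postgres", pg_probe), Check("cantaloupe", iiif_probe, required=False)])` would still report ready if only the hypothetical `iiif_probe` failed, while surfacing the degraded component for a status page.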

If you've got Arches deployed using Kubernetes right now, I'd love to get your thoughts on what might also help HE here - or at the very least to leave these endpoints open for extension.

chiatt commented 9 months ago

@aarongundel It would be nice if this was designed in a very modular way allowing developers to write components for services they want to monitor that are not part of the core Arches stack. I think Cantaloupe is a good example because it's technically not a requirement even for AFS.
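A modular design along these lines might use a small registry that project code can add to. This is only a sketch: `register_check`, `run_checks`, and the placeholder Cantaloupe check are hypothetical names, not part of Arches.

```python
from typing import Callable, Dict

# Registry mapping a service name to a callable that returns True when healthy.
_REGISTRY: Dict[str, Callable[[], bool]] = {}


def register_check(name: str):
    """Decorator that adds a health check function to the registry."""
    def decorator(func: Callable[[], bool]) -> Callable[[], bool]:
        _REGISTRY[name] = func
        return func
    return decorator


def run_checks() -> Dict[str, bool]:
    """Run all registered checks; an exception counts as unhealthy."""
    results: Dict[str, bool] = {}
    for name, func in _REGISTRY.items():
        try:
            results[name] = bool(func())
        except Exception:
            results[name] = False
    return results


@register_check("cantaloupe")
def cantaloupe_check() -> bool:
    # Placeholder: a real check might request the image server's base URL.
    return True
```

With this shape, a project that runs extra services would only need to import the decorator and register its own functions; the readiness endpoint itself would not change.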

aj-he commented 9 months ago

@aarongundel, I agree with @chiatt as we also use other services outside of core services. Don't forget RabbitMQ.

We don't have a pure Kubernetes deployment, but we do have a prototype Azure Container App service (only v6 Arches tho - looking at v7), which is a managed K8S service underneath, and I'm keen on getting some health checks to support that.
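For a service outside the core stack such as RabbitMQ, the simplest possible check is whether its port accepts a TCP connection. The helper below is a sketch under that assumption; `tcp_reachable` is a made-up name, and a successful connection only shows the port is open, not that the broker is fully functional.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A RabbitMQ check could then be as small as `tcp_reachable("rabbitmq.internal", 5672)`, where the hostname is hypothetical and 5672 is RabbitMQ's default AMQP port.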