MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

Generate diagnostic report from system menu #399

Open ghukill opened 5 years ago

ghukill commented 5 years ago

One consistent pain point is diagnosing problems in the operations of Combine. This is due, in part, to the variety of services that Combine relies on:

Each has its own logs, which provide helpful information, but these are not readily available through the GUI.

Proposing a "Run Diagnostics" button that would generate a zip file full of potentially helpful information. Perhaps even a "Diagnostics" page that shows which services are up and operational.
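As a rough illustration of what the zip-generating half of this could look like, here is a minimal sketch using only the Python standard library. The function name `build_diagnostic_zip` and the service-to-directory mapping are hypothetical; where each service actually writes its logs is not specified in this ticket and would have to be filled in per deployment.

```python
import zipfile
from pathlib import Path


def build_diagnostic_zip(log_dirs, zip_path):
    """Bundle every regular file under the given directories into one zip.

    log_dirs maps a service label to a directory path (both hypothetical).
    Returns the list of archive member names that were written.
    """
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for service, directory in sorted(log_dirs.items()):
            root = Path(directory)
            if not root.is_dir():
                # A missing log directory is itself useful diagnostic information.
                name = f"{service}/MISSING.txt"
                archive.writestr(name, f"{root} not found\n")
                written.append(name)
                continue
            for path in sorted(root.rglob("*")):
                if path.is_file():
                    # Store each file under its service label, not its absolute path.
                    name = f"{service}/{path.relative_to(root).as_posix()}"
                    archive.write(path, arcname=name)
                    written.append(name)
    return written
```

A view behind the "Run Diagnostics" button could call this with the real log locations and return the resulting file as a download.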

richardcadler commented 4 years ago

This looks like a very helpful thing to me.

antmoth commented 4 years ago

Two possible things to go with this ticket:

I'm thinking that what we need to do is to get the logs for everything co-located into a spot on the filesystem that Combine has access to, and allow the user to view them...

I think only certain services will be amenable to the 'array of green lights' option if it is used from inside Django. But it seems like we could potentially set up a 'status'/'diagnostics' page that bypasses all of the running services, allowing us to Check Stuff Out when Django is down?
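For the log-viewing idea, a minimal sketch might look like the following, assuming the logs have already been co-located under one directory and end in `.log`. Both the function name `tail_logs` and that file-naming convention are assumptions for illustration, not something Combine does today.

```python
from collections import deque
from pathlib import Path


def tail_logs(log_root, max_lines=50):
    """Return {relative file name: last max_lines lines} for each *.log file under log_root."""
    tails = {}
    for path in sorted(Path(log_root).rglob("*.log")):
        with open(path, "r", errors="replace") as handle:
            # A deque with maxlen reads the file line by line but keeps
            # only the final max_lines lines in memory.
            last_lines = deque(handle, maxlen=max_lines)
        key = path.relative_to(log_root).as_posix()
        tails[key] = [line.rstrip("\n") for line in last_lines]
    return tails
```

The returned dictionary could be handed to a template, or rendered by a script that runs outside Django.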

antmoth commented 4 years ago

I'm thinking that what we may want to do for the array-of-green-lights is to stitch together health-checks for all our services into a little command-line script, then set up (somehow?) a /status endpoint that doesn't rely on any of those services being up to run. The endpoint can call the script and construct HTML based on it? Am I totally off-base here?

Celery: There's apparently a web monitor program called Flower (as in flow, not as in botany). It's also possible to query redis-cli to monitor queue lengths.

Livy/Spark: Here, getting the status of all the Livy sessions might be the best we can do.

Elasticsearch: This one actually has a health-check endpoint: GET _cluster/health.

Mongo: There is a ping command that can be run from the mongo CLI.

MySQL: The best way to check MySQL health might be to try connecting to the db and running SELECT 1;.
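To make the "stitch the health-checks together" idea concrete, here is a minimal sketch of such a script using only the Python standard library, so it does not depend on Django being up. All function names are hypothetical. The base URLs assume the default ports for Elasticsearch (9200) and Livy (8998) on localhost, and `check_sql` takes any DB-API style connection factory, since which MySQL driver would be used here is an assumption.

```python
import json
import subprocess
import urllib.request
from html import escape


def fetch_json(url, timeout=3):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def check_elasticsearch(base_url="http://localhost:9200"):
    # _cluster/health reports a "status" of green, yellow, or red.
    status = fetch_json(f"{base_url}/_cluster/health")["status"]
    return status in ("green", "yellow"), f"cluster status: {status}"


def check_livy(base_url="http://localhost:8998"):
    # Livy's GET /sessions lists every session with its id and state.
    sessions = fetch_json(f"{base_url}/sessions")["sessions"]
    states = ", ".join(f"{s['id']}={s['state']}" for s in sessions)
    return True, states or "no sessions"


def check_sql(connect):
    """Open a connection via the given factory and run SELECT 1."""
    connection = connect()
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT 1")
        return cursor.fetchone()[0] == 1, "SELECT 1 succeeded"
    finally:
        connection.close()


def check_command(argv, timeout=5):
    """Run a CLI probe; exit code 0 counts as healthy."""
    done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return done.returncode == 0, (done.stdout or done.stderr).strip()


def run_checks(checks):
    """Call each check; any exception becomes a DOWN result, not a crash."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = check()
        except Exception as exc:
            results[name] = (False, f"{type(exc).__name__}: {exc}")
    return results


def render_status_html(results):
    """Build a bare HTML table with one row per service."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{'UP' if ok else 'DOWN'}</td>"
        f"<td>{escape(detail)}</td></tr>"
        for name, (ok, detail) in results.items()
    )
    return f"<table>{rows}</table>"
```

The CLI-based checks would go through `check_command`. For example, `["redis-cli", "llen", "celery"]` would report the length of Celery's default queue, assuming Redis is the broker and the default queue name is in use. A Mongo ping via the shell's `--eval` option could be wrapped the same way. A /status endpoint would then only need to call `run_checks` and return the output of `render_status_html`.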