Closed aravindavk closed 4 years ago
Can we do more of a 'Health Report' Tool instead? I am anticipating a tool which can have a plugin for any new diagnostic and can report '[OK]', '[NOT OK]', or '[WARNING]' as the status. More elaborate information can be logged to another file.
Think of something like below:
bash# gluster-health-report
CPU Usage: [OK]
Network Health: [OK]
Disconnect events: [WARNING]
Memory Usage: [WARNING]
Log rotate setup: [NOT OK]
Error logs in last day: [OK]
Changelog size: [WARNING]
....
You can find the detailed health-report at /var/log/glusterfs/health-report-$timestamp.log
It should output only the status; the more detailed reasoning, and the numbers used to arrive at those conclusions, can go in the log file.
Any feature can add its own health report by providing a bash or Python script (or anything else) that runs and produces the above output. The tool should run all of these tests together and give a summary.
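To make the idea concrete, here is a rough Python sketch of such a runner. Everything in it is an assumption for illustration: the `run_reports` name, the `(status, detail)` return convention, and the two made-up reports are not taken from any existing tool. Each report returns a status plus a detail string; only the status goes to the summary, and the detail goes to the log.

```python
import io
from typing import Callable, Dict, List, TextIO, Tuple

# Status labels taken from the sample output above.
OK, WARNING, NOT_OK = "OK", "WARNING", "NOT OK"

# Hypothetical plugin contract: a report is a callable returning (status, detail).
Report = Callable[[], Tuple[str, str]]


def run_reports(reports: Dict[str, Report], log: TextIO) -> List[str]:
    """Run every report, write details to `log`, return the summary lines."""
    summary = []
    for name, check in reports.items():
        try:
            status, detail = check()
        except Exception as exc:  # a crashing report must not stop the others
            status, detail = NOT_OK, f"report crashed: {exc!r}"
        summary.append(f"{name}: [{status}]")
        log.write(f"{name}: [{status}] {detail}\n")
    return summary


# Two made-up reports, for illustration only.
def cpu_usage() -> Tuple[str, str]:
    return OK, "load average 0.42 on 4 cores"


def log_rotate_setup() -> Tuple[str, str]:
    return NOT_OK, "no logrotate configuration found"


if __name__ == "__main__":
    detail_log = io.StringIO()  # a real tool would open a file under /var/log
    for line in run_reports(
        {"CPU Usage": cpu_usage, "Log rotate setup": log_rotate_setup}, detail_log
    ):
        print(line)
```

A report written in bash could fit the same contract if the runner mapped its exit code to a status, but that mapping would be a design decision, not something settled here.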
Any further idea on this would be welcome.
Can we do more of a 'Health Report' Tool instead?
+1
Started working on this tool https://github.com/aravindavk/gluster-health-report
The tool is in a usable state (only one report exists so far, which checks whether glusterd is running). Installation and usage instructions are in the README file of the repo.
Adding a new report is very easy and is documented in the README file. Please feel free to send a pull request with your report idea.
Tried the above tool; it looks neat and works almost as I expected. The only question I have is, what if I want to write a bash script? We don't have to answer it immediately, but it would be a good thing to pick up to make the tool more generic.
Support can be added for running bash scripts or any executable scripts.
This is a good idea. Anything we can do to help the user/customer administrate their system will make for a happier experience.
1) Disk space - running out of disk space can cause serious issues. We should warn, then yell (grin) if necessary to prevent this.
2) Client/server incompatibilities - can we check versions and warn each time a client is started that is not compatible with the server?
3) Overall performance monitoring - end-to-end throughput, either as a hard number or, better, as trends.
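For the disk space point, a report could look roughly like the sketch below. The 80% and 90% thresholds and the function names are assumptions chosen for illustration; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil
from typing import Tuple


def classify_usage(used_fraction: float,
                   warn_at: float = 0.80,
                   critical_at: float = 0.90) -> str:
    """Map a used-space fraction to a status (thresholds are assumed defaults)."""
    if used_fraction >= critical_at:
        return "NOT OK"
    if used_fraction >= warn_at:
        return "WARNING"
    return "OK"


def disk_space_status(path: str = "/") -> Tuple[str, str]:
    """Return (status, detail) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    # Guard against pseudo-filesystems that report a total size of zero.
    used_fraction = usage.used / usage.total if usage.total else 0.0
    detail = f"{path}: {used_fraction:.1%} used, {usage.free} bytes free"
    return classify_usage(used_fraction), detail


if __name__ == "__main__":
    status, detail = disk_space_status("/")
    print(f"Disk space: [{status}]")
    print(detail)
```

In practice the path would be each brick directory rather than `/`, and the thresholds would probably need to be configurable.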
Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.
Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it.
Gluster needs a health checker tool which can be scheduled to run on each node at regular intervals to check the status of the Cluster/Node.
Listed a few ideas here; feel free to add any more ideas/metrics.
Idea 1 - Check for errors and warnings in all Gluster log files and report
Check every log message whose timestamp is later than the previous run's timestamp and print the number of errors and warnings.
For example,
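A minimal Python sketch of this counting follows. It assumes log lines of the shape `[YYYY-MM-DD HH:MM:SS.micro] E [...] message`, with a single-letter level (`E` for error, `W` for warning) right after the timestamp; lines that do not match are skipped. The `count_since` name and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Assumed line shape: "[2017-06-21 10:01:02.123456] E [MSGID: 2] message"
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?\] ([A-Z]) "
)


def count_since(lines: Iterable[str], previous_run: datetime) -> Tuple[int, int]:
    """Count (errors, warnings) logged strictly after `previous_run`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # continuation lines or unknown formats are ignored
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        level = match.group(2)
        if stamp > previous_run and level in ("E", "W"):
            counts[level] += 1
    return counts["E"], counts["W"]


if __name__ == "__main__":
    sample = [
        "[2017-06-21 09:59:00.000001] E [MSGID: 1] old error",
        "[2017-06-21 10:01:02.123456] E [MSGID: 2] new error",
        "[2017-06-21 10:02:00.000000] W [MSGID: 3] new warning",
    ]
    errors, warnings = count_since(sample, datetime(2017, 6, 21, 10, 0, 0))
    print(f"errors={errors} warnings={warnings}")
```

The previous run's timestamp would have to be persisted somewhere between runs; where to store it is left open here.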
Idea 2 - Uptime Report
Look for all gluster processes using the ps command and collect details about their uptime. This helps identify whether any process was restarted recently. A PID change can also be detected by comparing with the previous report.
Command:
Get the command line args from /proc/<PID>/cmdline instead of from the ps command to avoid issues while splitting the args on spaces.
Example,
The above example shows that glusterd started around 2 minutes back.
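As a rough illustration of the /proc approach, the Python sketch below splits the cmdline contents on NUL bytes (which is how Linux separates arguments there) and derives a process's uptime from the start-time field of /proc/<PID>/stat. The helper names are hypothetical, and this is only one way to compute uptime, not necessarily what the commands above did.

```python
import os
from typing import List, Tuple


def parse_cmdline(raw: bytes) -> List[str]:
    """Split /proc/<PID>/cmdline contents; arguments are NUL-separated."""
    if not raw:
        return []  # kernel threads have an empty cmdline
    return [arg.decode(errors="replace") for arg in raw.rstrip(b"\0").split(b"\0")]


def process_uptime_seconds(stat_text: str, system_uptime: float,
                           ticks_per_second: int) -> float:
    """Seconds since the process started, from /proc/<PID>/stat contents."""
    # The second field (comm) is parenthesised and may contain spaces,
    # so split only after the last ")". Remaining fields start at field 3.
    fields = stat_text.rsplit(")", 1)[1].split()
    start_ticks = int(fields[19])  # field 22 overall: starttime in clock ticks
    return system_uptime - start_ticks / ticks_per_second


def process_report(pid: int) -> Tuple[List[str], float]:
    """Read args and uptime for one PID (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as handle:
        args = parse_cmdline(handle.read())
    with open(f"/proc/{pid}/stat") as handle:
        stat_text = handle.read()
    with open("/proc/uptime") as handle:
        system_uptime = float(handle.read().split()[0])
    ticks = os.sysconf("SC_CLK_TCK")
    return args, process_uptime_seconds(stat_text, system_uptime, ticks)
```

Storing the PID and uptime from each run would allow the next run to flag a restart when the PID differs or the uptime is shorter than the interval between runs.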