dolmen / angel-PS1

Your fancy shell prompt fed by your guardian angel
https://twitter.com/nglPS1
GNU Affero General Public License v3.0

Plugin/LoadAvg threshold for multi cpu #21

Open setop opened 8 years ago

setop commented 8 years ago

The LoadAvg threshold should be multiplied by the number of CPUs. Plus, I think 0.8 is a bit low.

dolmen commented 8 years ago

I chose to normalize the loadavg value to the [0, 1] range by dividing it by the number of processors. This allows the same settings to work on systems with different numbers of CPUs.
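A minimal sketch of the normalization described above, written in Python for illustration rather than taken from the plugin's code. The function names are hypothetical, and the 0.8 default is simply the threshold under discussion in this issue.

```python
import os


def normalized_load(load_1min, cpu_count):
    """Divide the 1-minute load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_1min / cpu_count


def is_above_threshold(load_1min, cpu_count, threshold=0.8):
    """True when the per-CPU load exceeds the threshold (0.8 is assumed here)."""
    return normalized_load(load_1min, cpu_count) > threshold


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    load_1min, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    print(normalized_load(load_1min, cpus), is_above_threshold(load_1min, cpus))
```

The quotient is not clamped: a load of 3.0 on two CPUs gives 1.5, so the same threshold setting carries across machines, but the value itself can exceed 1.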

If 0.8 is a bit low, what do you suggest instead?

setop commented 8 years ago

Load average is the average number of processes that the system has had to schedule over a period of time (e.g. one minute). It has no upper bound, so you won't normalize it to 0-1 by dividing by the number of CPUs.

For example, on a two-CPU machine, the load average can be three. It just means that the system is overloaded. And that gives you the threshold: red if AvgLoad > number of CPUs.

Moreover, as a sysadmin, I don't want a tool to hide information from me, so I'm not keen on trying to "normalise" this figure.

dolmen commented 8 years ago

The aim of this indicator is to show that the system is becoming overloaded before it is too late. Remember that the indicator is shown in a shell prompt on that system. If the system is overloaded according to your threshold (LoadAvg > number of CPUs), the shell is already unresponsive for interactive use. This is too late. For good interactive use, the system must be idling often.

So I think the indicator has value as is. I concede that the "LoadAvg" name is misleading. Any idea for a better name?

setop commented 8 years ago

Then the problem may not be the name but the indicator itself. The LoadAvg plugin uses /proc/loadavg (also shown by uptime). For some systems, it is perfectly fine to have a load equal to the number of CPUs. For others, it may not be.

Another option would be to use a different indicator, such as /proc/stat; see SO for a way to compute the CPU percentage (the second answer is my favorite). But again, it is perfectly fine to have 100% CPU usage if you trigger a multi-threaded video encoding.
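For reference, here is a rough Python sketch of one common way to derive a CPU percentage from two samples of the aggregate `cpu` line in /proc/stat. It is not the Stack Overflow answer mentioned above, the helper names are hypothetical, and counting `iowait` as idle time is an assumption of this sketch.

```python
import time


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Use only the first 8 counters: user nice system idle iowait irq
    # softirq steal. Guest time is already included in user and nice.
    values = [int(v) for v in parts[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return idle, sum(values)


def cpu_percent(before, after):
    """Percentage of non-idle time between two samples of the 'cpu' line."""
    idle_a, total_a = parse_cpu_line(before)
    idle_b, total_b = parse_cpu_line(after)
    total_delta = total_b - total_a
    if total_delta <= 0:
        return 0.0
    busy_delta = total_delta - (idle_b - idle_a)
    return 100.0 * busy_delta / total_delta


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(0.5)  # the sampling interval is an arbitrary choice
    print(f"{cpu_percent(first, read_cpu_line()):.1f}%")
```

Because two samples are needed, a prompt plugin would have to remember the previous reading between prompts or pay for a short sleep.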

You could also make this indicator parameterized, such as LoadAvg(myThreshold).
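A parameterized indicator could look roughly like the following Python sketch. `make_load_indicator` is a hypothetical name, not angel-PS1's plugin API, and falling back to the CPU count when no threshold is given is an assumption made for the example.

```python
import os

RED = "\033[31m"   # ANSI escape: red foreground
RESET = "\033[0m"  # ANSI escape: reset attributes


def make_load_indicator(threshold=None):
    """Build a formatter that prints the raw load and colors it red
    when it exceeds the threshold (default: number of CPUs)."""
    limit = float(os.cpu_count() or 1) if threshold is None else float(threshold)

    def indicator(load_1min):
        text = f"{load_1min:.2f}"  # the raw figure is always shown
        if load_1min > limit:
            return f"{RED}{text}{RESET}"
        return text

    return indicator


if __name__ == "__main__":
    show = make_load_indicator()   # threshold = number of CPUs
    print(show(os.getloadavg()[0]))  # Unix only
```

The raw value stays visible in both cases; only the color depends on the chosen threshold.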

You aim to produce an indicator which says "oh oh my system is getting bad!". But that is far more complicated than "what time is it?" :) because it really depends on how the system is used.

It's going to be very hard to put an alerting mechanism into a shell prompt, as the trend matters more than the instantaneous figure. Btw, loadavg is itself a very basic kind of trend.

Personally, I like to have the raw loadavg figure and a red flag when it is greater than the number of CPUs. That is enough for me to make a decision.

I hope it helps.