Open mmzeeman opened 2 weeks ago
It might be an idea to use the available-memory metric from a cross-platform monitoring tool. For instance, Python's psutil has something that will work. See: https://psutil.readthedocs.io/en/latest/#memory
`memsup:get_system_memory_data/0` has a value called `available_memory` that I think would make a great default for this alarm (falling back to `free_memory` if not available). A PR changing that would be welcome.
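The fallback described above can be modelled as follows. This is a minimal sketch in Python rather than Erlang, with a plain dict standing in for the key/value data returned by `memsup:get_system_memory_data/0`. The function name `pick_available` is hypothetical and is not part of os_mon.

```python
from typing import Mapping, Optional


def pick_available(mem_data: Mapping[str, int]) -> Optional[int]:
    """Return available_memory if reported, else free_memory, else None.

    mem_data is a stand-in (assumption) for the data returned by
    memsup:get_system_memory_data/0; not every platform reports every key.
    """
    for key in ("available_memory", "free_memory"):
        value = mem_data.get(key)
        if value is not None:
            return value
    return None
```

Checking for `None` rather than truthiness keeps a legitimately reported value of 0 from triggering the fallback.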
If `available_memory` is not what we want, then we should either improve it to be what we want or add another key that returns the data we want.
Indeed, `available_memory` is what we want to use here. Working on a PR.
It is useful to have good alarms. However, on a system with 64 GB of RAM and 60 GB of it available, but with 59 GB used for cached files, `os_mon` will set the `system_memory_high_watermark` alarm. This is not useful. In fact, this alarm is raised in almost all normal usage situations, mostly because the underlying OS uses otherwise idle memory for useful things. That memory is made available again when applications need it.
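To make the example concrete, the sketch below compares a watermark check based on free memory with one based on available memory. It is an illustration in Python, not the `memsup` implementation. The function name is hypothetical, the 1 GB free figure is an assumption consistent with the numbers above (60 GB available, 59 GB of it file cache), and 0.80 is used as the threshold on the assumption that it matches the default of `system_memory_high_watermark`.

```python
GB = 1024 ** 3


def watermark_exceeded(total: int, unused: int, threshold: float = 0.80) -> bool:
    """True if the share of memory counted as used is above the threshold."""
    return (total - unused) / total > threshold


total = 64 * GB
free = 1 * GB        # assumed: memory not used for anything, not even cache
available = 60 * GB  # free memory plus reclaimable file cache

alarm_from_free = watermark_exceeded(total, free)            # 63/64 counted as used
alarm_from_available = watermark_exceeded(total, available)  # 4/64 counted as used
print(alarm_from_free, alarm_from_available)
```

With these numbers the free-based check fires while the available-based one does not, which is the behaviour this report asks for.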
Steps to reproduce
Start the `os_mon` application on a system which has cached files. macOS and Linux usually use all otherwise unused RAM for their file caches, so this alarm is usually raised as soon as you start the `os_mon` application.
Expected behavior
This alarm is only raised when the available memory is low.
In this case the alarm was set on a system with this memory available.
This system has 302581 + 5834 pages of memory available (inactive + free).
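For scale, the page counts can be converted to bytes. The sketch below assumes a 4096-byte page, which the report does not state; with a different page size (for example 16384 bytes) the result scales accordingly. The helper name is hypothetical.

```python
def pages_to_bytes(pages: int, page_size: int = 4096) -> int:
    """Convert a page count to bytes; page_size is an assumption (not given in the report)."""
    return pages * page_size


available_pages = 302581 + 5834  # inactive + free, as quoted in the report
available_bytes = pages_to_bytes(available_pages)
print(f"{available_pages} pages = {available_bytes / 1024 ** 3:.2f} GiB")
```

Under the 4096-byte assumption this comes to roughly 1.2 GiB of memory that could be handed to applications.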
The Linux system I got this on had these memory usage stats:
In this case this alarm was raised because the system uses almost all available memory for its file cache.
Instead of looking at the allocated RAM, `memsup` should look at the available system RAM.
Affected versions
All versions, I think.
Additional context
This can be platform dependent, because all platforms report memory usage differently. The C code that reports the memory usage might need some changes to report more useful values.
PS. I'm willing to help fix this bug.