389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

RFE Add cache ratio checks into Healthcheck tool #4124

Open 389-ds-bot opened 4 years ago

389-ds-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/51071


Issue Description

With autotuning in place, many admins no longer check whether the caches are optimally tuned. Autotuning provides a much better minimum default for the cache sizes, but it is not fully optimized. The server itself cannot do this because it doesn't know how the system is being used, so an admin still needs to adjust the sizes manually based on the resources actually available. Adding a "performance check" to healthcheck would be beneficial. This check would look at the various cache hit ratios and report warnings based on their values. For example, a cache hit ratio below 80% should report a warning (or something like that).

The challenge is that when you first start the server, the ratios are at zero. We really should only check the cache hit ratios once the server has been up and running for a while and/or the caches are fully primed. All of this information is available in our monitors (cache stats, server uptime, etc.), but when do we say it's okay to check the ratios? After 1 hour? 6 hours? Or when the entry caches are filled? This might not be so straightforward. My point is that we need to reduce the risk of false positives if we add this type of health check to the tool.
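To make the "when is it okay to check" question concrete, here is a minimal sketch of a readiness gate. It is not existing healthcheck code: the `CacheStats` fields, the `ratios_ready` name, and the default limits (1 hour of uptime, 90% cache fill) are all assumptions chosen for illustration, and the real values would have to come from the monitor entries.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Hypothetical snapshot of values read from the server monitors."""
    uptime_seconds: int   # how long the server has been running
    tries: int            # total cache lookups so far
    current_size: int     # bytes currently held in the cache
    max_size: int         # configured maximum cache size in bytes


def ratios_ready(stats, min_uptime=3600, min_fill=0.9):
    """Return (ready, reason) for whether hit ratios are worth checking.

    The ratios count as ready when the server has been up long enough
    OR the cache is close to full. Both limits are illustrative defaults.
    """
    if stats.tries == 0:
        # No lookups yet, so any ratio would be meaningless.
        return False, "no cache lookups recorded yet"
    if stats.uptime_seconds >= min_uptime:
        return True, "server uptime threshold reached"
    if stats.max_size > 0 and stats.current_size / stats.max_size >= min_fill:
        return True, "cache is primed"
    return False, "server is still warming up"
```

With a gate like this, healthcheck could skip the ratio check, or downgrade it to an informational note, whenever `ready` is false, which is one way to reduce false positives right after a restart.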

The other issue is deciding what cache hit ratio percentages should generate warnings. For example:

95% or higher = Green, no warning
85 - 95% = Amber
< 85% = Red

This is a bit on the high end, but what percentage should trigger a warning? This should be discussed among the team.
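As a starting point for that discussion, the banding above could be sketched roughly as follows. The function names are hypothetical, and the thresholds are parameters precisely because the right values are still undecided; the defaults just mirror the example percentages.

```python
def hit_ratio(hits, tries):
    """Return the cache hit ratio as a percentage, or None with no lookups."""
    if tries == 0:
        return None
    return 100.0 * hits / tries


def classify_hit_ratio(ratio_percent, green=95.0, amber=85.0):
    """Map a hit ratio percentage to "green", "amber", or "red".

    Defaults follow the example bands: >= 95 green, 85 to 95 amber,
    below 85 red. Both cutoffs are open for discussion.
    """
    if not 0.0 <= ratio_percent <= 100.0:
        raise ValueError("ratio_percent must be between 0 and 100")
    if ratio_percent >= green:
        return "green"
    if ratio_percent >= amber:
        return "amber"
    return "red"
```

For example, `classify_hit_ratio(hit_ratio(880, 1000))` falls in the amber band with the defaults, and passing different `green`/`amber` values lets the team try out lower cutoffs without changing the logic.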

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2020-05-07 05:17:19

I think there is a server uptime variable in cn=monitor we could read, and if that's less than 30 mins we can say "this may not yet be accurate" or similar?

I'd probably say 90% and 80% are the numbers for green and amber? But it's hard to know what's right here, there are many factors...
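A rough sketch of the uptime idea, assuming cn=monitor exposes `starttime` and `currenttime` values as UTC generalized-time strings of the form `YYYYMMDDHHMMSSZ`. Both the attribute names and the format are assumptions here and should be verified against a live server; the helper names are made up for illustration.

```python
from datetime import datetime, timezone

# Assumed timestamp layout for the monitor values (UTC, no fractions).
GENERALIZED_TIME = "%Y%m%d%H%M%SZ"


def uptime_seconds(starttime, currenttime):
    """Return the seconds elapsed between two generalized-time strings."""
    start = datetime.strptime(starttime, GENERALIZED_TIME).replace(tzinfo=timezone.utc)
    now = datetime.strptime(currenttime, GENERALIZED_TIME).replace(tzinfo=timezone.utc)
    return int((now - start).total_seconds())


def uptime_note(starttime, currenttime, min_uptime=30 * 60):
    """Return a caveat string if the server is younger than min_uptime."""
    up = uptime_seconds(starttime, currenttime)
    if up < min_uptime:
        return (f"server has been up for only {up // 60} min; "
                "cache hit ratios may not yet be accurate")
    return None
```

Returning a note rather than suppressing the check keeps the ratio visible to the admin while flagging that it may still be warming up.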

389-ds-bot commented 4 years ago

Comment from firstyear (@Firstyear) at 2020-05-07 05:17:20

Metadata Update from @Firstyear:

389-ds-bot commented 4 years ago

Comment from mreynolds (@mreynolds389) at 2020-05-07 17:55:07

Metadata Update from @mreynolds389: