javiku opened 3 years ago
Is it possible to use a metric similar to `container_memory_working_set_bytes`, which gives a rough estimate of how much memory cannot be evicted? This is also one of the metrics Kubernetes monitors to determine whether to kill a pod.
We had to roll back the Ambassador upgrade to a version without the reconfig ratelimiter. I'm also wondering if we should expose a flag to disable the reconfig ratelimiter, so users can opt out while experiencing operational pain.
I'm happy to submit a PR for this fix, if we can agree on the next steps.
@roystchiang that does sound reasonable to me and a PR would be very welcome
I'm experiencing the same issue, but I think this new release should address it: https://github.com/emissary-ingress/emissary/blob/master/CHANGELOG.md#1137-june-03-2021
Was this fixed by https://github.com/emissary-ingress/emissary/commit/61b94d0c62a297a0bfcf9ee395f01030ed5443dc ? Should we resolve this?
Describe the bug
Ambassador Pods will "rate-limit" reconfiguration operations if they detect that memory usage is too high. When determining current memory usage, cache memory is taken into account (the value reported by `memory.usage_in_bytes` includes both RSS and cache memory), which results in frequently hitting the reconfig rate limit after some time in long-running clusters with many reconfigurations. It would probably be more accurate to look only at non-cache memory usage, since that is the critical value and the one typically shown in Grafana dashboards for Pod memory usage.
To Reproduce
Our setup:
`updaterWithTicker` in `ambex/ratelimit.go` will be triggered. After some time, if memory usage is over 90%, only one reconfiguration is applied every 10 minutes. We see this in both v1.11.1 and v1.12.1.
Expected behavior
We would expect reconfiguration rate-limiting to fire only when critical memory usage is high, meaning memory usage excluding cache. When processes require more memory and none is free, cache memory is automatically released. Think, for example, of Grafana dashboards that show Pod memory usage: they display "real" memory usage, excluding cache.
Moreover, given Kubernetes constraints, it is not easy (or even feasible) to configure cache memory usage and disposal.
Versions
Additional context
After seeing delays of up to 10 minutes in Ambassador processing endpoint updates, we started digging into the code and got to this point:
cmd/ambex/ratelimit.go - updaterWithTicker
So it seems that if Ambassador detects memory usage of more than 90%, it will only perform one reconfig every 10 minutes. During our tests, we saw in Grafana that Ambassador Pods were using 300MB of the 800MB limit they had assigned, so we didn't understand why this was happening.
But then we found this:
pkg/memory/memory.go - readUsage
The value reported by `memory.usage_in_bytes` includes both RSS and cache memory. This means the memory-usage function uses a value that includes cache, which we didn't expect. The problem with taking cache into account is that cache can grow very easily (+400MB in our case) and is never released unless needed by some process, so it is hard to avoid this problem without restarting the Pods or adding some cache-release mechanism.