FastGeert closed this issue 6 years ago.
Could we leverage systemd here? It has built-in support for this kind of thing.
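As a rough illustration of the systemd route, the sketch below renders a drop-in file that caps a service's memory through systemd's resource-control directives `MemoryMax=` (hard limit) and, optionally, `MemoryHigh=` (throttling threshold). Both directives assume the unified cgroup v2 hierarchy. The unit name `agent.service`, the drop-in file name `memory-limit.conf` and the helper functions are hypothetical and exist only for this example.

```python
from typing import Optional


def dropin_path(unit: str) -> str:
    """Return a drop-in path for `unit` (file name is a hypothetical choice)."""
    return f"/etc/systemd/system/{unit}.d/memory-limit.conf"


def render_memory_dropin(memory_max_bytes: int,
                         memory_high_bytes: Optional[int] = None) -> str:
    """Render a [Service] drop-in limiting memory (values in bytes)."""
    if memory_max_bytes <= 0:
        raise ValueError("memory_max_bytes must be positive")
    lines = ["[Service]", f"MemoryMax={memory_max_bytes}"]
    if memory_high_bytes is not None:
        # MemoryHigh should sit below the hard limit to throttle before a kill.
        if not 0 < memory_high_bytes <= memory_max_bytes:
            raise ValueError("memory_high_bytes must be in (0, memory_max_bytes]")
        lines.append(f"MemoryHigh={memory_high_bytes}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical: cap the agent at 2 GiB, start throttling at 1.5 GiB.
    print(dropin_path("agent.service"))
    print(render_memory_dropin(2 * 1024**3, 3 * 1024**3 // 2), end="")
```

After writing such a file, `systemctl daemon-reload` and a restart of the unit would be needed for the limit to take effect.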
I don't have a problem with that.
Moved to the small nodes setup because, with the hyperconverged model, we need even more control over this.
Problem
When provisioning VMs to CPU nodes, we take into account the memory each VM will consume from the host system and distribute the VMs over the nodes accordingly.
We do not yet control how much memory is used by the other supporting processes (agent, alba, ...) running on the CPU nodes. Based on the size of the node, we do reserve memory that VMs should not use (see https://github.com/0-complexity/selfhealing/blob/master/specs/provisioning-limits.md), but this reservation is not enforced on the system.
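The sketch below shows the bookkeeping described above: each node's memory available to VMs is its total memory minus a reservation for supporting processes, minus what already-placed VMs use. The reservation rule in `reserved_for` is purely hypothetical (the real values live in the linked provisioning-limits spec), and placing each VM on the node with the most free VM memory is just one possible policy, not necessarily the one the provisioner uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def reserved_for(total_mb: int) -> int:
    """HYPOTHETICAL reservation rule: 10% of the node, at least 4 GiB."""
    return max(4096, total_mb // 10)


@dataclass
class Node:
    name: str
    total_mb: int
    reserved_mb: int  # kept for supporting processes (agent, alba, ...)
    vms_mb: List[int] = field(default_factory=list)

    @property
    def free_for_vms_mb(self) -> int:
        # Only counts VM usage; nothing here enforces the reservation itself.
        return self.total_mb - self.reserved_mb - sum(self.vms_mb)


def place_vm(nodes: List[Node], vm_mb: int) -> Optional[Node]:
    """Place a VM on the node with the most free VM memory, or return None."""
    candidates = [n for n in nodes if n.free_for_vms_mb >= vm_mb]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: n.free_for_vms_mb)
    best.vms_mb.append(vm_mb)
    return best


if __name__ == "__main__":
    nodes = [Node(name, mb, reserved_for(mb))
             for name, mb in [("cpu1", 65536), ("cpu2", 131072)]]
    chosen = place_vm(nodes, 16384)
    print(chosen.name if chosen else "no capacity")
```

Note that `free_for_vms_mb` trusts the supporting processes to stay inside `reserved_mb`; nothing in this accounting stops them from exceeding it.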
Hence, if the memory allocation of the supporting processes goes out of bounds, we lose control and can no longer predict how the Linux OOM killer will behave; it may, for example, start killing VMs.
Solution