jordanlewis opened this issue 3 years ago
The Go folks have always wanted people to experiment with the prototype linked in https://github.com/golang/go/issues/29696 (https://go-review.googlesource.com/c/go/+/46751). Maybe this exploration would call for giving that thing a shot.
@ajwerner thanks for linking this - this definitely seems like something worth experimenting with, especially considering runtime experiments are happening elsewhere, such as @knz's task group resource accounting.
In a hypothetical where we actually used the `SetMaxHeap` API in crdb, I imagine we could have a goroutine responsible for crash dumps wait on the `SetMaxHeap` notify channel, with `bytes` set to a very high value (one we'd expect to be a precursor to an OOM). If there's a send on the channel, the goroutine can immediately begin writing crash dump information to a file in expectation of an OOM. Since you know crdb's internals much better than I do, what's your take on this idea, @jordanlewis?
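For concreteness, a minimal sketch of that shape, assuming the prototype's proposed `debug.SetMaxHeap(bytes uintptr, notify chan<- struct{}) uintptr` API from CL 46751 (not part of any stock Go release, so this only builds against the prototype toolchain); the limit value and `dumpCrashInfo` callback are hypothetical:

```go
package main

import "runtime/debug"

// startOOMWatcher sets a soft heap limit just below where we'd expect the
// OOM killer to step in, and runs dumpCrashInfo whenever the runtime
// signals that the limit has been reached.
func startOOMWatcher(limitBytes uintptr, dumpCrashInfo func()) {
	notify := make(chan struct{}, 1)
	debug.SetMaxHeap(limitBytes, notify) // prototype-only API (CL 46751)
	go func() {
		for range notify {
			// Heap pressure: write crash dump information now,
			// in expectation of an imminent OOM kill.
			dumpCrashInfo()
		}
	}()
}
```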
The idea is sound. It might even be worth adding an explicit call to `runtime.GC()` in that notification handler; who knows, it may delay the OOM crash a bit further.
this is obs infra, not server
Just a quick update on this: through experimentation I was unsuccessful in subscribing to the `memory.pressure_level` notification from nodes running in containers, as the container process's access to the cgroupfs is (understandably) read-only. The process needs write access to the `cgroup.event_control` file in the memory subsystem in order to subscribe to these notifications, and granting such write access comes with a big sacrifice in security (giving the container privileged access).
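For reference, the subscription mechanism I was experimenting with looks roughly like this — a sketch against the cgroup v1 memory controller, where the cgroup path is a placeholder and, as noted above, the write to `cgroup.event_control` fails without write access to the cgroupfs:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	const cg = "/sys/fs/cgroup/memory" // placeholder cgroup path

	// eventfd that the kernel will signal on memory pressure.
	efd, err := unix.Eventfd(0, unix.EFD_CLOEXEC)
	if err != nil {
		panic(err)
	}

	pressure, err := os.Open(cg + "/memory.pressure_level")
	if err != nil {
		panic(err)
	}

	// Register the subscription: "<eventfd> <pressure fd> <level>".
	// This write is the part that requires write access to the cgroupfs.
	ctl, err := os.OpenFile(cg+"/cgroup.event_control", os.O_WRONLY, 0)
	if err != nil {
		panic(err)
	}
	if _, err := fmt.Fprintf(ctl, "%d %d critical", efd, pressure.Fd()); err != nil {
		panic(err)
	}
	ctl.Close()

	// Each pressure event increments the eventfd counter.
	buf := make([]byte, 8)
	for {
		if _, err := unix.Read(efd, buf); err != nil {
			panic(err)
		}
		fmt.Printf("memory pressure event (count=%d)\n", binary.LittleEndian.Uint64(buf))
	}
}
```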
I've not given up entirely on this, as it would be an exceptionally useful heuristic to have access to from inside each crdb node. Perhaps we can explore subscribing to these notifications in the orchestration layer and then delivering them to the relevant crdb node over the network. More experimentation is needed to determine whether this is a viable approach.
When CockroachDB runs out of memory, it usually manifests as a `SIGKILL` from the OOM killer, giving the program no time to respond with any crash dumps or other emergency actions. It appears that cgroups can be configured to send notifications before this happens, at a configurable percentage of used memory. See: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
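As a sketch of what consuming those notifications could look like under cgroup v2 (per the linked doc, the kernel generates a file-modified event on `memory.events` whenever a counter such as `high` or `oom` changes; the cgroup path here is a hypothetical placeholder):

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// Hypothetical path to the node's cgroup v2 memory.events file.
	const eventsFile = "/sys/fs/cgroup/system.slice/cockroach.service/memory.events"

	ifd, err := unix.InotifyInit1(unix.IN_CLOEXEC)
	if err != nil {
		panic(err)
	}
	if _, err := unix.InotifyAddWatch(ifd, eventsFile, unix.IN_MODIFY); err != nil {
		panic(err)
	}

	buf := make([]byte, 4096)
	for {
		// Block until the kernel modifies memory.events.
		if _, err := unix.Read(ifd, buf); err != nil {
			panic(err)
		}
		// Re-read the counters to see which one changed.
		data, err := os.ReadFile(eventsFile)
		if err != nil {
			panic(err)
		}
		fmt.Printf("memory.events changed:\n%s", data)
	}
}
```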
It would be useful to let CockroachDB detect a close-to-OOM situation and perform custom logic, such as writing a crash dump, dumping active goroutines and a memory profile, or even dropping unnecessary caches.
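For instance, dumping goroutines and a heap profile needs only the standard `runtime/pprof` package; a minimal sketch, with hypothetical output paths:

```go
package main

import (
	"os"
	"runtime/pprof"
)

func dumpDiagnostics() error {
	f, err := os.Create("/tmp/crdb-near-oom.goroutines") // hypothetical path
	if err != nil {
		return err
	}
	defer f.Close()
	// debug=2 prints a full stack trace for every goroutine.
	if err := pprof.Lookup("goroutine").WriteTo(f, 2); err != nil {
		return err
	}

	h, err := os.Create("/tmp/crdb-near-oom.heap") // hypothetical path
	if err != nil {
		return err
	}
	defer h.Close()
	return pprof.WriteHeapProfile(h)
}
```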
Even without cgroups, perhaps we could poll Go's memstats and compare against a configured maximum, and use that to trigger any hooks.
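That fallback could be as simple as the following sketch; the threshold, interval, and `onNearOOM` hook are hypothetical, and note that `runtime.ReadMemStats` briefly stops the world, so the polling interval should stay coarse:

```go
package main

import (
	"runtime"
	"time"
)

// watchHeap polls the Go runtime's memory stats and fires onNearOOM once
// each time heap usage crosses the configured maximum.
func watchHeap(maxBytes uint64, interval time.Duration, onNearOOM func(runtime.MemStats)) {
	var fired bool
	var m runtime.MemStats
	for range time.Tick(interval) {
		runtime.ReadMemStats(&m) // stops the world briefly
		if m.HeapAlloc >= maxBytes {
			if !fired {
				fired = true // fire once per crossing, not on every poll
				onNearOOM(m)
			}
		} else {
			fired = false
		}
	}
}
```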
Jira issue: CRDB-7366