server: memory pressure notification hooks

cockroachdb / cockroach

server: memory pressure notification hooks #64965

Open jordanlewis opened 3 years ago

jordanlewis commented 3 years ago

When CockroachDB runs out of memory, it usually manifests as a SIGKILL from the OOM killer, giving the program no time to respond with any crash dumps or other emergency actions.

It appears that cgroups can be configured to send notifications before this happens, at a configurable percent of used memory. See: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
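As a rough illustration of what reading those cgroup v2 memory interface files could look like, here is a minimal Go sketch that compares `memory.current` against `memory.max`. It polls rather than subscribing to a notification, so it is a simpler stand-in for the idea, not the mechanism itself. The mount point `/sys/fs/cgroup`, the 90% threshold, and the helper names `readCgroupValue`, `parseCgroupValue`, and `usageFraction` are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseCgroupValue parses the contents of a single-value cgroup v2 file such
// as memory.current or memory.max. The literal "max" means "no limit" and is
// reported as ok == false.
func parseCgroupValue(s string) (val uint64, ok bool, err error) {
	s = strings.TrimSpace(s)
	if s == "max" {
		return 0, false, nil
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}

// readCgroupValue reads and parses one cgroup v2 interface file.
func readCgroupValue(path string) (uint64, bool, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, false, err
	}
	return parseCgroupValue(string(b))
}

// usageFraction returns current/limit, or 0 when there is no usable limit.
func usageFraction(current, limit uint64) float64 {
	if limit == 0 {
		return 0
	}
	return float64(current) / float64(limit)
}

func main() {
	// Assumption: the cgroup v2 hierarchy is mounted here and the process
	// sees its own cgroup at this path (typical inside a container).
	dir := "/sys/fs/cgroup"
	const threshold = 0.9 // hypothetical "close to OOM" fraction

	cur, _, err := readCgroupValue(filepath.Join(dir, "memory.current"))
	if err != nil {
		fmt.Println("cannot read memory.current:", err)
		return
	}
	limit, limited, err := readCgroupValue(filepath.Join(dir, "memory.max"))
	if err != nil {
		fmt.Println("cannot read memory.max:", err)
		return
	}
	if !limited {
		fmt.Println("no memory limit configured for this cgroup")
		return
	}
	if f := usageFraction(cur, limit); f >= threshold {
		fmt.Printf("memory pressure: %.0f%% of limit in use\n", f*100)
	} else {
		fmt.Printf("ok: %.0f%% of limit in use\n", f*100)
	}
}
```

Reading these files only needs read access to the cgroup filesystem, which is part of why a polling variant may be easier to deploy than an event subscription.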

It would be useful to let CockroachDB detect a close-to-OOM situation and run custom logic, such as writing a crash dump, dumping active goroutines and a memory profile, or even dropping unnecessary caches.

Even without cgroups, perhaps we could poll Go's memstats and compare against a configured maximum, and use that to trigger any hooks.
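A minimal sketch of that polling idea, assuming a hypothetical `Hook` callback type and a configured limit and threshold fraction (none of these names exist in CockroachDB). It uses `runtime.ReadMemStats` and fires the hooks once each time heap usage crosses the threshold. Note that Go's memstats only cover memory managed by the Go runtime, so allocations made outside the Go heap would not be counted.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Hook is a hypothetical callback invoked when memory usage crosses the
// configured threshold.
type Hook func(heapBytes, limitBytes uint64)

// overThreshold reports whether used is at or above fraction*limit.
// A zero limit is treated as "no limit configured".
func overThreshold(used, limit uint64, fraction float64) bool {
	return limit > 0 && float64(used) >= fraction*float64(limit)
}

// pollMemStats samples the Go heap every interval and runs the hooks on each
// transition from below the threshold to above it. It returns when stop is
// closed.
func pollMemStats(interval time.Duration, limit uint64, fraction float64,
	stop <-chan struct{}, hooks ...Hook) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var ms runtime.MemStats
	fired := false
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			// ReadMemStats briefly stops the world, so the interval
			// should not be too aggressive.
			runtime.ReadMemStats(&ms)
			over := overThreshold(ms.HeapAlloc, limit, fraction)
			if over && !fired {
				for _, h := range hooks {
					h(ms.HeapAlloc, limit)
				}
			}
			fired = over
		}
	}
}

func main() {
	stop := make(chan struct{})
	done := make(chan struct{})

	hook := func(heap, limit uint64) {
		fmt.Printf("heap %d bytes is near limit %d bytes\n", heap, limit)
		close(done)
	}

	// A 1-byte limit guarantees the hook fires in this demonstration.
	go pollMemStats(10*time.Millisecond, 1, 0.9, stop, hook)
	<-done
	close(stop)
}
```

Tracking the `fired` flag keeps the hooks from running on every tick while usage stays high. Whether `HeapAlloc` is the right statistic to watch is itself an open choice in this sketch.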

Jira issue: CRDB-7366

ajwerner commented 3 years ago

The Go folks have always wanted people to experiment with the prototype linked in https://github.com/golang/go/issues/29696 (https://go-review.googlesource.com/c/go/+/46751). Maybe this exploration would call for giving that thing a shot.

abarganier commented 3 years ago

@ajwerner thanks for linking this - this definitely seems like something worth experimenting with, especially considering runtime experiments are happening elsewhere, such as @knz's task group resource accounting.

In a hypothetical where we actually used the SetMaxHeap API in crdb, I imagine we could have a goroutine responsible for crash dumps wait on the SetMaxHeap notify channel, where bytes is set to a very high value (one we'd expect to be a precursor to an OOM). If there's a send on the channel, it can immediately begin the process of writing crash dump information to a file in expectation of an OOM. Since you know crdb's internals much better than I do, what's your take on this idea? @jordanlewis?
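To make the shape of that goroutine concrete, here is a hedged sketch. SetMaxHeap is a prototype and not part of the standard Go runtime, so a plain channel stands in for its notify channel here. The function names `watchPressure`, `writeDumps`, and `simulate` and the output file names are invented for illustration. The dump calls themselves (`pprof.Lookup("goroutine").WriteTo` and `pprof.WriteHeapProfile`) are from the standard `runtime/pprof` package.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime/pprof"
)

// writeDumps writes a goroutine dump and a heap profile into dir and returns
// the paths of the files it created.
func writeDumps(dir string) ([]string, error) {
	gPath := filepath.Join(dir, "goroutines.txt")
	gf, err := os.Create(gPath)
	if err != nil {
		return nil, err
	}
	defer gf.Close()
	// debug=2 prints full stacks for every goroutine in text form.
	if err := pprof.Lookup("goroutine").WriteTo(gf, 2); err != nil {
		return nil, err
	}

	hPath := filepath.Join(dir, "heap.pprof")
	hf, err := os.Create(hPath)
	if err != nil {
		return nil, err
	}
	defer hf.Close()
	if err := pprof.WriteHeapProfile(hf); err != nil {
		return nil, err
	}
	return []string{gPath, hPath}, nil
}

// watchPressure blocks until notify delivers a value, then writes the dumps.
// If notify is closed without a value, it returns without writing anything.
// The channel is a stand-in for the prototype's notify channel.
func watchPressure(notify <-chan struct{}, dir string) ([]string, error) {
	if _, ok := <-notify; !ok {
		return nil, nil
	}
	return writeDumps(dir)
}

// simulate creates a temporary dump directory, fakes one pressure
// notification, and returns the files that were written. The directory is
// left in place so the dumps can be inspected.
func simulate() ([]string, error) {
	dir, err := os.MkdirTemp("", "oom-dumps")
	if err != nil {
		return nil, err
	}
	notify := make(chan struct{}, 1)
	notify <- struct{}{} // simulated "heap limit approaching" event
	return watchPressure(notify, dir)
}

func main() {
	paths, err := simulate()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Println("wrote", paths)
}
```

In a real server the watcher would run in its own goroutine for the life of the process. Writing the dumps itself allocates memory, so how much headroom the threshold leaves would matter.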

knz commented 3 years ago

The idea is sound.

knz commented 3 years ago

Might even be worth adding an explicit call to runtime.GC() in that signal handler; who knows, it may delay the OOM crash a bit further.

knz commented 3 years ago

this is obs infra, not server

abarganier commented 3 years ago

Just a quick update on this - through experimentation I was unsuccessful in subscribing to the memory.pressure_level notification from nodes running in containers, as the container processes' access to the cgroupfs is (understandably) read-only. The process needs write access to the cgroup.event_control file in the memory subsystem in order to subscribe to these notifications, and granting that write access comes with a big sacrifice in security (giving the container privileged access).

I've not given up entirely on this as it would be an exceptional heuristic for us to gain access to from inside each crdb node. Perhaps we can explore the possibility of subscribing to such notifications in the orchestration layer and then deliver notifications to the relevant crdb node over the network. More experimentation needs to be done to determine if this is a valid approach.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!