golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime/debug: soft memory limit #48409

Closed mknyszek closed 2 years ago

mknyszek commented 3 years ago

Proposal: Soft memory limit

Author: Michael Knyszek

Summary

I propose a new option for tuning the behavior of the Go garbage collector by setting a soft memory limit on the total amount of memory that Go uses.

This option comes in two flavors: a new runtime/debug function called SetMemoryLimit and a GOMEMLIMIT environment variable. In sum, the runtime will try to maintain this memory limit by limiting the size of the heap and by returning memory to the underlying platform more aggressively. This includes a mechanism to help mitigate garbage collection death spirals. Finally, by setting GOGC=off, the Go runtime will always grow the heap to the full memory limit.
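For illustration, here is a minimal sketch of how the proposed API would be used, assuming the SetMemoryLimit signature described in the design document (GOGC is toggled via the existing runtime/debug.SetGCPercent). The equivalent environment configuration would be GOMEMLIMIT=8GiB GOGC=off.

```go
package main

import "runtime/debug"

func main() {
	// Ask the runtime to keep total memory use at roughly 8 GiB.
	// The limit is soft: the runtime may exceed it rather than thrash.
	debug.SetMemoryLimit(8 << 30)

	// Optionally disable GOGC-based pacing entirely so the heap is
	// allowed to grow all the way to the memory limit.
	debug.SetGCPercent(-1) // equivalent to GOGC=off

	// ... application code ...
}
```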

This new option gives applications better control over their resource economy. It empowers users to:

Details

Full design document found here.

Note that, for the time being, this proposal intends to supersede #44309. Frankly, I haven't been able to find a significant use-case for it, as opposed to a soft memory limit overall. If you believe you have a real-world use-case for a memory target where a memory limit with GOGC=off would not solve the same problem, please do not hesitate to post on that issue, contact me on the gophers slack, or via email at mknyszek@golang.org. Please include as much detail as you can.

gopherbot commented 3 years ago

Change https://golang.org/cl/350116 mentions this issue: design: add proposal for a soft memory limit

mpx commented 3 years ago

Afaict, the impact of memory limit is visible once the GC is CPU throttled, but not before. Would it be worth exposing the current effective GOGC as well?

mknyszek commented 3 years ago

@mpx I think that's an interesting idea. If GOGC is not off, then you have a very clear sign of throttling in telemetry. However, if GOGC=off I think it's harder to tell, and it gets blurry once the runtime starts bumping up against the GC CPU utilization limit, i.e. what does effective GOGC mean when the runtime is letting itself exceed the heap goal?

I think that's pretty close. Ideally we would have just one metric that could show, at-a-glance, "are you in the red, and if so, how far?"
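(For readers looking for such a signal today, here is a minimal sketch that computes the GC's share of CPU time from runtime/metrics. The /cpu/classes/ metrics it reads were added in a later Go release and are estimates, so this is illustrative rather than part of this proposal.)

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// gcCPUFraction reports the cumulative fraction of the program's CPU time
// spent in the garbage collector, as estimated by the runtime.
func gcCPUFraction() float64 {
	samples := []metrics.Sample{
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
		{Name: "/cpu/classes/total:cpu-seconds"},
	}
	metrics.Read(samples)
	gc := samples[0].Value.Float64()
	total := samples[1].Value.Float64()
	if total == 0 {
		return 0
	}
	return gc / total
}

func main() {
	fmt.Printf("GC CPU fraction: %.2f\n", gcCPUFraction())
}
```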

raulk commented 3 years ago

In case you find this useful as a reference (and possibly to include in "prior art"), the go-watchdog library schedules GC according to a user-defined policy. It can infer limits from the environment/host, container, and it can target a maximum heap size defined by the user. I built this library to deal with https://github.com/golang/go/issues/42805, and ever since we integrated it into https://github.com/filecoin-project/lotus, we haven't had a single OOM reported.

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 2 years ago

@mknyszek what is the status of this?

mknyszek commented 2 years ago

@rsc I believe the design is complete. I've received feedback on the design, iterated on it, and I've arrived at a point where there aren't any major remaining comments that need to be addressed. I think the big question at the center of this proposal is whether the API benefit is worth the cost. The implementation can change and improve over time; most of the details are internal.

Personally, I think the answer is yes. I've found that mechanisms that respect users' memory limits and give the GC the flexibility to use more of the available memory are quite popular. Where Go users implement this themselves, they're left working with tools (like runtime.GC/debug.FreeOSMemory and heap ballasts) that have some significant pitfalls. The proposal also takes steps to mitigate the most significant costs of having a new GC tuning knob.
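(For readers unfamiliar with the heap-ballast workaround mentioned above, a minimal sketch is below: a large, never-written allocation inflates the heap size that GOGC pacing is based on, so collections run less often. It is exactly the kind of fragile trick a memory limit is meant to replace.)

```go
package main

import "runtime"

func main() {
	// Allocate a 2 GiB ballast. The pages are never written, so most
	// operating systems will not back them with physical memory, but the
	// GC pacer still counts the allocation toward the heap size.
	ballast := make([]byte, 2<<30)

	// ... run the application ...

	// Keep the ballast reachable for the lifetime of the program.
	runtime.KeepAlive(ballast)
}
```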

In terms of implementation, I have some of the foundational bits up for review now that I wish to land in 1.18 (I think they're uncontroversial improvements, mostly related to the scavenger). My next step is to create a complete implementation and trial it on real workloads. I suspect that a complete implementation won't land in 1.18 at this point, which is fine. It'll give me time to work out any unexpected issues with the design in practice.

rsc commented 2 years ago

Thanks for the summary. Overall the reaction here seems overwhelmingly positive.

Does anyone object to doing this?

kent-h commented 2 years ago

I have some of the foundational bits up for review now that I wish to land in 1.18

I suspect that a complete implementation won't land in 1.18

@mknyszek I'm somewhat confused by this. At a high level, what are you hoping to include in 1.18, and what do you expect to come later? (Specifically: will we have extra knobs in 1.18, or will these changes be entirely internal?)

mknyszek commented 2 years ago

@Kent-H The proposal has not been accepted, so the API will definitely not land in 1.18. All that I'm planning to land is work on the scavenger, to make it scale a bit better. This is useful in its own right, and it happens that the implementation of SetMemoryLimit as described in the proposal depends on it. There won't be any internal functionality pertaining to SetMemoryLimit in the tree in Go 1.18.

rsc commented 2 years ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc commented 2 years ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

gopherbot commented 2 years ago

Change https://go.dev/cl/393401 mentions this issue: runtime: add a non-functional memory limit to the pacer

gopherbot commented 2 years ago

Change https://go.dev/cl/353989 mentions this issue: runtime: add GC CPU utilization limiter

gopherbot commented 2 years ago

Change https://go.dev/cl/393400 mentions this issue: runtime: add byte count parser for GOMEMLIMIT

gopherbot commented 2 years ago

Change https://go.dev/cl/394220 mentions this issue: runtime: maintain a direct count of total allocs and frees

gopherbot commented 2 years ago

Change https://go.dev/cl/394221 mentions this issue: runtime: set the heap goal from the memory limit

gopherbot commented 2 years ago

Change https://go.dev/cl/393402 mentions this issue: runtime: track how much memory is mapped in the Ready state

gopherbot commented 2 years ago

Change https://go.dev/cl/397018 mentions this issue: runtime/debug: export SetMemoryLimit

gopherbot commented 2 years ago

Change https://go.dev/cl/397015 mentions this issue: runtime: remove float64 multiplication in heap trigger compute path

gopherbot commented 2 years ago

Change https://go.dev/cl/397016 mentions this issue: runtime: create async work queue to handle runtime triggers

gopherbot commented 2 years ago

Change https://go.dev/cl/397017 mentions this issue: runtime: make the scavenger and allocator respect the memory limit

gopherbot commented 2 years ago

Change https://go.dev/cl/397014 mentions this issue: runtime: check the heap goal and trigger dynamically

gopherbot commented 2 years ago

Change https://go.dev/cl/397679 mentions this issue: runtime: update inconsistent gcController stats more carefully

gopherbot commented 2 years ago

Change https://go.dev/cl/397678 mentions this issue: runtime: move inconsistent memstats into gcController

gopherbot commented 2 years ago

Change https://go.dev/cl/397677 mentions this issue: runtime: clean up inconsistent heap stats

gopherbot commented 2 years ago

Change https://go.dev/cl/399014 mentions this issue: runtime: replace PI controller in pacer with simpler heuristic

gopherbot commented 2 years ago

Change https://go.dev/cl/398834 mentions this issue: runtime: rewrite pacer max trigger calculation

gopherbot commented 2 years ago

Change https://go.dev/cl/399474 mentions this issue: runtime: redesign scavenging algorithm

gopherbot commented 2 years ago

Change https://go.dev/cl/403614 mentions this issue: runtime/metrics: add /gc/cpu/limiter-overflow:cpu-seconds metric

mknyszek commented 2 years ago

The core feature has landed, but I still need to land a few new metrics to help support visibility into this.

gopherbot commented 2 years ago

Change https://go.dev/cl/406574 mentions this issue: runtime: reduce useless computation when memoryLimit is off

gopherbot commented 2 years ago

Change https://go.dev/cl/406575 mentions this issue: runtime: update description of GODEBUG=scavtrace=1

gopherbot commented 2 years ago

Change https://go.dev/cl/410735 mentions this issue: doc/go1.19: adjust runtime release notes

gopherbot commented 2 years ago

Change https://go.dev/cl/410734 mentions this issue: runtime: document GOMEMLIMIT in environment variables section

rabbbit commented 1 year ago

Hey @mknyszek - first of all, thanks for the excellent work; this is great.

I wanted to share our experience thinking about enabling this in production. It works great and exactly as advertised. Some well-maintained applications have enabled it with great success, and the usage is spreading organically.

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called out in the proposal. Applications leaking memory, with GOMEMLIMIT, can get to a significantly degraded state. Paradoxically, those applications would rather OOM, die quickly, and be restarted than struggle for a long time. The number of applications makes avoiding leaks unfeasible.

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

Another aspect is that our containers typically don't use close to all the available CPU time. So the assumption from the gc-guide, while true, has a slightly different result in practice:

The intuition behind the 50% GC CPU limit is based on the worst-case impact on a program with ample available memory. In the case of a misconfiguration of the memory limit, where it is set too low mistakenly, the program will slow down at most by 2x, because the GC can't take more than 50% of its CPU time away.

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This "GC degradation" would be hard to explain/sell to application owners.

We're also concerned with a "degradation on failover" situation - an application that is usually fine might, given a sudden increase in traffic, end up in a death spiral. And this would be precisely the time we need to avoid one.

What we're doing now is:

Hope this is useful. Again, thanks for the excellent work.

mknyszek commented 1 year ago

Thanks for the detailed feedback and I'm glad it's working well for you overall!

Speaking broadly, I'd love to know more about what exactly this degraded state looks like. What is the downstream effect? Latency increase? Throughput decrease? Both? If you could obtain a GODEBUG=gctrace=1 (outputs to STDERR) of this degraded state, that would be helpful in identifying what if any next steps we should take.

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called out in the proposal. Applications leaking memory, with GOMEMLIMIT, can get to a significantly degraded state. Paradoxically, those applications would rather OOM, die quickly, and be restarted than struggle for a long time. The number of applications makes avoiding leaks unfeasible.

Choosing to die quickly over struggling for a long time is an intentional point in the design. In these difficult situations something has to give and we chose to make that memory.

But also if the scenario here is memory leaks, it's hard to do much about that without fixing the leak. The live heap will grow and eventually even without GOMEMLIMIT you'll OOM as well. GOMEMLIMIT isn't really designed to deal with a memory leak well (generally, we consider memory leaks to be a bug in long-running applications), and yeah I can see turning it on basically turning into "well, it just gets slower before it dies, and it takes longer to die," which may be worse than not setting a memory limit at all.

As for fixing memory leaks, we're currently planning some work on improving the heap analysis situation. I hope that'll make keeping applications leak-free more feasible in the future. (#57447)

(I recognize that encountering a memory leak bug at some point is inevitable, but in general we don't expect long-running applications to run under the expectation of memory leaks. I also get that it's a huge pain these days to debug them, but we're looking into trying to make that better with heap analysis.)

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

FTR that's what the runtime/debug.SetMemoryLimit API is for and it should be safe (performance-wise) to call with a relatively high frequency. Just to be clear, is this also the memory leak scenario?

The 90% case you're describing sounds like a misconfiguration to me; if the application's live heap is really close enough to the memory limit to achieve this kind of death spiral scenario, then the intended behavior is to die after a relatively short period, but it might not if it turns out there's actually plenty of available memory. However, this cotenant situation might not be ideal for the memory limit to begin with.

As a general rule, the memory limit, when used in conjunction with GOGC=off, is not a great fit for an environment where the Go program is potentially cotenant with others, and the others don't have predictable memory usage (or the Go application can't easily respond to cotenant changes). See https://go.dev/doc/gc-guide#Suggested_uses. In this case I'd suggest slightly overcommitting the memory limit to protect against many transient spikes in memory use (in your example here, maybe 95-96%), but set GOGC to something other than off.
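(To make that suggestion concrete, here is a minimal sketch of such a configuration: GOGC stays on and the memory limit is derived from the container's limit with a little headroom. The cgroup v2 path and the 95% factor are assumptions about the deployment environment, not part of the runtime API.)

```go
package main

import (
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// setLimitFromCgroup sets a slightly overcommitted soft memory limit based
// on the container's cgroup v2 memory limit, while leaving GOGC at its
// default so ordinary pacing still applies.
func setLimitFromCgroup() {
	data, err := os.ReadFile("/sys/fs/cgroup/memory.max") // hypothetical cgroup v2 path
	if err != nil {
		return // not in a container with a limit; keep the defaults
	}
	limit, err := strconv.ParseInt(strings.TrimSpace(string(data)), 10, 64)
	if err != nil {
		return // "max" or unparsable; keep the defaults
	}
	debug.SetGCPercent(100)                // GOGC stays on
	debug.SetMemoryLimit(limit * 95 / 100) // overcommit slightly, e.g. 95%
}

func main() {
	setLimitFromCgroup()
	// ... application code ...
}
```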

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This "GC degradation" would be hard to explain/sell to application owners.

I'm not sure I follow. Are you describing a situation in which your application is using say, 25% CPU utilization, and the GC is eating up 50%?

We're also concerned with a "degradation on failover" situation - an application that is usually fine might, given a sudden increase in traffic, end up in a death spiral. And this would be precisely the time we need to avoid one.

(Small pedantic note, but the 50% GC CPU limiter is a mechanism to cut off the death spiral; in general a death spiral means that the GC keeps taking on more and more of the CPU load until application progress stops entirely.)

I think it depends on the load you're expecting. It's always possible to construct a load that'll cause some form of degradation, even when you're not using the memory limit (something like a tight OOM loop as the service gets restarted would be what I would expect with just GOGC).

If the memory limit is failing to degrade gracefully, then that's certainly a problem and a bug on our side (perhaps even a design flaw somewhere!). (Perhaps this risk of setting a limit too low such that you sit in the degraded state for too long instead of actually falling over can be considered something like failing to degrade gracefully, and that suggests that even 50% GC CPU is trying too hard as a default. I can believe that but I'd like to acquire more data first.)

However, without more details about the scenario in question, I'm not sure what else we can do to alleviate the concern. One idea is a backpressure mechanism (#29696), but for now I think we've decided to see what others can build, since the wisdom in this space seems to have shifted a few times over the last few years (e.g. what metric should we use? Memory? CPU? Scheduling latency? A combination? If so, what combination and weighted how? Perhaps it's very application-dependent?).

What we're doing now is:

As a final note, I just want to point out that at the end of the day, the memory limit is just another tool in the toolkit. If you can make some of your applications work better without it, I don't think that necessarily means it's a failure of the memory limit (sometimes it might be, but not always). I'm not saying that you necessarily think the memory limit should be used everywhere, just wanted to leave that here for anyone who comes looking at this thread. :)

cdvr1993 commented 1 year ago

Hi @mknyszek

Regarding the 50% CPU limit... unless we understand it incorrectly, it means the GC can use up to that much CPU to avoid going over the soft limit, but for many of our applications anything more than 20% GC CPU can have a serious impact (mostly when in a failover state). Currently, we dynamically change GOGC: when there is memory available we tend to increase it, and when there isn't we keep decreasing it to enforce our own soft limit, but we have a minimum threshold and we allow different service owners to set their own minimum threshold. That's more or less what we are missing with Go's soft limit.
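(A minimal sketch of this kind of user-space GOGC tuner is below; the target, interval, and thresholds are illustrative assumptions, not any particular implementation.)

```go
package main

import (
	"runtime"
	"runtime/debug"
	"time"
)

// tuneGOGC periodically raises GOGC when there is heap headroom and lowers
// it when heap use exceeds the target, never going below minGOGC.
func tuneGOGC(targetBytes uint64, minGOGC, maxGOGC int) {
	gogc := 100
	for range time.Tick(10 * time.Second) {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		switch {
		case ms.HeapAlloc < targetBytes*8/10 && gogc+10 <= maxGOGC:
			gogc += 10 // plenty of headroom: allow more growth between cycles
		case ms.HeapAlloc > targetBytes && gogc-10 >= minGOGC:
			gogc -= 10 // over target: collect more aggressively, respecting the floor
		}
		debug.SetGCPercent(gogc)
	}
}

func main() {
	go tuneGOGC(4<<30, 25, 400) // e.g. target 4 GiB, GOGC floor 25, cap 400
	// ... application code ...
	select {} // block forever in this sketch
}
```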

We currently don't have an example using the soft limit, but in the past we have had issues with GOGC being too low, and this caused bigger problems than a few instances crashing due to OOM. So, based on that experience, we think the scenario would repeat with the soft limit.

What would be nice is a way of modifying how much CPU the GC can take to enforce the soft limit, or a minimum GOGC value, so that service owners can decide at what point they believe it is better to OOM than to pay the degradation caused by elevated GC.

Or would you suggest it is better to wait for #56857, so we have a way to keep an eye on the number of live bytes, and when it gets close to the soft limit decide to either eat the cost of GC or just OOM?

rabbbit commented 1 year ago

Thanks for the detailed feedback and I'm glad it's working well for you overall!

Speaking broadly, I'd love to know more about what exactly this degraded state looks like. What is the downstream effect? Latency increase? Throughput decrease? Both? If you could obtain a GODEBUG=gctrace=1 (outputs to STDERR) of this degraded state, that would be helpful in identifying what if any next steps we should take.

Getting the traces to work in production would be hard. We have an HTTP handler to tune GOMEMLIMIT per container, so we can experiment with that with reasonable safety. There's no way to enable traces at runtime, right?

That being said I can perhaps try to reproduce the same situation in staging. What we have seen in production was a significant CPU time utilization increase, leading to CPU throttling, leading to both latency increase and throughput decrease.

Below is a screenshot of a "slowly leaking application" (explained more below) where we enabled GOMEMLIMIT temporarily. Note the CPU utilization increased significantly more than we expected - more than 50% of GOMAXPROCS.

[screenshot: CPU utilization of the container after temporarily enabling GOMEMLIMIT]

We'd ideally want to enable it for everyone by default (a vast majority of our applications have plenty of memory available), but we're currently too afraid to do this. The reason is the death spirals you called out in the proposal. Applications leaking memory, with GOMEMLIMIT, can get to a significantly degraded state. Paradoxically, those applications would rather OOM, die quickly, and be restarted than struggle for a long time. The number of applications makes avoiding leaks unfeasible.

Choosing to die quickly over struggling for a long time is an intentional point in the design. In these difficult situations something has to give and we chose to make that memory.

But also if the scenario here is memory leaks, it's hard to do much about that without fixing the leak. The live heap will grow and eventually even without GOMEMLIMIT you'll OOM as well. GOMEMLIMIT isn't really designed to deal with a memory leak well (generally, we consider memory leaks to be a bug in long-running applications), and yeah I can see turning it on basically turning into "well, it just gets slower before it dies, and it takes longer to die," which may be worse than not setting a memory limit at all.

As for fixing memory leaks, we're currently planning some work on improving the heap analysis situation. I hope that'll make keeping applications leak-free more feasible in the future. (#57447) (I recognize that encountering a memory leak bug at some point is inevitable, but in general we don't expect long-running applications to run under the expectation of memory leaks. I also get that it's a huge pain these days to debug them, but we're looking into trying to make that better with heap analysis.)

So I think you might be too optimistic vs what we see in our reality here (sorry:)). We:

  1. have applications that leak quickly; they restart often and need to be fixed. Those typically have higher priority, and can be diagnosed with some effort - I wouldn't actually call it pain though, profiles are typically helpful enough.
  2. have "slowly leaking memory" applications that just very slowly accumulate memory as they run. These are actually low priority - as long as {release_frequency} > 2-5*{time_to_oom}, fixing it will not get prioritized. Especially if some of the leaks are in gnarly bits like stat emission. This only becomes a problem during extended quiet periods - the expectation is still that the applications will crash rather than degrade.

In summary though, we strongly expect leaks to be around forever.

A part of the problem (perhaps) is that we lack a good enough way of setting the right limit. We cannot set it to 98-99% of the container memory because some other applications can be running there. But, if we set it to 90%, once we hit the death spiral situation, we're in a degraded state for too long - it can take hours for OOM, and in the meantime, we are at risk of all containers of an application entering the degraded state.

FTR that's what the runtime/debug.SetMemoryLimit API is for and it should be safe (performance-wise) to call with a relatively high frequency. Just to be clear, is this also the memory leak scenario?

Yeah, so we would need to continue running a custom tuner though, right? It also seems that if we're tuning in "user space", equivalent results can be achieved with GOGC and GOMEMLIMIT - right?

The 90% case you're describing sounds like a misconfiguration to me; if the application's live heap is really close enough to the memory limit to achieve this kind of death spiral scenario, then the intended behavior is to die after a relatively short period, but it might not if it turns out there's actually plenty of available memory. However, this cotenant situation might not be ideal for the memory limit to begin with.

As a general rule, the memory limit, when used in conjunction with GOGC=off, is not a great fit for an environment where the Go program is potentially cotenant with others, and the others don't have predictable memory usage (or the Go application can't easily respond to cotenant changes). See https://go.dev/doc/gc-guide#Suggested_uses. In this case I'd suggest slightly overcommitting the memory limit to protect against many transient spikes in memory use (in your example here, maybe 95-96%), but set GOGC to something other than off.

This is slightly more nuanced (and perhaps off-topic): each of our containers runs with a "helper" process responsible for starting up, shipping logs, and performing local health checks (it's silly, don't ask). The memory we need to reserve for it varies per application - thus, for small containers, 95% might not be enough. For larger applications we can increase the limit, but in both cases we'd likely still need to look at the log output dynamically.

It is not immediately clear to me how to tune the right value of GOGC combined with GOMEMLIMIT. But, more importantly, my understanding of GOMEMLIMIT is that no matter the GOGC value we can still hit the death-spiral situation.

The GC might use at most 50% of the total CPU time, but it can end up using 2-3x more CPU than the actual application work. This "GC degradation" would be hard to explain/sell to application owners.

I'm not sure I follow. Are you describing a situation in which your application is using say, 25% CPU utilization, and the GC is eating up 50%?

Yeah, @cdvr1993 explained it in the previous comment too. Say a container has GOMAXPROCS=8 but was only using 3 cores at the time. Then we hit GOMEMLIMIT, and the GC is allowed (per our understanding) to use up to 4 cores, so the GC is now using more CPU than the application. At the same time, anything above 80% CPU utilization (in our experience) results in dramatically increased latency.

We're also concerned with a "degradation on failover" situation - an application that is usually fine might, given a sudden increase in traffic, end up in a death spiral. And this would be precisely the time we need to avoid one.

(Small pedantic note, but the 50% GC CPU limiter is a mechanism to cut off the death spiral; in general a death spiral means that the GC keeps taking on more and more of the CPU load until application progress stops entirely.)

Perhaps we need a different name here then:) What we've observed might not be a death spiral, but a degradation large enough to severely disrupt production. Even with the 50% limit.

I think it depends on the load you're expecting. It's always possible to construct a load that'll cause some form of degradation, even when you're not using the memory limit (something like a tight OOM loop as the service gets restarted would be what I would expect with just GOGC).

Yeah, the problem seems to occur for applications that are "mostly fine", with days between OOMs.

If the memory limit is failing to degrade gracefully, then that's certainly a problem and a bug on our side (perhaps even a design flaw somewhere!). (Perhaps this risk of setting a limit too low such that you sit in the degraded state for too long instead of actually falling over can be considered something like failing to degrade gracefully, and that suggests that even 50% GC CPU is trying too hard as a default. I can believe that but I'd like to acquire more data first.)

However, without more details about the scenario in question, I'm not sure what else we can do to alleviate the concern. One idea is a backpressure mechanism (#29696), but for now I think we've decided to see what others can build, since the wisdom in this space seems to have shifted a few times over the last few years (e.g. what metric should we use? Memory? CPU? Scheduling latency? A combination? If so, what combination and weighted how? Perhaps it's very application-dependent?).

IMO it seems like what you built is "almost perfect". We just need the applications to "die faster" - the easiest changes that come to mind would be reducing the limit from 50%, to either something like 25% or a static value (2 cores?).

When I say "almost perfect" I mean it though - I suspect we could rollout the GOMEMLIMIT to 98% of our applications with great results and without a problem, but the remaining users would come after us with pitchforks. And that forces us to use the GOMEMLIMIT as an opt-in, which is very disappointing given the results we see in 98% of the applications.

Thanks for the thoughtful response!

rabbbit commented 1 year ago

Hey @mknyszek @cdvr1993 I raised a new issue in https://github.com/golang/go/issues/58106.

VEDANTDOKANIA commented 1 year ago

@rabbbit @mknyszek @rsc we are facing an issue regarding the memory limit. We are setting the memory limit to 18GB on a 24GB server, but the GC still runs very frequently and eats up 80 percent of CPU, while memory used is only 4 to 5 GB max. Also, is the memory limit per goroutine, or how do we set it for the whole program?

In the entry point of our application we have specified something like this:

debug.SetMemoryLimit(int64(8 * 1024 * 1024 * 1024))

Is this okay or do we need to do something additional? Also, where do we set the optional unit described in the documentation?

mknyszek commented 1 year ago

@VEDANTDOKANIA Unfortunately I can't help with just the information you gave me.

Firstly, how are you determining that the GC runs very frequently, and that it uses 80 percent of CPU? That's far outside the bounds of what the GC should allow: there's an internal limiter that caps the GC at 50% of available CPU (as defined by GOMAXPROCS) and will prioritize using new memory over additional CPU usage beyond that point.
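(For reference, one way to check from inside the process whether the GC CPU limiter has engaged is the /gc/limiter/last-enabled:gc-cycle metric; a minimal sketch, assuming a Go release that includes the limiter:)

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// Read the GC cycle during which the CPU limiter was last enabled
	// (zero means it has never engaged).
	s := []metrics.Sample{{Name: "/gc/limiter/last-enabled:gc-cycle"}}
	metrics.Read(s)
	if cycle := s[0].Value.Uint64(); cycle != 0 {
		fmt.Printf("GC CPU limiter last enabled during GC cycle %d\n", cycle)
	} else {
		fmt.Println("GC CPU limiter has never been enabled")
	}
}
```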

Please file a new issue with more details, ideally:

Thanks.

Also, is the memory limit per goroutine, or how do we set it for the whole program?

It's for the whole Go process.

In the entry point of our application we have specified something like this:

debug.SetMemoryLimit(int64(8 * 1024 * 1024 * 1024))

That should work fine, but just so we're on the same page, that will set an 8 GiB memory limit. Note that the GC may execute very frequently (but again, still capped at roughly 50%) if this value is set smaller than the baseline memory use your program requires.

Also where to set the optional unit as described in the documentation

The optional unit is part of the GOMEMLIMIT environment variable that Go programs understand. e.g. GOMEMLIMIT=18GiB.