golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: mechanism for monitoring heap size #16843

Closed bradfitz closed 4 years ago

bradfitz commented 7 years ago

Tracking bug for some way for applications to monitor memory usage, apply backpressure, stay within limits, etc.

Related previous issues: #5049, #14162

rgooch commented 7 years ago

Can you please expand on what you have in mind?

bradfitz commented 7 years ago

I have nothing specific in mind. This bug was filed as part of a triage meeting with a bunch of us. One bug (#5049) was ancient with no activity and one bug (#14162) proposed a solution instead of discussing the problem.

This bug is recognition that there is a problem, and we've heard problem statements and potential solutions (and sometimes both) from a number of people.

The reality is that there are always memory limits, and it'd be nice for the Go runtime to help applications stay within them, through perhaps some combination of limiting itself, and/or helping the application apply backpressure when resources are getting tight. That might involve new runtime API surface to help applications know when things are getting tight.

/cc @nictuku also.

bradfitz commented 7 years ago

Btw, there was lots of good conversation at #14162 and it wasn't our intention to kill it or devalue it. It just didn't fit the proposal process, and we also didn't want to decline it, nor close it as a dup of #5049.

Changing the language is out of scope, so things like catching memory allocation failures and language additions like "trymake" or "tryappend" are not going to happen.

But we can add runtime APIs to help out. That's what this bug is tracking.

/cc @matloob @aclements

rgooch commented 7 years ago

Agreed. "try*" isn't practical. It would require changing too many call sites and even then would not catch all allocations. Adding runtime.SetSoftMemoryLimit() still seems like the best approach.

nictuku commented 7 years ago

It would be nice to have the ability to set a limit to the memory usage.

After a limit is set, perhaps the runtime could provide a clear indication that we're under memory pressure and that the application should avoid creating new allocations. Example new runtime APIs that would help:

  • func InMemoryPushback() bool; or
  • func RegisterPushbackFunc(func(inPushback bool))

That would provide a clear signal to the application. How exactly that's decided should be an internal implementation decision and not part of the API. An example implementation, to illustrate: if we limit ourselves to the heap size specified by the user, we could trigger GC whenever the used heap is close to the limit. Then we could enter pushback whenever the GC performance (latency or CPU overhead) is outside certain bounds. Apply smoothing as needed.
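As a sketch of what that check could look like from the application's side: InMemoryPushback is only a proposed API, so this stands in for it using runtime.ReadMemStats against an arbitrary, application-chosen soft limit and threshold.

```go
package main

import (
	"fmt"
	"runtime"
)

// softLimit is a hypothetical application-chosen soft heap limit.
// The proposed InMemoryPushback API does not exist, so this sketch
// approximates it by comparing live heap usage against the limit.
const softLimit = 1 << 30 // 1 GiB, illustrative only

// inPushback reports whether heap usage is close enough to the soft
// limit that the application should shed load. The 90% threshold is
// an arbitrary choice for this sketch.
func inPushback(heapInuse, limit uint64) bool {
	return heapInuse >= limit*9/10
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	if inPushback(ms.HeapInuse, softLimit) {
		fmt.Println("under memory pressure: rejecting new work")
	} else {
		fmt.Println("accepting new work")
	}
}
```

An HTTP or RPC server would consult such a check during admission control, before starting to process a request.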

The approach suggested by this API has limitations.

For example, it's still possible for an application that is behaving well to do one monstrous allocation after it has checked for the pushback state. This would be common for HTTP and RPC servers that do admittance control at the beginning of the request processing. If the monstrous allocation would bring the memory heap above the limit, Go should probably panic. Since we don't want to change the language to add memory allocation error checks, I think this is fine. And we have no other option :).

Another problem is that deciding what is the right time to pushback can be hard. Whatever the runtime implements, some folks may find it too aggressive (pushing back too much, leading to poor resource utilization) or too conservative (pushing back too late, leading to high latency due to excessive GC). I guess the go team could provide a knob similar to GOGC to control the pushbackiness of the runtime, if folks are really paranoid about it.

RLH commented 7 years ago

The runtime could set up a channel and send a message whenever it completes a GC. The application could have a heap monitor goroutine (HMG) watching that channel. Whenever the HMG gets a message it inspects the state of the heap. To determine the size of the heap the HMG would look at the live heap size and GOGC. If need be it could adjust GOGC so that the total heap does not exceed whatever limit the application finds appropriate. If things are going badly for the application the HMG can start applying back pressure to whatever part of the application is causing the increase in heap size. The HMG would be part of the application so a wide variety of application specific strategies could be implemented.

Trying to pick up the pieces after a failure does not seem doable. Likewise, deciding what is "close to a failure" is very application specific, and a global metric potentially involves external OS issues such as co-tenancy as well as other issues well beyond the scope of the Go runtime. Decisions and actions need to be made well ahead of time if one expects them to reliably prevent an OOM.

I believe this is where we were headed in #14162 https://github.com/golang/go/issues/14162 and this is a recap of some of that discussion.

I would be interested in what useful policy could not be implemented using the HMG mechanism and current runtime mechanisms.


rgooch commented 7 years ago

I previously gave the reasoning why using a channel or a callback to receive memory-exceeded events won't work: #14162. That same reasoning applies to a channel message sent whenever a GC run is completed.

To robustly handle exceeding a memory limit, the check for the limit has to be part of the allocator, not done after a GC run, because you can't afford to wait. If you wait for the next GC run, it may be too late. Consider a single large slice allocation that would put you over the soft limit and would exceed the hard memory limit: you'll get an OOM panic. The same applies to a callback function.

You need to immediately stop the code which is doing the heavy allocating. To do that you need a check in the allocator, and you need it to raise a panic(). It's up to the application to set the soft memory limit at which these optional, catchable panics are raised.
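To illustrate how application code might contain such an opt-in, catchable panic: runtime.SetSoftMemoryLimit and allocator-raised panics do not exist, so this sketch simulates the panic explicitly inside a stand-in for the allocating library code.

```go
package main

import (
	"errors"
	"fmt"
)

// errSoftLimit simulates the catchable panic the proposal says the
// allocator would raise when a soft memory limit is crossed.
var errSoftLimit = errors.New("soft memory limit exceeded")

// riskyDecode converts the (simulated) allocator panic into an
// ordinary error, stopping the allocating code in its tracks.
func riskyDecode() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("decode aborted: %v", r)
		}
	}()
	// Stand-in for library code performing a huge allocation.
	panic(errSoftLimit)
}

func main() {
	if err := riskyDecode(); err != nil {
		fmt.Println(err) // prints: decode aborted: soft memory limit exceeded
	}
}
```

The point of the proposal is that the panic originates inside the allocator, so even library code you cannot modify would be stopped at the allocation site.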

Please, before rehashing old suggestions or coming up with new variants, read through #14162 where I gave the reasoning why a panic and a check in the allocator is needed. Otherwise we keep covering the same old ground.

quentinmit commented 7 years ago

@rgooch If you are allocating giant arrays, you probably know exactly where in your code that is happening, and you can add code there to first check if there is enough memory available. You can even do that using the GC information we're discussing passing down a channel.

I do think there is a race here, but in the opposite case - if code is sitting in a tight loop making many small allocations, your channel read/callback might not run in time to actually trigger a new GC soon enough without OOMing.

rgooch commented 7 years ago

I discussed all this in #14162: you can be reading GOB-encoded data from a network connection. No way to know ahead of time how big it's going to be. Or it can be some other library you don't control where a lot of data are allocated, whether a single huge slice or a lot of small allocations. The point is, you don't know how much will be allocated before you enter the library code and you've got no way to reach in there and stop things if you hit some pre-defined limit. And, as you say, if you're in a loop watching allocations, even if you could stop things, you may not get there in time. Spinning in a loop watching the memory level is grossly expensive. This needs to be tied to the allocator.

RLH commented 7 years ago

This does not propose a callback or channel for delivering a memory exceeded message or a memory almost exceeded message. At that point it is already too late. This proposes a mechanism for providing the application timely information that it can use to avoid the OOM. The application knows how best to predict memory usage and, if need be, throttle its memory usage.

One suggestion was func runtime.ReserveOOMBuffer(size uint64)

The application's heap monitor goroutine, HMG, could initially allocate a large object of the required size and retain a single reference to it. If the HMG, using information provided by the runtime, determines that the current GOGC and live heap size will not support the application's predicted allocations, then it can release that single reference, confident that the next GC will recover those spans and make them available. If the HMG wants the GC to happen sooner than currently scheduled, it can lower GOGC using SetGCPercent.

If ReserveOOMBuffer is the API that some Go application needs, then this provides it. The intent of this proposal is to provide the application with the information it needs to create the abstractions that best fit its needs while minimizing Go's runtime API surface.
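runtime.ReserveOOMBuffer is only a suggestion, but the underlying ballast trick it describes can be sketched at the application level today (the size is illustrative):

```go
package main

import "fmt"

// oomBuffer is an application-level stand-in for the suggested (and
// hypothetical) runtime.ReserveOOMBuffer: hold a large allocation and
// release it under pressure so a subsequent GC can return its spans.
var oomBuffer []byte

// reserve allocates the ballast and retains the only reference to it.
func reserve(size int) { oomBuffer = make([]byte, size) }

// release drops that single reference; the next GC can reclaim the
// spans. It returns how many bytes were freed up.
func release() int {
	n := len(oomBuffer)
	oomBuffer = nil
	return n
}

func main() {
	reserve(64 << 20) // 64 MiB reserve, size is illustrative
	fmt.Println(release()) // prints 67108864
}
```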


dr2chase commented 7 years ago

As I read this, #14162 describes a workload where (analogy follows) sometimes the python attempts to swallow a rhino, and if the attempt is not halted ASAP it is guaranteed to end badly. Is it in fact the case that the rhino will never be successfully swallowed? (I can imagine DOS attacks on servers where this might be the case.)

I think that the periodic notification scheme is intended to deal with a python diet of a large number of smaller prey; if an application has the two constraints of m=memory < M and l=latency < L, and if m is affine in workload W (reasonable assumption) and l is also affine in workload W (semi-reasonable), then simply comparing observed m with limit M and observed l with limit L tells you how much more work can be admitted (W' = W * min(M/m, L/l)), with the usual handwaving around unlucky variations in the input and lag in the measurement. It's possible to adjust GOGC up or down if M/m and L/l are substantially different, so as to maximize the workload within constraints -- this however also requires knowledge of the actual GC overhead imposed on the actual application (supposed to be 25% during GC, but high allocation rates change this). One characteristic of this approach is that a newly started application might not snap online immediately at full load, but would increase its intake as it figured out what load it could handle.
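The admission formula above can be written down directly; the numbers below are illustrative only, and the usual handwaving about measurement lag applies.

```go
package main

import "fmt"

// admittableWork implements the back-of-the-envelope formula from the
// comment above: with memory m and latency l roughly affine in
// workload W, the workload that fits limits M and L is
// W' = W * min(M/m, L/l).
func admittableWork(w, m, mLimit, l, lLimit float64) float64 {
	scale := mLimit / m
	if s := lLimit / l; s < scale {
		scale = s
	}
	return w * scale
}

func main() {
	// 100 req/s using 4 GB of an 8 GB budget and 50ms of a 200ms
	// budget: memory is the binding constraint, so W' = 100*8/4.
	fmt.Println(admittableWork(100, 4, 8, 50, 200)) // prints 200
}
```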

But this is no help for intermittent rhino-swallowing.

jessfraz commented 7 years ago

@bradfitz would you be open to me taking some of the ideas from https://github.com/golang/go/issues/14162 and applying the Go proposal process so it is considered? As long, of course, as the proposed solution doesn't break the API or change the language.

bradfitz commented 7 years ago

As long as the proposal isn't to "make it possible to catch failed memory allocations", which I'm pretty sure everybody agrees isn't going to happen.

But any proposal should address or at least consider the whole range of related issues in this space. (back pressure, runtime & applications being aware of limits & usage levels)

jessfraz commented 7 years ago

I was thinking a couple additions to the runtime package to expose information that might be useful for applications like you said in https://github.com/golang/go/issues/16843#issuecomment-241622292

juliandroid commented 7 years ago

Is there any decision about how this would be properly implemented?

Perl documents a notorious $^M global variable that user code can initialize to some lengthy string; in case of an out-of-memory error, it can be used as an emergency memory pool after die()ing. However, I couldn't find a working example, and it seems that feature was never implemented.

Still, it seems a logical approach. Since you are most probably in a multi-tenant environment, sharing memory with other Go/non-Go programs, the only buffer you can rely on is the emergency one you allocated yourself. Having the Go runtime use that memory when memory is low, while immediately notifying the subscribed process that it is running out of memory, seems like a good measure to prevent pure Go programs from panicking.

nictuku commented 7 years ago

My proposal is here: https://docs.google.com/document/d/1zn4f3-XWmoHNj702mCCNvHqaS7p9rzqQGa74uOwOBKM/edit

I hope to have an implementation open sourced soon. I don't know if it could be included in the standard libraries.

I would like to make it as robust as possible, so if you'd like to test it, please drop me an email (see my github profile) and I'll contact you later. Thanks!

rgooch commented 7 years ago

This proposal looks interesting. I made a couple of comments in the document:

  1. Support the pattern of pre-allocating at startup (up to a percentage of the VM/container memory) and never give that memory back to the OS

  2. Have a hard memory limit and push back+GC harder as you get closer to the limit.

CAFxX commented 7 years ago

Added feedback to optionally trigger orderly application shutdown when GC pacing fails to keep memory below the set maximum.

tve commented 7 years ago

I'm dealing with an app that runs out of memory (on a 16GB box), and that eventually led me here. Some of the notes I took along the way are below; apologies if these fall into the "yeah, we know" category.

runtime stack:
runtime.throw(0x8a2de5, 0x16)
        /usr/local/go/src/runtime/panic.go:596 +0x95
runtime.sysMap(0xc437a10000, 0x5800000, 0xc420394800, 0xaebef8)
        /usr/local/go/src/runtime/mem_linux.go:216 +0x1d0
runtime.(*mheap).sysAlloc(0xad31a0, 0x5800000, 0x421b81)
        /usr/local/go/src/runtime/malloc.go:428 +0x374
runtime.(*mheap).grow(0xad31a0, 0x2c00, 0x0)
        /usr/local/go/src/runtime/mheap.go:774 +0x62
runtime.(*mheap).allocSpanLocked(0xad31a0, 0x2c00, 0xaceb30)
        /usr/local/go/src/runtime/mheap.go:678 +0x44f


- I'm running in a container env where the container has a max memory set, and I'm trying to understand what fraction of that can realistically be "in_use". It appears that I have to account for anywhere from 25% to 50% overhead. E.g., if the cgroup has memory=16GB, then the actual in-use heap data structures may be in the 8GB..12GB range before I hit the out-of-memory panic. On the one hand, with GC that's perhaps in the reasonable ballpark; on the other hand, this does represent $$.
- The amount of "unused heap overhead" seems to be tunable using the GOGC env variable; I didn't see a way to modify this at run-time. For example, while the process is far from its limit, GOGC=100 reduces GC overhead, but when it reaches perhaps 60% of its limit I may want to change it to 20 to trade memory vs. CPU. In my app I see it going from 1% to 6% of CPU overhead.
- I'm very interested in being able to capture control when the process runs out of memory or is about to. I understand that in the absolute this is a difficult problem, but I'm looking at it from a troubleshooting perspective. I would first use it to output a memory profile or similar information so I can understand how much memory is allocated where, plus some info about GC (e.g. allocated but unused space). It would be OK for this to trigger before absolutely-out-of-memory occurs, e.g. the first time the runtime gets back-pressure from the OS (see first bullet point).
- I do believe that many services can adjust their memory consumption by, broadly speaking, adjusting their concurrency. For example, an HTTP server can adjust the number of requests that are concurrently processed. I believe the runtime.MemStats info is sufficient for this purpose, but it could be enhanced with some callback mechanism for when a threshold is exceeded. E.g., a web server could block processing of new requests when 80% of available memory is used and only resume when it drops below 75%.
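The 80%/75% hysteresis in the last bullet might be sketched like this; the budget and thresholds are illustrative, and the caller would feed in runtime.MemStats.HeapInuse (or similar):

```go
package main

import "fmt"

// gate implements the hysteresis described above: stop admitting
// requests above the high-water mark, resume only once usage falls
// below the low-water mark. Thresholds are fractions of an
// application-supplied memory budget.
type gate struct {
	budget    uint64
	high, low float64
	blocked   bool
}

// admit reports whether a new request should be accepted given the
// current heap usage, updating the blocked state as it goes.
func (g *gate) admit(heapInUse uint64) bool {
	used := float64(heapInUse) / float64(g.budget)
	switch {
	case used >= g.high:
		g.blocked = true
	case used < g.low:
		g.blocked = false
	}
	return !g.blocked
}

func main() {
	g := &gate{budget: 100, high: 0.80, low: 0.75}
	// Rises past 80% (block), hovers at 78% (stay blocked),
	// drops below 75% (resume).
	fmt.Println(g.admit(70), g.admit(85), g.admit(78), g.admit(74)) // prints: true false false true
}
```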

Overall I concur with the sentiment that most apps that run out of memory will run out of memory regardless of how fancy a mechanism is added to the current situation. For this reason, if I had a vote, I would vote for adding some simple additional hooks so one can do some tuning and, foremost, troubleshoot when an app does run out of memory.

aclements commented 7 years ago

On 64-bit linux, I hit the out-of-memory panic in sysMap in mem_linux.go:216, but when I look up the call stack I see it passing through grow in mheap.go:774 and the code leads me to believe that if sysMap had returned an error instead of just panicking then grow could have tried a smaller allocation.

I'm not sure what you're suggesting, exactly. grow can reduce its request by at most 64 KB, which probably isn't going to help when a multi-gigabyte heap is running out of room.

I'm running in a container env where the container has a max memory set and I'm trying to understand what fraction of that can realistically be "in_use". It appears that I have to count for anywhere from 25% to 50% overhead.

Assuming you mean runtime.MemStats.HeapInUse (and friends), note that this can vary depending on where you are in a GC cycle. Perhaps more interesting is MemStats.NextGC, which tells you what heap size this GC cycle is trying to keep you below. This changes only once per GC cycle.

The amount of "unused heap overhead" seems to be tunable using the GOGC env variable, I didn't see a way to modify this at run-time.

runtime/debug.SetGCPercent lets you change this. Right now this triggers a full STW GC, but in Go 1.9 this operation will let you change GOGC on the fly without triggering a GC (unless you set it low enough that you have to immediately start a GC, of course :)

tve commented 7 years ago

I'm not sure what you're suggesting, exactly. grow can reduce its request by at most 64 KB, which probably isn't going to help when a multi-gigabyte heap is running out of room.

Ah, I couldn't tell that; you're right then.

tve commented 7 years ago

My proposal is here: https://docs.google.com/document/d/1zn4f3-XWmoHNj702mCCNvHqaS7p9rzqQGa74uOwOBKM/edit

Nice long proposal write-up :-). I'm trying to understand the tl;dr; ...

The proposal seems to come down to "periodically measure live data size and set GCPercent such that GC is triggered before the desired total heap size is reached". As mentioned in the proposal, this can be done/approximated today in the app itself using runtime.MemStats and debug.SetGCPercent.

As far as I can tell the following changes to the runtime would be desirable to improve this:

As a user I'm still left wondering a bit what a reasonable goal in all of this is. I'm imagining something like "for the vast majority of Go apps the tuning of GCPercent allows 80% of memory to be used for live data with moderate GC overhead and 90% with high to very high GC overhead". Maybe someone in the Go community has informed intuition about specific numbers.

The answer to requests for some callback or rescue option when memory allocation fails would be that, instead, GC overhead exceeding N% or GCPercent falling below M% should be used to trigger said rescue action.

tve commented 7 years ago

I did an experiment using GCPercent to constrain heap size, and while the principle works as expected, it does not look sufficient to me. I'm working on an app that digests some giant CSVs where memory consumption is an issue. I'm running with GCPercent=25 to try to contain the memory overhead. I'm running with gctrace=1, and the highest heap size number I see is 797MB:

gc 389 @209.888s 6%: 0.013+888+0.10 ms clock, 0.055+164/183/1068+0.40 ms cpu, 796->797->613 MB, 797 MB goal, 4 P

A little later after some memory has been freed I grab MemStats and get the following HeapXxx stats which show 1.2GB of heap (all gctrace outputs since the above were lower):

Heap stats: sys=1205MB inuse=488MB alloc=438, idle=717, released=0

Data grabbed from top at about that time seems to agree with the heap stats (code/stack size are not significant):

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
17746 tve       20   0 1272812 983920   6952 S 226.2 24.9   7:44.84 csv-digest

I was trying to keep the memory used by my process to 613MB*1.25=767MB using GCPercent but clearly that's not really working. The point here is that tuning GCPercent is not sufficient if there is some hard limit one wants to stay under. (I understand that my 25% goal may very well be unrealistic but I don't think this invalidates the point.)

rsc commented 7 years ago

@aclements, is this a duplicate of another issue? I know you're working on some issue related to app-controlled heap size.

aclements commented 7 years ago

@aclements, is this a duplicate of another issue?

I think this is the only issue. Brad dedup'd various other issues into this one when he opened it.

Re-assigning to @RLH, since he's taking the lead on memory limits.

gopherbot commented 7 years ago

CL https://golang.org/cl/46750 mentions this issue.

gopherbot commented 7 years ago

CL https://golang.org/cl/46751 mentions this issue.

rogpeppe commented 6 years ago

I saw this article recently, which seems relevant to this issue: https://medium.com/samsara-engineering/running-go-on-low-memory-devices-536e1ca2fe8f

juliandroid commented 6 years ago

@rogpeppe We have even more restricted embedded environment - just 128MB on a mips32 board, so that should be kept in mind that not all boards are made equal.

Personally, I think the best approach is a pre-allocated buffer (configurable, so it can be tweaked depending on the installed system memory or the nature of the work the app is doing) that the Go runtime can use in case of low memory, triggering a message via a channel so your program can react by reducing the work it is doing or, if resources are too low, terminating gracefully.

The monitoring side is "easy": the first time the runtime fails with not enough memory, the only option is to start using the buffer instead of crashing the app. There is no need to actively monitor memory by querying the OS (unless there is a cheap way to receive push notifications from the OS).

The only penalty I see here is C bindings, which won't benefit from the pre-allocated buffer and will probably crash the entire Go app when low on memory.

rgooch commented 6 years ago

@juliandroid: Please consider the cases where allocations are being performed in code you can't change (Go standard library, other libraries). This code can perform huge allocations. You need to be able to stop that code in its tracks, so sending a notification via a channel won't help. A catchable panic is the only way to deal with this family of problems.

juliandroid commented 6 years ago

The idea of a configurable pre-allocated buffer is not only to back the default allocator (for example, call gomalloc() instead of the malloc() equivalent at runtime, falling back to the pre-allocated buffer when malloc() fails), but also to keep enough memory in reserve for when a big allocation fails that cannot be satisfied by the pre-allocated buffer, so you can still run a goroutine that listens on a channel and can reduce the load. Sure, that is runtime fine-tuning. Only the app can decide how big that buffer should be: 1MB or 50MB.

rgooch commented 6 years ago

I still don't think this is enough, because there are times when it is hard or impossible to know how much buffer space is needed. For an example of the impossible case: you're receiving GOB data over a network connection. Once you call the decoder, you've lost control; a notification channel is no help. Because you don't know how much data you will receive, you cannot set a safe buffer size. If you make the buffer too small, you'll get an uncatchable panic. If it's too large, you're wasting a lot of memory (and thus operational capacity) in case someone sends you a huge blob.

I have not seen any proposal that can handle this case, except for the "soft" panic that I've suggested. The buffer by itself is insufficient. The notification has to be done via a catchable panic.

juliandroid commented 6 years ago

I don't exclude panic as a last resort (actually it is necessary in all cases), but my point is that Go will be better if you prevent 99% of "out of memory" failures and leave that 1% for unexpected situations. My focus is managing and health monitoring rather than how to terminate the app. And that is just one concern. Another, related one is killing a goroutine, which is also a complex task; there are some other runtime issues as well, so if some of those are addressed together instead of looked at separately, Go might become quite safe and predictable. The complete fix looks very Go 2.x, since I'm not sure how this could land without any breaking changes. If the idea is just a quick and dirty fix, I don't see how the solution will be sustainable and complete.

twotwotwo commented 6 years ago

For what it's worth, someone at CloudFlare recently posted about GOGC=100 with a small live heap mucking up their benchmarking. That's a pessimal case (tiny live data, CPU bound), but it seems like one where it would be much more convenient to be able to tell the Go process e.g. "I have 1 GB of RAM for you, feel free to use it," than try different GOGCs. That sounds like the SetSoftMemoryLimit idea rgooch mentioned.

A single benchmark isn't that interesting, but anything that spends substantial time in GC while there's a lot of free RAM may be making the wrong tradeoff from the user/operator's perspective, and correcting it with GOGC can be laborious and risky. It's understandable the default doesn't always do just what the user wants since the runtime can't read minds, but it would be great if when I do know I could just say GOGC=28GB or something rather than guess, do hacks with MemStats and timers, etc.

(You can implement a not-very-good fake of GOGC=28GB now with a package that checks the env var and does the timer/MemStats/SetGCPercent thing. The runtime currently ignores the invalid GOGC, it looks like. You could do a better fake with @RLH's GC notification channel idea. Obviously something like placing a hard limit on Sys that correctly handles surprise giant allocations would seem to involve the runtime more deeply.)

nictuku commented 6 years ago

I left Google recently, but before I did I worked on a solution that used SetGCPercent to try to pin the server to a certain memory limit. It was linked above, but it's [1].

We spent quite some time tuning it and even launched it on one large-scale production system to validate it with real traffic and not just synthetic load. It sort of worked. We were able to stay below the target memory usage. But CPU usage went through the roof so we rolled it back and threw it away.

The takeaway was that an application-level solution has too much overhead to be useful. We have to monitor the memory usage and call SetGCPercent whenever it changes substantially. Because memory usage can grow suddenly due to, say, one expensive request, we essentially need to monitor memory usage at a high frequency. Doing it less than 10x per second was kinda useless. But there's just no way to do it efficiently enough.

I tried different methods such as polling the Go stats or setting up an eventfd notification for memory changes to the container[2]. All methods were too indirect and expensive.

I don't recall the details, but I believe that one of collecting heap stats or calling SetGCPercent would also trigger a GC - which helped keep memory usage down but added to the immense overhead. We tried minimizing these calls, but it just wasn't good enough.

I think the runtime needs to be in the loop for this to work: it should not grow the heap size beyond a certain point unless it's GC thrashing.

We can't really solve this using application code, sadly.

[1] https://docs.google.com/document/d/1zn4f3-XWmoHNj702mCCNvHqaS7p9rzqQGa74uOwOBKM/edit [2] https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt

twotwotwo commented 6 years ago

@nictuku Very helpful; I'd looked at that doc but didn't catch the details of when it was tried for real. Reading your list of problems, it looks like they might have even partly motivated 1.9's changes to make SetGCPercent/FreeOSMemory not STW, make SetGCPercent not always start a concurrent GC, and get MemStats under 100µs on big heaps. Makes me interested in trying to reimplement your approach and seeing how it works today. (Still, I agree the runtime will always be able to do a better job of this than app code.)

Also curious about the status of the SetMaxHeap experiment by @aclements in https://go-review.googlesource.com/c/go/+/46751. I see it got +2'd by @RLH but not sure that means anything for next steps. Is some API like that planned to go in eventually or at least still under consideration? If so, is there anywhere else we should look for discussion about it?

nictuku commented 6 years ago

@twotwotwo I tried my experiment with at least one of the changes from @aclements backported into 1.8 - the one speeding up getMemStats, IIRC. It didn't really help that much :-(. The naive implementation has too many problems and getMemStats was just one of them.

https://golang.org/cl/46751 looks promising.

rgooch commented 6 years ago

So, looking at https://golang.org/cl/46751 there is still the problem of how to induce a panic for goroutines which have opted-in. If you have code which is stuck in some library code performing huge allocations, you need a way to induce a panic so that the allocations are stopped. @aclements: will your solution be including that feature as well?

RLH commented 6 years ago

The idea is to use contexts for this type of cancellation functionality.

https://golang.org/pkg/context/


beorn7 commented 6 years ago

Just for the record, as Prometheus was mentioned early in the whole story as a signature use case: If you look at the current Prometheus code (v2.x), you'll find that that's not the case anymore, as most of the memory used is in mmap'd files. The RSS of Prometheus 2.x is tiny compared to Prometheus 1.x. However, I do believe that using mmap in Go programs (and the subsequent management of raw data blocks of memory) will and should be limited to very specific scenarios and is certainly not a viable work-around in general. So please do keep up the good work here!

If you are interested in how the problem was “solved” in the later 1.x Prometheus versions (calling ReadMemStats once per second), here are the relevant code references (including the sometimes desperate comments of the poor coder):

As you can see, this grew into something fairly involved, which is, however, still not able to make absolutely sure we won't let the heap grow too much. On the other hand, the RSS is anyway not closely correlated to the heap size (and, for some reason I don't know, increased slightly for the same heap sizes with Go1.9). In practice, this worked quite nicely. On our fairly large number of Prometheus servers at SoundCloud (~70 servers), we never had an OOM-kill again, until we compiled Prometheus with Go1.9 but kept the settings the same (and the ratio between RSS and heap size went up).

rgooch commented 6 years ago

That requires that all my transitive dependencies support contexts. That does not seem likely to happen, or will take a loooooong time.

aclements commented 6 years ago

Also curious about the status of the SetMaxHeap experiment by @aclements in https://go-review.googlesource.com/c/go/+/46751. I see it got +2'd by @RLH but not sure that means anything for next steps. Is some API like that planned to go in eventually or at least still under consideration? If so, is there anywhere else we should look for discussion about it?

The intent is to get some experience with that API and make sure it actually solves problems (and doesn't create new ones :). We're planning to roll it out as an experimental API within Google and I was also hoping to get some adventurous open source users to try it out (I should email golang-dev), but neither of these has happened yet.

So, looking at https://golang.org/cl/46751 there is still the problem of how to induce a panic for goroutines which have opted-in. ... @aclements: will your solution be including that feature as well?

Sorry, but no, that isn't part of my solution. As @RLH said, context cancellation is the "right" answer to this, though I understand that context isn't everywhere. I'm afraid I still don't really know what a panic-based solution would look like. What actually triggers the panics? Which goroutines actually get hit by the panic? A large part of the point of CL 46751 is that the back-pressure is gradual, graceful, and application-level, so the application can respond as a whole before things go too terribly wrong. And it's application level because the heap is application level. We don't have the ability to say "Goroutine X is using Y MB of memory" because it's not well-formed in general (how do you count things reachable from multiple goroutines?). "Y MB could be freed if goroutine X exited" is well-formed and could theoretically be useful for this, but I'm pretty sure it's very expensive to figure out the answer to that.
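The gradual, application-level backpressure described above could be sketched like this. The pressure channel is a stand-in for whatever notification mechanism the runtime ends up providing - CL 46751's actual API differs - and the load-shedding policy is a deliberately simple example:

```go
package main

import (
	"errors"
	"fmt"
)

// underPressure does a non-blocking check of a hypothetical channel
// that a heap-limit monitor sends on when the heap nears its limit.
// Note each check consumes one notification, fine for a sketch.
func underPressure(pressure <-chan struct{}) bool {
	select {
	case <-pressure:
		return true
	default:
		return false
	}
}

// handle sheds load gracefully: cheap requests still succeed, while
// expensive ones are rejected until the GC catches up. This is the
// "gradual, graceful, application-level" response: the application
// decides what to cut, because only it knows what matters.
func handle(expensive bool, pressure <-chan struct{}) (string, error) {
	if expensive && underPressure(pressure) {
		return "", errors.New("server overloaded, retry later")
	}
	return "ok", nil
}

func main() {
	pressure := make(chan struct{}, 1)
	pressure <- struct{}{} // simulate a heap-limit notification
	_, err := handle(true, pressure)
	fmt.Println(err) // server overloaded, retry later
}
```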

(and the ratio between RSS and heap size went up).

@beorn7, out of curiosity, what sort of ratio are you seeing in practice? In general it's hard to bound this, so I expect you'll always have to do some testing to establish this and it will change a bit between releases.

rgooch commented 6 years ago

To use an enlightened quote: "the perfect is the enemy of the good". Whether contexts are the "right" solution is unclear, but it is clear that it will be a long time before they can help solve the problem generally. In the meantime, people have to deal with OOM panics.

Here is an approach that may work while preserving the solution you've implemented: add an API that allows a goroutine to receive an externally induced panic. That would then allow me to catch the event from your memory pressure channel and start sending events to goroutines to initiate panics. Ideally, if the screams from the garbage collector get louder, I'd start inducing panics to more and more goroutines, to bring the situation under control. This follows the basic "opt-in" philosophy that I've been advocating. I know which goroutines are vulnerable to triggering an OOM, so those are the ones I opt-in to being killed.

Suggested API: func MakePanicChannel() chan<- error

When an error is sent on the channel, the calling goroutine will panic, with the provided error.

beorn7 commented 6 years ago

@beorn7, out of curiosity, what sort of ratio are you seeing in practice? In general it's hard to bound this, so I expect you'll always have to do some testing to establish this and it will change a bit between releases.

Yes, totally aware of that. I didn't want to imply a need to clamp RSS (which would be close to impossible) but merely underline that clamping the heap size doesn't have to be perfect rocket science to have the effects desired in many scenarios.

To answer your question: Our rule of thumb for a reasonably safe heap size setting was 67% of available physical memory with Go1.8 compiled Prometheus 1.7, and 60% for Go1.9 compiled Prometheus 1.8. (Note the beautiful version number dance…)

RLH commented 6 years ago

An API that allows a goroutine to externally induce a panic in another goroutine means the condemned goroutine must reason about whether an asynchronously induced panic will leave the system in an inconsistent state. Even a locked critical section would need to reason about consistency in the face of an asynchronous panic. Debugging asynchronous issues and writing test cases would be a real challenge. Java's Thread.stop ended up being deprecated for these and other reasons. Doesn't MakePanicChannel have the same set of problems?


rgooch commented 6 years ago

Firstly, this is an opt-in mechanism. The goroutine must call MakePanicChannel and it must register that channel with whomever it wishes to give panic powers. Secondly, people should be using defer to manage their locks, which mitigates a lot of the problems with panic-as-abort. For the class of problems I've discussed up-thread, this approach will work well.

RLH commented 6 years ago

My concern isn't about releasing a held lock using a defer. Asynchronous externally induced control flow between any statement in the critical section and the defer logic is a foot gun. Critical sections exist to provide consistency and isolation, the C and I in ACID. Providing and testing these properties in face of such control flow would be a challenge and likely be error prone.


rgooch commented 6 years ago

How else would you recover from a too-large memory allocation (and unwind the calling stack) which is buried deep?

RLH commented 6 years ago

Neither the literature, languages similar to Go, nor this thread provides a satisfactory answer for how to recover from an OOM. This issue and the proposed solution provide a mechanism for monitoring heap size and the tools needed to avoid an OOM.


robaho commented 5 years ago

In the meantime, something like Java's 'dump heap on OOM' would be very helpful, as long as there is a heap analyzer tool - but I assume it could just dump it in the memprof format and that should suffice.