dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

Contribution idea: Add memory based load shedding #370

Open gabikliot opened 9 years ago

gabikliot commented 9 years ago

Silo currently supports load shedding based on CPU. If CPU usage rises above a configurable limit and load shedding is enabled, the silo starts rejecting new client requests (grain-to-grain requests and system messages are not rejected) until CPU usage returns to normal. Rejected client grain calls fail with an explicit Orleans load-shedding exception.

We would also like to support memory-based load shedding. As a first step, this can be as simple as: if the process uses more than X% (configurable) of the total physical memory on the machine, start load shedding. And of course we want to allow shedding on both CPU and memory. A later extension could add smarter memory-pressure detection, for example based on gen2 size or the percentage of time spent in GC.
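For concreteness, a minimal sketch of the proposed check might look like the following; this is not Orleans code, and the `MemoryOverloadCheck` type, the `MemoryLimitFraction` name, and the 85% default are just placeholders:

```csharp
// Minimal sketch (not the Orleans implementation) of the proposed check:
// start shedding when the process uses more than a configurable fraction
// of the machine's physical memory.
using System;
using System.Diagnostics;

public static class MemoryOverloadCheck
{
    // Hypothetical configurable threshold, analogous to the existing CPU limit.
    public static double MemoryLimitFraction { get; set; } = 0.85;

    public static bool ShouldShedLoad()
    {
        // Total memory available to the process (physical RAM, or the container limit).
        long total = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes;
        long used = Process.GetCurrentProcess().WorkingSet64;

        return total > 0 && (double)used / total > MemoryLimitFraction;
    }
}
```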

BrianSakhai commented 9 years ago

@gabikliot is there a way for grains currently in memory to know if the silo they are in is under memory pressure?

Grains that are just caching data could tap into this and unload state or call DeactivateOnIdle.
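As a rough illustration of that pattern, a caching grain might do its own ad hoc pressure check today, since the framework does not expose the signal; the `ICacheGrain` interface and its method are hypothetical:

```csharp
// Hypothetical sketch: a caching grain drops its cached state and asks to be
// deactivated when it observes high memory load. The pressure check is ad hoc;
// the point of the discussion is that the framework could expose this signal.
using System;
using System.Threading.Tasks;
using Orleans;

public interface ICacheGrain : IGrainWithStringKey
{
    Task TrimIfUnderPressure();
}

public class CacheGrain : Grain, ICacheGrain
{
    private byte[] _cachedData;

    public Task TrimIfUnderPressure()
    {
        var info = GC.GetGCMemoryInfo();
        if (info.MemoryLoadBytes > info.HighMemoryLoadThresholdBytes)
        {
            _cachedData = null;   // unload cached state
            DeactivateOnIdle();   // let the runtime deactivate this activation when idle
        }
        return Task.CompletedTask;
    }
}
```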

gabikliot commented 9 years ago

That's a good point. No, not currently. The grain can of course check the memory perf counter itself, but it would probably be more useful if the framework provided this information in a unified way. Something like: the grain can ask (via the Grain base class) for the "current runtime status", which would include a bag of counter values. An extension could be to allow grains to subscribe to "runtime status events" and raise an overload event when overload is detected. It is just important to keep all those extensions as a separate, advanced usage pattern. By default we don't want simple grains to have to deal with all this advanced complexity.
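A hypothetical shape for that API (none of these types exist in Orleans) might be:

```csharp
// Hypothetical sketch of the "current runtime status" idea: a read-only bag of
// counter values a grain could query via its base class, plus an optional
// event for the advanced subscription pattern.
using System;
using System.Collections.Generic;

public interface IRuntimeStatus
{
    // Snapshot of counter values, e.g. "CpuUsagePercent", "MemoryUsageBytes".
    IReadOnlyDictionary<string, double> Counters { get; }

    // Advanced extension: raised when the silo detects overload.
    event EventHandler OverloadDetected;
}
```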

sergeybykov commented 9 years ago

Maybe the title is misleading. I think we should be talking about "actor shedding" rather than "load shedding" here. Even if we start rejecting all incoming requests, we will not immediately reduce memory pressure by much. If instead we deactivate a set of least recently used grain activations, that will free up some memory right away.
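As a sketch of that policy, assuming last-activity timestamps are tracked somewhere, the selection step could be as simple as:

```csharp
// Minimal sketch of "actor shedding": given last-activity timestamps for
// activations (tracking them is assumed), select the least recently used ones
// for deactivation.
using System;
using System.Collections.Generic;
using System.Linq;

public static class ActorShedding
{
    public static IEnumerable<TActivation> SelectLeastRecentlyUsed<TActivation>(
        IReadOnlyDictionary<TActivation, DateTime> lastActivity,
        int countToShed)
    {
        return lastActivity
            .OrderBy(kv => kv.Value)  // oldest activity first
            .Take(countToShed)
            .Select(kv => kv.Key);
    }
}
```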

BrianSakhai commented 9 years ago

@sergeybykov if there were also a way to prefer that certain types be deactivated (through configuration or attributes) under memory pressure, then we could have a smarter system. Also, we may want to be able to exempt some grains from being deactivated in response to memory pressure.
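Purely as an illustration, such preferences could be expressed with attributes like these (hypothetical, not part of Orleans):

```csharp
// Hypothetical attributes sketching the idea of marking grain classes as
// preferred or exempt for deactivation under memory pressure.
using System;

[AttributeUsage(AttributeTargets.Class)]
public sealed class DeactivateUnderMemoryPressureAttribute : Attribute
{
    // Lower values are deactivated first.
    public int Priority { get; set; }
}

[AttributeUsage(AttributeTargets.Class)]
public sealed class ExemptFromMemoryPressureDeactivationAttribute : Attribute
{
}
```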

sergeybykov commented 9 years ago

@BrianSakhai All good points, all doable.

veikkoeeva commented 9 years ago

Hmm, I wonder if this can be generalized so that one can provide a function that is either called back or invoked by a timer. This function could then do whatever the programmer has programmed it to do, for instance check system statistics and then "control the system" by managing some aspects of it. As an example, query grains by some freely specified criteria (IEnumerable, LINQ), such as least recently used combined with other appropriate filters, and then ask the system to deactivate them. Sounds a bit like dependency injection in that case too. :)
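Something like the following sketch, where the runner type and all of the delegates are hypothetical rather than existing Orleans extension points:

```csharp
// Hypothetical sketch of the generalized mechanism: a timer periodically runs a
// user-supplied policy that inspects the current activations (via LINQ or any
// other criteria) and asks the runtime to deactivate the selected ones.
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class ResourcePolicyRunner<TActivation> : IDisposable
{
    private readonly Timer _timer;

    public ResourcePolicyRunner(
        Func<IEnumerable<TActivation>> snapshot,                            // current activations
        Func<IEnumerable<TActivation>, IEnumerable<TActivation>> selector,  // user-supplied criteria (e.g. LRU + filters)
        Action<TActivation> deactivate,                                     // ask the runtime to deactivate
        TimeSpan period)
    {
        _timer = new Timer(_ =>
        {
            foreach (var activation in selector(snapshot()))
                deactivate(activation);
        }, null, period, period);
    }

    public void Dispose() => _timer.Dispose();
}
```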

gabikliot commented 9 years ago

@veikkoeeva , we already have a way to plug in such a mechanism. It is called placement directors. They are currently not dynamically injectable; one has to write them inside the runtime, but that can be extended. A Placement Director can run a timer, it has stats from all silos, and it could also be extended to poke individual grains. For example, look at a relatively complex one: https://github.com/dotnet/orleans/blob/master/src/OrleansRuntime/Placement/ActivationCountPlacementDirector.cs or all the others here: https://github.com/dotnet/orleans/tree/master/src/OrleansRuntime/Placement.

ElanHasson commented 2 years ago

This would complement #7526, which can disable CpuLimit when Cpu stats are unavailable.

bill-poole commented 2 years ago

The .NET TlsOverPerCoreLockedStacksArrayPool<T> implementation of ArrayPool<T> trims memory on a GC gen 2 event when under high memory pressure. The gen 2 GC callback is generated using the Gen2GcCallback class.

The callback invokes the TlsOverPerCoreLockedStacksArrayPool<T>.Trim() method, which detects whether the process is under high memory pressure and if so, clears the buckets in the array pool.

Orleans should be able to use a similar approach, taking inspiration from the .NET runtime code.
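A simplified sketch of that pattern follows; the real Gen2GcCallback class is internal to the .NET runtime, so this only illustrates the finalizer re-registration trick, and the shedding hook in the usage comment is a hypothetical placeholder:

```csharp
// Simplified sketch of the gen-2 GC callback pattern: a finalizable object
// whose finalizer fires after collections, invokes a callback, and
// re-registers itself for finalization so it keeps firing.
using System;

public sealed class Gen2GcNotifier
{
    private readonly Action _callback;

    private Gen2GcNotifier(Action callback) => _callback = callback;

    public static void Register(Action callback) => _ = new Gen2GcNotifier(callback);

    ~Gen2GcNotifier()
    {
        if (!Environment.HasShutdownStarted)
        {
            _callback();
            GC.ReRegisterForFinalize(this);  // stay alive to observe the next collection
        }
    }
}

// Usage sketch: only act when the GC reports high memory load.
// Gen2GcNotifier.Register(() =>
// {
//     var info = GC.GetGCMemoryInfo();
//     if (info.MemoryLoadBytes > info.HighMemoryLoadThresholdBytes)
//     {
//         // e.g. trim caches or deactivate least-recently-used grains
//     }
// });
```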

I believe it makes sense for grains to be shed when the process is under high memory pressure based on an LRU policy by default.

JorgeCandeias commented 2 years ago

> I believe it makes sense for grains to be shed when the process is under high memory pressure based on an LRU policy by default.

This one is a double-edged sword. I also believe it would work wonders for a memory-based auto-scaled deployment, but without auto-scaling such a thing can and will kill a cluster.

Long story:

We had the same idea at my place, many moons ago. If dotnet can garbage collect itself, why can't Orleans? So I went and designed a fancy resource manager that would deactivate self-enlisted grains upon memory pressure (or for other reasons; it was an open policy design). Had some LRU magic and everything. And it worked as intended, believe it or not. The end result? A complete disaster. We ended up canning the whole thing, stopped being penny pinchers, and got some more of those beautiful memory sticks on the servers. Why? We were solving the wrong problem.

What we found, and this was a Homer Simpson "doh" moment for us, is that if a grain is in memory and hasn't been explicitly deactivated by code or collected after some short configured collection age, there's a very good reason why it is staying in memory. The reason is that the grain is receiving requests and has to fulfill them. And here lies the problem. If we deactivate an in-use grain just because the host is running out of memory, then the next request will just activate it again somewhere, either on the same box or on some other box that is probably also running out of memory. So the grain will just get deactivated again. And activated again. And so on and on.

So the result of this cycle was tremendous activation churn, which overwhelmed the storage layer and made the system unusable, not much different from when the cluster runs out of memory in the first place and comes crashing down. So a lot of code and effort for the same end result.

The correct problem to solve was lack of memory. RAM is the core resource of Orleans, and if we run out of it, the system won't work. In fact, the entire cluster can come down in a snowballing effect, whether due to a despairing GC, OS memory swapping, or the OS killing processes outright for running out of virtual memory, all while the users keep putting the same load on the cluster - which it can't fulfill anyway.

That disaster story aside, I still believe that an LRU-based deactivation policy would help balance out the cluster, provided folks have deployed said cluster in some form of auto-scaling configuration triggered by memory consumption. If that's not the case, then I believe developer time is better spent designing some form of activation throttling based on memory usage, so we don't let users put more load on the cluster than we know it can take.
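For what that throttling idea could look like, here is a hedged sketch built on Orleans' incoming grain call filter extension point; the threshold logic and exception choice are assumptions, and a real implementation would want to exempt grain-to-grain and system traffic, as the CPU-based shedding does:

```csharp
// Hedged sketch of memory-based throttling using an incoming grain call filter.
// This naively rejects every call under high memory load; a real version would
// exempt grain-to-grain and system messages, like the CPU-based load shedding.
using System;
using System.Threading.Tasks;
using Orleans;

public sealed class MemoryLoadSheddingFilter : IIncomingGrainCallFilter
{
    public async Task Invoke(IIncomingGrainCallContext context)
    {
        var info = GC.GetGCMemoryInfo();
        if (info.MemoryLoadBytes > info.HighMemoryLoadThresholdBytes)
        {
            // Reject the call instead of piling more load onto a memory-starved silo.
            throw new InvalidOperationException("Silo is shedding load due to memory pressure.");
        }

        await context.Invoke();
    }
}

// Registration (assumed): siloBuilder.AddIncomingGrainCallFilter<MemoryLoadSheddingFilter>();
```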

bill-poole commented 2 years ago

Thanks @JorgeCandeias for your very detailed explanation! I'm about to embark upon a similar journey, so having the opportunity to learn from your experience is very much appreciated! I found what you said very insightful and I agree with everything you said.

In my mind, the benefit of pre-emptively deactivating grains based on memory pressure is to effectively have a dynamic "collection age" (either globally for all grain types, or for specific grain types) that responds to memory pressure, rather than a conservative fixed "collection age" that ensures idle grains are collected before there is any significant memory pressure - but potentially deactivates grains prematurely.

i.e., I would see this kind of feature as supplementing the "collection age" configuration for a grain type, which allows us to be more liberal with the configured "collection age".

Loading grains from storage has both a performance and cost impact. Therefore, there is benefit in keeping idle grains in memory while there is still a chance they may be used in the future. We can only do this to the maximum utility of the available memory if grains can be configured to be deactivated in response to memory pressure.
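For reference, the fixed per-type "collection age" being contrasted here can be set with the collection age limit attribute; the grain type, interface, and specific value below are just illustrative:

```csharp
// The fixed per-type collection age being contrasted with a dynamic,
// memory-pressure-driven policy. With such a policy, this limit could be set
// liberally and idle activations reclaimed early only under actual pressure.
using System.Threading.Tasks;
using Orleans;

public interface ICatalogGrain : IGrainWithStringKey
{
    Task<string> GetName();
}

[CollectionAgeLimit(Minutes = 120)]   // keep idle activations for up to 2 hours
public class CatalogGrain : Grain, ICatalogGrain
{
    public Task<string> GetName() => Task.FromResult("catalog");
}
```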

bill-poole commented 2 years ago

@JorgeCandeias, have you deployed an Orleans cluster that auto-scales in response to memory pressure? I would have thought that, given the way the .NET server GC works (i.e., collecting less frequently and not readily releasing memory to the OS), it would be difficult for the auto-scaler to detect when there is real memory pressure.