dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Custom allocators - (size, disposal, pools, etc). #4368

Closed ayende closed 3 years ago

ayende commented 9 years ago

One of the hardest things we have to handle when writing server applications or system software in .NET is that we don't have good control over memory. This ranges from the simple inability to ask "how big is this thing?" to controlling how much memory we'll use for certain operations.

In my case, working on RavenDB, there are a lot of operations that require extensive memory usage, over which we have little control. The user can specify any indexing function they want, and we have to respect that. Heavy indexing can cause several issues. In particular, it means that we generate a LOT of relatively short-term data, while other operations also run. Because we are system software, we do a lot of I/O, which requires pinning memory.

The end result is that we may have memory with the following layout.

[useful objects] [ indexing garbage ] [pinned buffers doing i/o] [ indexing garbage] [ pinned buffers ]

That results in high memory fragmentation (mostly in Gen0), which is hard to deal with.

It also means that when indexing is done, we have to clean up quite a lot of garbage, and because the indexing garbage is mixed with objects that are still in use, the memory either cannot be readily reclaimed or requires a big compaction.

It would be great if we had greater control over memory usage. Being able to define a heap and instruct the CLR to allocate objects from it would be wonderful. We wouldn't have request-processing memory intermixed with background-operation memory, and we'd have a good way to free a lot of memory all at once.

One option would be to do something like this:

using(var heap = Heap.Create(HeapOptions.None,
    1024 * 1024,        // initial size
    512 * 1024 * 1024)) // max size
{
    using(heap.AllocateFromMe())
    {
        var sb = new StringBuilder();
        for(var i = 0; i < 100; i++)
            sb.AppendLine(i.ToString()); // AppendLine takes a string, not an int
        Console.WriteLine(sb.ToString());
    }
}

This will ensure that all allocations inside the scope are allocated on the new heap. The GC isn't involved in collecting items from this heap at all, it is the responsibility of the user to take care of that, either by explicitly freeing objects (rare) or by disposing the heap.

Usage of references to the heap after it is destroyed won't be allowed.

Alternatively, because that might be too hard or complex, just having a way to do something like:

 heap.Allocate<MyItem>();

Would be great. Note that we can do the same right now by allocating native memory and using unsafe code to get a struct pointer back. This works, but very common types like arrays or strings cannot be allocated in this manner.
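The workaround mentioned above can be sketched as follows. This is only an illustration of the native-allocation trick, not a proposed API; `MyItem` and `NativeAllocDemo` are hypothetical names, and it requires compiling with unsafe code enabled:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example type; any blittable struct works.
struct MyItem
{
    public int Id;
    public double Value;
}

static class NativeAllocDemo
{
    // Allocate a MyItem in native memory, bypassing the GC heap entirely.
    public static unsafe int RoundTrip()
    {
        IntPtr mem = Marshal.AllocHGlobal(sizeof(MyItem));
        try
        {
            var item = (MyItem*)mem;
            item->Id = 42;
            item->Value = 3.14;
            return item->Id;
        }
        finally
        {
            // Nothing here is GC-tracked; freeing is entirely our job.
            Marshal.FreeHGlobal(mem);
        }
    }

    static void Main() => Console.WriteLine(RoundTrip()); // 42
}
```

This works for structs, but as noted, arrays and strings cannot be placed in native memory this way.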

Having support for explicit usage like this would greatly alleviate the kind of gymnastics that we have to go through to manage memory.

masonwheeler commented 6 years ago

@mattwarren You wouldn't happen to know anything about the progress of Project Snowflake?

denisvlah commented 6 years ago

I haven't found any public announcement regarding Project Snowflake since the white paper was released. If anyone can point me to a page where I can add a "like" to help move the project forward, I would be very happy to, and would even ask my friends to do the same.

Thanks.

jkotas commented 6 years ago

@mjp41 @dimitriv Anything you can share about progress of Project Snowflake?

mjp41 commented 6 years ago

@masonwheeler, @denisvlah thanks for the interest in Project Snowflake. We are still working on the project and improving the design (hopefully we will write another paper/blog post soon). Our main focus now is finding real-world .NET workloads that benefit from this design point, but we don't have significant evidence yet.

ayende commented 6 years ago

@mjp41 If you are looking for something where it would be useful, RavenDB has several such cases. We index a lot of data, and it would be really useful to have a scope for the index and recycle the entire thing in one shot.

This is a wishlist from 2013: https://ayende.com/blog/161889/my-passover-project-introducing-rattlesnake-clr

masonwheeler commented 6 years ago

@mjp41

Our main focus now is finding real world .NET workloads that benefit from this design point, but don't have significant evidence yet.

Wanna find plenty of them really quickly? Put up Snowflake on its own repo, with clear notes that this is an alpha release that's not production-ready yet, and let Linus's Law do the hard work for you. You know there are plenty of devs out there who would love to be able to use manual memory management on various places in their CLR projects. You'll get far more feedback (and some better feedback, if you don't mind digging through a bunch of crap) from the community than you ever will from a closed research group.

mjp41 commented 6 years ago

@ayende thanks for the link. At least based on our implementation, jemalloc cannot compete with Gen0 collections, so whatever we migrate really needs to be hitting Gen2 to be beneficial. So this generally works well for periodically dumped logs and data that has been cached. I imagine there are cases where this occurs in RavenDB.

@masonwheeler we are considering this.

ayende commented 6 years ago

@mjp41 The mere fact that I can control the lifetime of critical objects, is huge for us.

clrjunkie commented 6 years ago

I think the primary reason one would care about manual memory management in .NET is to avoid high GC pauses in processes that hold large tree structures, which I believe is "the scenario". I would also consider a trimmed-down C# dialect, "Unmanaged C#" (think C, but with C# syntax, a "free" operation, and no libraries), with such code packed in a special-purpose assembly accessed via a much-simplified P/Invoke-style mechanism.

ayende commented 6 years ago

From my point of view, having a few hotspots (typically very small pieces of the code) where I can manually control things means huge perf wins without having to deal with unmanaged everywhere.

jlennox commented 6 years ago

This story comes to mind https://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector

4creators commented 6 years ago

Actually, it seems that a lot of media applications would hugely benefit from manual memory management, and in particular from a guarantee of no stop-the-world events during processing on time-critical threads. I have not dug into Project Snowflake deeply enough to check whether this is possible; however, if it is achievable, then the whole world of real-time media processing will open up for C#.

4creators commented 6 years ago

Ahh, the very same features would be of great value to game developers.

masonwheeler commented 6 years ago

Agreeing with @ayende and @4creators here. Having specific things that could be isolated from the GC would be a massive advantage for the game engine I'm working on.

mjp41 commented 6 years ago

@clrjunkie

I think the primary reason one would care for manual memory management in .NET is to avoid high GC pauses in processes that hold large tree structures which I believe is “the scenario”.

Is precisely the kind of scenario where we see benefit.

@ayende we have very much focused on minimal code changes to use the API — very much "adopt in small places". Most of the time the GC performs really well and massively improves productivity, so we want that to be the default.

@4creators with Project Snowflake, we don't have any mechanism to stop the GC; we use the standard (server/workstation) GC plus our runtime extensions. There is already GC.TryStartNoGCRegion.
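For reference, GC.TryStartNoGCRegion pre-reserves an allocation budget and suppresses collections until the budget is exhausted or the region ends; the budget is capped by the ephemeral segment size, a limitation raised later in this thread. A minimal sketch (the 16 MB figure is arbitrary):

```csharp
using System;

static class NoGcRegionDemo
{
    // Returns true if the no-GC region was entered and exited cleanly.
    public static bool Run()
    {
        // Ask the runtime to reserve a 16 MB allocation budget up front.
        if (!GC.TryStartNoGCRegion(16 * 1024 * 1024))
            return false;

        try
        {
            // These allocations are served from the reserved budget,
            // so no collection occurs while we work.
            var buffers = new byte[64][];
            for (int i = 0; i < buffers.Length; i++)
                buffers[i] = new byte[1024];
        }
        finally
        {
            // Ends the region; throws if a collection was forced inside it.
            GC.EndNoGCRegion();
        }
        return true;
    }

    static void Main() => Console.WriteLine(Run());
}
```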

verelpode commented 6 years ago

@ayende wrote:

Note that what I would really like is to define a custom heap, and just drop the whole thing in one shot.

@masonwheeler wrote:

... or it gets dropped when you call the .Drop() method, in which case you've just introduced the concept of dangling references into what used to be a memory-safe environment.

What if the code runs in a separate AppDomain? However, currently GC is done per-process not per-AppDomain, so this idea would require an option that enables separate GC or heap for an AppDomain. Ideally a lightweight kind of AppDomain that can be frequently created and unloaded/dropped. This would be safe because the objects in one AppDomain cannot contain references to objects in a different AppDomain. Such an AppDomain could also have an option to entirely disable GC in the AppDomain, meaning all objects in the AppDomain remain alive until the AppDomain is unloaded/dropped.

Alternatively, my experimental benchmark in corefxlab/#2417 recorded 3x to 4x faster performance when using references to structs stored in arrays instead of class instances. There I describe an ability to make a field in a normal struct or class that safely points to a struct in an element of an array.
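The structs-in-arrays pattern referred to above can be sketched with C# 7 ref locals; this is only an illustration of the general shape (`Sample` is an invented type), not the benchmark from corefxlab #2417:

```csharp
using System;

struct Sample
{
    public int Count;
}

static class RefToArrayElementDemo
{
    public static int Run()
    {
        // Storing structs contiguously in an array avoids one heap object
        // (and one GC tracking entry) per item, unlike an array of classes.
        var samples = new Sample[1000];

        // A ref local aliases the array element directly, so we can mutate
        // it in place without copying the struct.
        ref Sample first = ref samples[0];
        first.Count = 42;

        return samples[0].Count; // 42 — the element itself was mutated
    }

    static void Main() => Console.WriteLine(Run());
}
```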

See also the message where I describe an idea where a class could have an attribute applied that says that the class should use automatic-reference-counting instead of the normal CLR GC.

In the same message, I also describe an idea for a read-only array of class instances where each element of the array is immediately non-null and cannot be changed to any other reference, and the array is GC'd as a single object not individual class instances.

svick commented 6 years ago

@verelpode How does that make sense in the context of .Net Core, which, as I understand it, doesn't have AppDomains?

verelpode commented 6 years ago

@svick -- Good question. I didn't know that (no time to read everything), but here is my suggested solution: slightly change the way we think of this idea. Instead of calling it "AppDomain", let's call it something else, such as "Memory Domain" or "GC Domain". It may make good sense to rename the idea to "Memory Domain" because what I described is indeed not the same as the existing AppDomain feature; rather, it has both similarities and differences. So let's say that a "Memory Domain" would be similar to an AppDomain, except that a "Memory Domain" would have a separate heap/allocator/GC, whereas AppDomains use the per-process GC and LOH. What "Memory Domains" and AppDomains share is that objects in one domain cannot contain references to objects in another domain, thereby solving the problem that @masonwheeler mentioned.

I like the name "Memory Domain" better than "GC Domain" because GC might be entirely disabled within a "Memory Domain". Maybe GC is always disabled inside a "Memory Domain", or maybe it is optionally disabled. @ayende would like no GC in a "Memory Domain", instead he would like to drop the entire "Memory Domain" when he's finished using it, and I desire this also. A "Memory Domain" itself would probably be garbage-collected via a finalizer outside of the Memory Domain, but no GC inside the domain.

Re multi-threading, ideally a "Memory Domain" would not require use of threads, but would be thread-safe to support the cases where multi-threading is desired.

Does that sound good to you?

ayende commented 6 years ago

Something like that can be pretty nice, yes. Even just being able to have a separate GC for parts of the app would be great. My user-facing code could use a dedicated thread/heap that isn't going to freeze because of a long GC cycle in the backend code.

verelpode commented 6 years ago

@ayende -- I agree; even if GC cannot be disabled in the domain, the separated GC would still be helpful. However, ideally I'd like to disable GC inside the domain because, for example, when I ran my benchmark over in #2417, I observed that GC collections wasted a lot of processing time: collections ran 996 times when I only needed ONE garbage collection (at the end). The other 995 collections were unnecessary and made my program run slower. (And System.GC.TryStartNoGCRegion doesn't work here, because it is limited to the ephemeral segment size, and because we shouldn't stop GC for the entire process including every thread.)

Large workloads would benefit from a fine-grained ability to disable (or manually start+stop) GC for all objects in a "domain", and it's simpler and faster to discard an entire domain rather than determine which individual objects inside it can be garbage-collected.

ayende commented 6 years ago

I like this idea much better than https://github.com/dotnet/corefx/issues/31643. It fits very cleanly with existing infrastructure and concepts. We don't need to deal with cross-domain references without a proxy, for example.

verelpode commented 6 years ago

Can you clarify/elaborate on the proxy topic? How would you like to communicate with (or control) the object(s) in the other "Memory Domain"? With a proxy like in AppDomains or serialization+unserialization or a different way?

ayende commented 6 years ago

@verelpode I would expect to work with them via proxies, as we used to with AppDomains. That way, you have an explicit separation between which memory resides in which domain. You can also have cross-references between the domains, but only via explicit proxies, not by sharing objects.

verelpode commented 6 years ago

I would love to have the "Memory Domains" feature (including safe proxies similar to AppDomains), but my idea needs input/critique from CLR experts and/or MS engineers. The CLR's internals are not my area of expertise, so I can't say exactly how it would be implemented internally.

juepiezhongren commented 5 years ago

@ayende https://github.com/apple/swift/blob/master/docs/OwnershipManifesto.md .net must consider this

akutruff commented 5 years ago

This all comes down to some first-class support for object pooling. The strongest evidence: Roslyn, the C# compiler itself, ended up implementing multiple object pools to achieve its performance goals. See here:

Pools from Roslyn source

ASP.NET Core

Discussion

In today's world of increasing concurrency, shared objects that are treated as readonly/immutable during parallel processing always need deterministic cleanup when the last task completes.

Arrays/Buffers - We end up pooling these every single time.

POCOs - Small message-like objects that hover dangerously in Gen1 are used over and over again, yet are treated no differently by the GC. (POCOs need to be reference types for polymorphism/pattern matching without boxing when they are put in a queue.) Readonly structs, ref returns, Span, and stackalloc are great steps for processing on the stack, but do not address the inevitable need to call some form of .ReturnToPool(). Things will need to get buffered and will end up stored off the stack; it's unavoidable in queued scenarios. Value types are not your friends here either, as you're going to be boxing and unboxing like crazy. The actor model is alive and well, and happening more and more with pattern matching and increased parallelism.
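A minimal sketch of the rent/reset/return pattern these points describe (this is not Roslyn's actual ObjectPool<T>; `SimplePool` and `Message` are illustrative names):

```csharp
using System;
using System.Collections.Concurrent;

// Minimal thread-safe pool: rent an instance, reset it, return it,
// instead of letting short-lived objects churn through Gen1/Gen2.
class SimplePool<T> where T : class, new()
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

    public T Rent() => _items.TryTake(out var item) ? item : new T();

    public void Return(T item) => _items.Add(item); // caller resets state first
}

class Message
{
    public int Id;
}

static class PoolDemo
{
    public static bool Run()
    {
        var pool = new SimplePool<Message>();
        var msg = pool.Rent();
        msg.Id = 7;
        // ... process the message ...
        msg.Id = 0;          // reset before returning, the "Initialize in reverse"
        pool.Return(msg);

        // The next Rent reuses the same instance instead of allocating.
        return ReferenceEquals(msg, pool.Rent());
    }

    static void Main() => Console.WriteLine(Run()); // True
}
```

Note that this sketch still leaves the hard problems the comment raises unsolved: nothing enforces that the caller stops using `msg` after `Return`, which is exactly the dangling-handle hazard that first-class runtime support would address.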

There needs to be some way to achieve pause-free memory control in our ecosystem, involving first-class support for reference counting as well as custom allocators. It's not just one feature or keyword that will solve this. Further, let's tell the GC to treat certain objects as memory-critical and that they must be handled differently. This includes banning certain object instances from being promoted to Gen 1/Gen 2/LOH during garbage collection. (We could make this a type-based policy, but let us also apply it to only a subset of instances.) Even defining our own "phase" with a GC-as-a-service paradigm would be lovely. In other words, let's write code to help the GC, not to replace or fight it. This is not a place for being declarative.

Destructor-like behavior - (not finalizers as we need to access managed memory) We need tight deterministic cleanup if we're working with pools and we need to be able to set a guarantee on when they run.

GC Policy - Set a GC config on an object that says: "Do not promote this object to Gen 2, ever. Run a delegate/destructor as either a callback or a Task/ValueTask on the ThreadPool, or on a thread reserved by the application. Do not pause the world for this object. It's the application's job to return it to a pool. The pool is marked as not to be compacted; do not put it in the LOH." This will not be solved by new keywords similar to using() blocks, and it likely won't be able to be declarative like other solutions.

MemoryPool<T> - Doesn't get the job done, unfortunately. IMemoryOwner<T> is a reference type, so the owner objects themselves need to be pooled if we have frequent acquire/release of our objects, and we have to roll our own reference counting on top of it. Looking at the implementations, the best shot is a heuristic to avoid CPU cache thrashing with thread-local storage, which ends up causing local cache starvation in pipelined producer/consumer scenarios. We can try to wrap the memory owner in a value type/struct to avoid further allocations, yet we end up treating that as a mutable handle. (Mutable value types are evil, yet looking at the pooling implementations above, you see handles that are stateful structs with comments warning you.)

When the ever-looming day comes that we hit a pause from Gen 2, there is absolutely no good solution to this problem given current runtime or language support. WeakReference does not get it done. ConditionalWeakTable still needs our own form of dirty GC or pooling of the WeakReferences themselves, because as you add WeakReferences you end up with a ton of them in the finalizer queue.

Snowflake - This is a great step in the right direction. Obviously smart people are making strides, and there's greatness there. There is one absolutely huge issue that kicks the can too far down the road:

Finally, we have also built shareable reference counted objects, RefCount, but we are considering API extensions in this space as important future work.

Reference counting needs to be solved at the same time to support queuing scenarios and immediate release of scarce buffers. What we have right now with pooled objects on the stack is at least manageable. It all really breaks when we go to shared pointers. Having an immutable, pooled, object in a logging queue and a network queue immediately sends us back to square one. There is an argument to be made for incremental improvement and doing deterministic cleanup later. For such a fundamental change to memory management, leaving clean reference counting as a TODO has not historically worked out well.

For writing pools and factories, we really need support for treating constructors as general delegates. We need and use this pattern every day: list.Select(x => new Foo(x)), Factory.Create(x => new Foo(x)), or we learn the hard way that the new() generic constraint uses Activator.CreateInstance and can't take constructor arguments. I wish I could write Factory.Create(Foo.constructor) and have the constructor converted to an open delegate. Most importantly, you end up having to give pooled instances an Initialize(x, y) method for when they are recycled; otherwise they have no way to be stateful. Let me call a constructor on an existing object as many times as I like, in the same memory location, without invalidating references to that object (foo.constructor(x)). Last I checked, we could hack this through IL if we wanted to. (The memory layout is deterministic after all, right?)
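One way to approximate "constructor as delegate" today is to compile an expression tree once and reuse it as a plain Func, avoiding Activator.CreateInstance while still supporting constructor arguments. A sketch, with `Foo` as a stand-in type (this does not address re-running a constructor on an existing instance):

```csharp
using System;
using System.Linq.Expressions;

class Foo
{
    public int X { get; }
    public Foo(int x) => X = x;
}

static class CtorDelegateDemo
{
    // Build Func<int, Foo> equivalent to x => new Foo(x), compiled once.
    public static readonly Func<int, Foo> MakeFoo = BuildFactory();

    static Func<int, Foo> BuildFactory()
    {
        var arg = Expression.Parameter(typeof(int), "x");
        var ctor = typeof(Foo).GetConstructor(new[] { typeof(int) })!;
        return Expression
            .Lambda<Func<int, Foo>>(Expression.New(ctor, arg), arg)
            .Compile();
    }

    static void Main() => Console.WriteLine(MakeFoo(5).X); // 5
}
```

The compiled delegate can be cached per type, which is what most factory/pool libraries do internally to avoid reflection on every call.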

Lastly, over almost 16 years of working in .NET, every single project has needed an object pool at some point. It's no longer premature optimization but an inevitability. In financial software, you end up with a gazillion tiny objects for orders and quotes that love ending up in Gen 2. For video, audio, and big unmanaged resources, you will end up pooling, and it's going to happen after you've written a lot of great code, resulting in value types turning into classes and making things worse. (But hey, it's not a buffer!) For gaming, you'd better hope that your unpredictable GC pause finishes while still leaving time for your physics processing. (You're just going to drop frames, because you only have ~15 ms that you're already squeezing as much work into as you can.)

For C# language feature discussions: "The runtime doesn't support IL for ___" seems to come up often in this area. That's why I wrote it all here; it's too big not to bring together, because that's how we write programs: runtime and language.

I can't express enough how much fixing this area will benefit the community. It's been my priority 0 since 2008.

Edit: I tried opening a new issue focused specifically on object pooling and it got closed. I'll leave this here, but that doesn't bode well for this ever being looked at holistically in a public fashion.

Peperud commented 5 years ago

This all comes down to, some first class support of object pooling. The strongest evidence: Roslyn, the C# compiler itself, had to end up implementing multiple object pools to achieve performance goals....

Amen!

jkotas commented 3 years ago

Open ended discussion. Closing due to no activity.