SixLabors / ImageSharp

:camera: A modern, cross-platform, 2D Graphics library for .NET
https://sixlabors.com/products/imagesharp/

More efficient MemoryAllocator #1596

Closed antonfirsov closed 2 years ago

antonfirsov commented 3 years ago

One way to address the concerns in #1590 is to come up with a new MemoryAllocator that meets the following two requirements even under very high load:

- (A) Steady allocation patterns instead of GC fluctuations
- (B) A lower amount of memory being retained, at least "after some time"

Some ideas to explore:

  1. Allocate native memory over a certain threshold (~1MB), deferring the memory management to the OS.
     1.1. Consider pooling unmanaged memory, especially if we implement the next point.
  2. Go discontiguous, and build all large buffers from fixed-size blocks of memory, similarly to RecyclableMemoryStream. It's worth checking how this works with both pooled arrays and unmanaged memory (see the sketch after this list).
  3. Have a threshold on the retained memory, and release some of the pooled arrays (or pooled unmanaged buffers) when we grow over it.
  4. Mentioning an idea I really hate for the sake of completeness: Have a synchronization mechanism around large buffer acquisition, similarly to System.Drawing (WIC?). To do this properly, memory allocation should become an asynchronous operation, otherwise SpinLocks will just add more fuel to the fire.
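
To make point 2 concrete, here is a rough sketch (illustrative only, not ImageSharp code) of a buffer composed from fixed-size pooled blocks in the spirit of RecyclableMemoryStream; the `BlockedBuffer` name and the 64KB block size are assumptions:

```csharp
using System;
using System.Buffers;

// Illustrative sketch of idea 2: compose a large "buffer" from fixed-size
// pooled blocks instead of one contiguous allocation.
public sealed class BlockedBuffer : IDisposable
{
    private const int BlockSize = 64 * 1024; // assumed block size, small enough to stay off the LOH
    private readonly byte[][] blocks;
    private readonly long length;

    public BlockedBuffer(long length)
    {
        this.length = length;
        int blockCount = (int)((length + BlockSize - 1) / BlockSize);
        this.blocks = new byte[blockCount][];
        for (int i = 0; i < blockCount; i++)
        {
            this.blocks[i] = ArrayPool<byte>.Shared.Rent(BlockSize);
        }
    }

    public long Length => this.length;

    // Callers work on one block-sized span at a time instead of a single Span<byte>.
    public Span<byte> GetBlockSpan(int blockIndex)
    {
        long start = (long)blockIndex * BlockSize;
        int count = (int)Math.Min(BlockSize, this.length - start);
        return this.blocks[blockIndex].AsSpan(0, count);
    }

    public void Dispose()
    {
        foreach (byte[] block in this.blocks)
        {
            ArrayPool<byte>.Shared.Return(block);
        }
    }
}
```

The trade-off is that processing code would have to operate block by block rather than on one contiguous span, which is the main cost of going discontiguous.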

Point 1 seems very simple to prototype. We need an allocator that uses Marshal.AllocHGlobal over some threshold and an ArrayPool below it, then we can see how the memory timeline compares to ArrayPoolMemoryAllocator with the bee heads MemoryStress benchmark.
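
A minimal sketch of what such a prototype could look like, assuming `IMemoryOwner<byte>` as the hand-out type; `HybridAllocator` and the 1MB cutoff are illustrative placeholders, not the actual ImageSharp `MemoryAllocator` API:

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

// Hypothetical prototype of idea 1: rent pooled arrays below a cutoff,
// fall back to unmanaged memory (Marshal.AllocHGlobal) above it.
public static class HybridAllocator
{
    private const int UnmanagedThreshold = 1024 * 1024; // assumed ~1MB cutoff

    public static IMemoryOwner<byte> Allocate(int length)
    {
        if (length < UnmanagedThreshold)
        {
            return new PooledArrayOwner(length);
        }

        return new UnmanagedOwner(length);
    }

    // Small buffers: ArrayPool-backed, returned to the pool on Dispose.
    private sealed class PooledArrayOwner : IMemoryOwner<byte>
    {
        private readonly int length;
        private byte[] array;

        public PooledArrayOwner(int length)
        {
            this.length = length;
            this.array = ArrayPool<byte>.Shared.Rent(length);
        }

        public Memory<byte> Memory => this.array.AsMemory(0, this.length);

        public void Dispose()
        {
            if (this.array != null)
            {
                ArrayPool<byte>.Shared.Return(this.array);
                this.array = null;
            }
        }
    }

    // Large buffers: unmanaged memory, exposed through MemoryManager<byte>
    // so callers still see a normal Memory<byte>.
    private sealed class UnmanagedOwner : MemoryManager<byte>
    {
        private readonly int length;
        private IntPtr pointer;

        public UnmanagedOwner(int length)
        {
            this.length = length;
            this.pointer = Marshal.AllocHGlobal(length);
        }

        public override unsafe Span<byte> GetSpan() => new Span<byte>((void*)this.pointer, this.length);

        public override unsafe MemoryHandle Pin(int elementIndex = 0) =>
            new MemoryHandle((byte*)this.pointer + elementIndex);

        public override void Unpin()
        {
            // Unmanaged memory is never moved by the GC; nothing to do.
        }

        protected override void Dispose(bool disposing)
        {
            if (this.pointer != IntPtr.Zero)
            {
                Marshal.FreeHGlobal(this.pointer);
                this.pointer = IntPtr.Zero;
            }
        }
    }
}
```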

@saucecontrol any thoughts or further ideas? (Especially on point 2.)

cshung commented 3 years ago

> The problem is that I'm not sure if it's a good practice to touch that property from library code. The compaction comes with extra GC cost, unexpected by the user, and we don't know when / how often to compact.

Not compacting the LOH automatically by default is a bit sad. Historically, we never compacted the LOH; some customers depended on that fact and assumed LOH-allocated objects are pinned, so we cannot automatically compact the LOH without some kind of user opt-in, otherwise we might break those users. The GCSettings.LargeObjectHeapCompactionMode property seemed to solve most people's pressing issues, and therefore we were not looking into it.
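
For reference, this is the opt-in pattern being referred to; setting the property causes the next blocking gen-2 collection to compact the LOH, after which the mode resets:

```csharp
using System;
using System.Runtime;

// User opt-in for LOH compaction: the next blocking gen-2 collection
// compacts the large object heap, then the mode reverts to Default.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
```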

Those days are long gone; now we can automatically compact the LOH with minimal configuration. Starting with .NET 6, you can specify the GCConserveMemory setting. It is a number ranging from 0 to 9, indicating how much you want the GC to work to conserve memory instead of giving the best possible speed.

Unfortunately, the setting is not documented yet, but it will be. For the time being, we can explore what the setting does by looking at the code here (*).

If you search for `compacting LOH due to GCConserveMem setting`, you should find the relevant logic for deciding on LOH compaction. As an overview, the implementation checks the fragmentation ratio (i.e. what percentage of memory is wasted as free space). If it is larger than a threshold derived from the GCConserveMemory setting, LOH compaction is turned on.
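
A simplified illustration of the kind of decision being described; this is not the actual GC code, and the exact mapping from the setting to a threshold is an assumption:

```csharp
// Simplified illustration only; the real heuristic lives inside the GC and may differ.
// 'conserveMemory' is the 0-9 GCConserveMemory value; 'fragmentation' and
// 'heapSize' are the LOH free-space bytes and total LOH size, respectively.
static bool ShouldCompactLoh(long fragmentation, long heapSize, int conserveMemory)
{
    if (conserveMemory == 0)
    {
        return false; // setting not enabled; leave compaction to other heuristics
    }

    // Higher settings tolerate less wasted space before compacting (assumed mapping).
    double allowedFragmentationRatio = (10 - conserveMemory) / 10.0;
    return (double)fragmentation / heapSize > allowedFragmentationRatio;
}
```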

This should alleviate the need to set that property - whether or not to compact the LOH is best left for the GC to decide.

(*) Whatever we actually do in the code is an implementation detail that is subject to change; we need the flexibility to avoid painting ourselves into a corner, like we did with not compacting the LOH.

cshung commented 3 years ago

> @cshung if you think you have some time to chat about this, let me know, I would really appreciate the help!

I am more than happy to reach out to developers like you who care about garbage collection performance. My personal goal is to understand the needs and ideally come up with benchmarks that are representative to work on. What is the best way to reach you?

antonfirsov commented 3 years ago

@cshung thanks a lot for the answers!

> Starting with .NET 6, you can specify the GCConserveMemory setting.

From our perspective, the problem with an opt-in setting is that we can't configure it on behalf of our users. Even if we promoted it in our documentation, most users would still miss it and complain about poor scalability compared to unmanaged libraries like skia(sharp). It's much better if the library "just works" without any extra configuration, thanks to good defaults.

For now, I decided to go ahead with switching to unmanaged memory, for two reasons:

  1. The compaction issues discussed above
  2. Unmanaged memory has better characteristics when allocations overflow the pooling threshold and the images are disposed immediately. (this instead of this)

However, thinking longer term, this feels wrong to me. Ideally, a managed library should be able to meet all of its requirements using managed memory only. It would be cool to switch back to the GC in a future version. I wonder whether ImageSharp is some sort of special animal here, or whether there are other memory-heavy libraries or apps facing similar issues.

> My personal goal is to understand the needs and ideally come up with benchmarks that are representative to work on. What is the best way to reach you?

That's great to hear; we can chat on Teams, I think.