dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License

Improve RenderTreeFrameArrayBuilder pooling #48755

Open SteveSandersonMS opened 1 year ago

SteveSandersonMS commented 1 year ago

Currently, RenderTreeFrameArrayBuilder gets its backing store from an ArrayPool<RenderTreeFrame>. We implemented this optimization because in typical applications, there will be many components appearing and disappearing over time, and they will build RenderTreeFrame arrays of differing and unpredictable lengths. Our intent was to reduce GC pressure by reusing these buffers.

However, it's not clear that this is beneficial in practice; it might even be actively harmful. Especially in the newer SSR-based app architectures, it will be common within a single HTTP request to use dozens or hundreds of components to render a page, each of which will rent at least one RenderTreeFrame[32] and hold it until the end of that request. Given a reasonable level of concurrent requests, this multiplies out to a lot of concurrent arrays, beyond the level of concurrency that ArrayPool<T> was designed for. In practice the pool capacity will be exceeded, and even if it is not, rent/return calls will be slowed by contention at this scale.
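The pattern described above can be sketched roughly as follows. The builder here is a simplified, hypothetical stand-in for the internal RenderTreeFrameArrayBuilder (the real type works on RenderTreeFrame and differs in detail); it exists only to show why many concurrently-held rentals stress the shared pool:

```csharp
using System;
using System.Buffers;

// Simplified stand-in for RenderTreeFrameArrayBuilder: each instance
// rents its backing store from the process-wide shared pool and holds
// it until disposed. Illustrative sketch, not the actual implementation.
sealed class PooledArrayBuilder<T> : IDisposable
{
    private T[] _items = ArrayPool<T>.Shared.Rent(32); // at least a T[32] per component
    private int _count;

    public int Count => _count;

    public void Append(in T item)
    {
        if (_count == _items.Length)
        {
            // Grow: rent a larger buffer, copy, and return the old one.
            var bigger = ArrayPool<T>.Shared.Rent(_items.Length * 2);
            Array.Copy(_items, bigger, _count);
            ArrayPool<T>.Shared.Return(_items);
            _items = bigger;
        }
        _items[_count++] = item;
    }

    // With SSR, hundreds of these can be alive at once per request, so
    // Return may not happen until the end of the request.
    public void Dispose() => ArrayPool<T>.Shared.Return(_items);
}
```

With hundreds of such builders alive per request, multiplied by concurrent requests, the shared pool's per-bucket capacity is quickly exhausted and the remaining calls contend on its internal synchronization.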

Option 1: Separate pool per renderer

This could be a very quick change. Instead of using the default ArrayPool<T>.Shared, we could instantiate a separate pool for each Renderer. Then there should not be any further cross-request contention on the server. It would make no difference at all for WebAssembly because there's only one user in that case anyway, so if the pooling does benefit WebAssembly, we can retain that benefit. It would also retain some (possibly most) of the benefit for Blazor Server.
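A minimal sketch of this option, assuming the pool simply becomes an instance field on the Renderer. The class and field names, element type (RenderTreeFrame is internal to the framework), and tuning numbers are all illustrative:

```csharp
using System.Buffers;

// Sketch of Option 1: each Renderer instance owns a private pool, so
// concurrent requests (each with their own Renderer) never contend
// with one another. Sizing values below are illustrative, not measured.
sealed class RendererSketch
{
    // ArrayPool<T>.Create() returns a bucketed pool private to this instance,
    // unlike the process-wide ArrayPool<T>.Shared.
    internal readonly ArrayPool<byte> FramePool =
        ArrayPool<byte>.Create(maxArrayLength: 1 << 16, maxArraysPerBucket: 32);
}
```

On WebAssembly and Blazor Server there is effectively one renderer per user circuit, so per-renderer pooling preserves whatever reuse benefit pooling provides there.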

Option 2: Stop pooling completely

This would be very cheap to implement, but before doing so we should attempt to measure the impact on:

  1. Heavy interactive rendering workloads on WebAssembly
  2. Heavy interactive rendering workloads on Server
  3. Heavy SSR workloads

We must avoid causing significant perf regressions in any of these. If we find that the optimal strategy differs a lot between (say) items 1 and 3, we may need to vary the behavior across these scenarios.

Option 3: Custom pooling

Given what we know about the nature of this specific scenario, we may be able to implement something with an API like ArrayPool<T> but tuned to this case, possibly in a lock-free manner.
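One possible shape for such a pool, as a rough sketch: a fixed set of power-of-two size buckets, each with a small number of slots exchanged via Interlocked operations so there is no lock to contend on. The bucket sizes, slot counts, and type name are all illustrative assumptions, not a proposed design:

```csharp
using System;
using System.Threading;

// Sketch of Option 3: an ArrayPool<T>-like surface tuned to this scenario,
// using lock-free Interlocked operations on a flat slot table.
sealed class LockFreeArrayPool<T>
{
    private const int Buckets = 8;          // sizes 32, 64, ..., 4096
    private const int SlotsPerBucket = 16;
    private readonly T[][] _slots = new T[Buckets * SlotsPerBucket][];

    public T[] Rent(int minimumLength)
    {
        int bucket = BucketIndex(minimumLength);
        if (bucket >= Buckets) return new T[minimumLength]; // too big to pool
        for (int i = 0; i < SlotsPerBucket; i++)
        {
            // Atomically take a cached array out of the slot, if any.
            var candidate = Interlocked.Exchange(ref _slots[bucket * SlotsPerBucket + i], null);
            if (candidate is not null) return candidate;
        }
        return new T[32 << bucket]; // bucket empty: allocate fresh
    }

    public void Return(T[] array)
    {
        int bucket = BucketIndex(array.Length);
        if (bucket >= Buckets || (32 << bucket) != array.Length) return; // drop odd sizes
        for (int i = 0; i < SlotsPerBucket; i++)
        {
            // Atomically place the array into the first empty slot.
            if (Interlocked.CompareExchange(ref _slots[bucket * SlotsPerBucket + i], array, null) is null)
                return;
        }
        // All slots occupied: let the GC reclaim the array.
    }

    private static int BucketIndex(int length)
    {
        int bucket = 0;
        for (int size = 32; size < length; size <<= 1) bucket++;
        return bucket;
    }
}
```

Unlike ArrayPool&lt;T&gt;, a pool like this can be tuned to the known size distribution (most components start at 32 frames) and sized to the expected number of concurrently live buffers.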

Option 4: Custom allocation

If we're writing custom code here, then instead of replicating the ArrayPool<T> semantics, we might instead do something like a per-renderer arena: allocate one large top-level array per Renderer, hand out Memory<RenderTreeFrame> slices from it, and reclaim the whole region at once when the renderer goes away.

This is essentially the same as implementing a custom memory allocator so it's definitely nontrivial. We also need to consider what level of risk this might entail: for example if some buggy code tries to access beyond the end of the Memory<RenderTreeFrame> that we give it, can it read or write state for a different user? We probably would want to allocate a separate top-level array for each Renderer to guarantee we couldn't leak state across users even if such a bug exists, and to make disposal much simpler and cheaper.
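A minimal sketch of that per-renderer arena, assuming a simple bump-pointer allocation and a doubling growth policy (the type name and sizing defaults are illustrative):

```csharp
using System;

// Sketch of Option 4: one top-level array backs all frame storage for a
// single renderer; components receive Memory<T> slices from it. Because
// the array is private to this renderer, even out-of-bounds access bugs
// cannot observe another user's state.
sealed class FrameArena<T>
{
    private T[] _backing;
    private int _used;

    public FrameArena(int initialCapacity = 4096) => _backing = new T[initialCapacity];

    public Memory<T> Allocate(int length)
    {
        if (_used + length > _backing.Length)
        {
            // Out of room: start a fresh, larger backing array. Slices
            // handed out earlier keep the old array alive, so they
            // remain valid until the GC collects it.
            _backing = new T[Math.Max(_backing.Length * 2, length)];
            _used = 0;
        }
        var slice = _backing.AsMemory(_used, length);
        _used += length;
        return slice;
    }
}
```

"Disposal" here is just dropping the arena reference, which makes cleanup at the end of a request trivial and cheap.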

If we were super determined, we could do something heuristic where at runtime we keep track of the distribution of array sizes consumed on a per-URL basis, and then vary the initial allocation size to match the P90 of that distribution, or similar.
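That heuristic could be sketched as follows. Everything here (the type name, the sample storage, the fallback size) is an illustrative assumption, not a concrete proposal:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Sketch of the per-URL sizing heuristic: record observed frame-array
// sizes for each URL and size new initial allocations to roughly the
// P90 of that distribution.
sealed class FrameSizeProfiler
{
    private readonly ConcurrentDictionary<string, List<int>> _samples = new();

    public void Record(string url, int size)
    {
        var list = _samples.GetOrAdd(url, _ => new List<int>());
        lock (list) list.Add(size);  // List<T> itself is not thread-safe
    }

    public int SuggestedInitialSize(string url, int fallback = 32)
    {
        if (!_samples.TryGetValue(url, out var list)) return fallback;
        int[] sorted;
        lock (list) sorted = list.ToArray();
        if (sorted.Length == 0) return fallback;
        Array.Sort(sorted);
        // Index of the (approximate) 90th percentile sample.
        return sorted[(int)(0.9 * (sorted.Length - 1))];
    }
}
```

A production version would want bounded sample storage (e.g. a reservoir or streaming quantile estimate) rather than an unbounded list per URL.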

ghost commented 10 months ago

Thanks for contacting us.

We're moving this issue to the .NET 9 Planning milestone for future evaluation/consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue during our next planning meeting(s). If we later determine that the issue has no community involvement, or that it is a very rare, low-impact issue, we will close it so that the team can focus on more important, high-impact issues. To learn more about what to expect next and how this issue will be handled, you can read more about our triage process here.