Remora / Remora.Discord

A data-oriented C# Discord library, focused on high-performance concurrency and robust design.
GNU Lesser General Public License v3.0

Optimize receiving payloads¹ #300

Closed VelvetToroyashi closed 1 year ago

VelvetToroyashi commented 1 year ago

¹This PR only optimizes WebSocketPayloadTransportService's ReceiveAsync method.

This PR is one of many that will tackle various issues mentioned in #254.

This one in particular relates to the performance (speed, allocations) concerns raised by the current implementation of receiving a payload (as implemented by the aforementioned class).

Notable changes are:

- Received payloads are buffered into an ArrayPoolBufferWriter<T>² from CommunityToolkit.HighPerformance instead of a newly allocated buffer per payload.
- The Memory<byte> overload of ReceiveAsync is used, which returns a ValueWebSocketReceiveResult struct.
- JsonSerializer.DeserializeAsync<T> has been replaced with its synchronous counterpart.

Considerations and justifications

It should be considered that, as a critical part of the library, it's important that this code is battle-tested and not just validated by unit tests. In theory, the logic of the receive flow has not changed, but there is always the possibility of bugs and edge cases that may be directly or indirectly surfaced by this PR; any bugs reported regarding receiving gateway payloads should probably reference this PR for ease of tracking.

²The implementation of ArrayPoolBufferWriter<T> is backed by ArrayPool<T>.Shared by default

As for why these changes were made:

CommunityToolkit.HighPerformance is, as it says on the tin, a high-performance set of tools and APIs (such as ArrayPoolBufferWriter<T>) that could see even further utilization within Remora after some careful consideration and triaging of potential optimizations (such as those commented upon in #254). (SendAsync was not changed, as incoming payloads outnumber outgoing ones by upward of several orders of magnitude for larger bots.)

The change to ValueWebSocketReceiveResult is twofold: ArrayPoolBufferWriter<T> exposes Memory<T> and Span<T>, and the former is accepted by an overload of ClientWebSocket#ReceiveAsync which returns a struct result instead of a class. This struct only contains a few small fields, so it is much smaller and never touches the heap. This optimization, if it can be called such, comes for free as a side effect of the other changes.
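
For illustration, a receive loop along these lines might look like the following (a minimal sketch rather than the PR's exact code; the ArrayPoolBufferWriter<byte> parameter, the 4096-byte segment hint, and the method name are assumptions made for the example):

using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;
using CommunityToolkit.HighPerformance.Buffers;

public static class ReceiveSketch
{
    // Reads a complete gateway message into a pooled buffer and hands back the written region.
    public static async Task<ReadOnlyMemory<byte>> ReceivePayloadAsync
    (
        ClientWebSocket socket,
        ArrayPoolBufferWriter<byte> writer,
        CancellationToken ct
    )
    {
        ValueWebSocketReceiveResult result;
        do
        {
            // GetMemory hands out space rented from the pool; nothing new is allocated per receive.
            Memory<byte> segment = writer.GetMemory(4096);

            // The Memory<byte> overload returns ValueWebSocketReceiveResult, a small struct that never touches the heap.
            result = await socket.ReceiveAsync(segment, ct);

            // Commit only the bytes this receive actually wrote.
            writer.Advance(result.Count);
        }
        while (!result.EndOfMessage);

        return writer.WrittenMemory;
    }
}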

JsonSerializer.DeserializeAsync<T> was changed to its synchronous alternative because:

- The asynchronous overload allocates more and is measurably slower (see the benchmark below).
- Internally, its only await point is reading from the supplied stream, as the runtime's own loop shows:

while (true)
{
    bufferState = await bufferState.ReadFromStreamAsync(utf8Json, cancellationToken).ConfigureAwait(false);
    TValue? value = ContinueDeserialize<TValue>(ref bufferState, ref jsonReaderState, ref readStack, jsonTypeInfo);

    if (bufferState.IsFinalBlock)
    {
        return value;
    }
}

The latter is especially important to consider, given where the data is coming from.

Because the only asynchronous part of the deserialization is reading from the stream, this is rather wasteful: the stream being read is a MemoryStream, which means all the data being read is already sitting in memory, not on the wire or on disk. Because of this, the overhead of async actually makes deserialization slower than if the synchronous version were used.
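
To make the contrast concrete, here is a hedged sketch of the two shapes, assuming the full payload has already been buffered into an ArrayPoolBufferWriter<byte> and a hypothetical TPayload type (neither is the PR's actual code):

using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using CommunityToolkit.HighPerformance.Buffers;

public static class DeserializeSketch
{
    // Before: asynchronous deserialization over a MemoryStream wrapping data that is already
    // in memory, so the single await point never truly yields, yet the async machinery is still paid for.
    public static async ValueTask<TPayload?> ViaAsync<TPayload>
    (
        byte[] buffer, int length, JsonSerializerOptions options, CancellationToken ct
    )
    {
        using var stream = new MemoryStream(buffer, 0, length);
        return await JsonSerializer.DeserializeAsync<TPayload>(stream, options, ct);
    }

    // After: synchronous deserialization straight from the pooled writer's span; no stream,
    // no async state machine.
    public static TPayload? ViaSync<TPayload>
    (
        ArrayPoolBufferWriter<byte> writer, JsonSerializerOptions options
    )
    {
        return JsonSerializer.Deserialize<TPayload>(writer.WrittenSpan, options);
    }
}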

VelvetToroyashi commented 1 year ago

Deserialization alone is roughly 3x faster synchronously than asynchronously, as seen here:

|                      Method |       Mean |    Error |   StdDev |     Median | Allocated |
|---------------------------- |-----------:|---------:|---------:|-----------:|----------:|
|  SynchronousDeserialization |   391.6 ns |  7.78 ns | 11.88 ns |   390.5 ns |      33 B |
| AsynchronousDeserialization | 1,328.1 ns | 31.64 ns | 92.29 ns | 1,289.9 ns |      97 B |
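
For reference, a comparison like the one above could be reproduced with a small BenchmarkDotNet harness along these lines (the payload type and JSON document are illustrative assumptions, not the benchmark actually used for these numbers):

using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// A stand-in payload type; the real benchmark's input is not shown in the PR.
public sealed record Payload(int Op, string T);

[MemoryDiagnoser]
public class DeserializationBenchmarks
{
    private static readonly byte[] Json = Encoding.UTF8.GetBytes("{\"Op\":0,\"T\":\"READY\"}");

    [Benchmark]
    public Payload? SynchronousDeserialization()
        => JsonSerializer.Deserialize<Payload>(Json);

    [Benchmark]
    public async Task<Payload?> AsynchronousDeserialization()
    {
        // Wrapping already-in-memory bytes in a MemoryStream mirrors the old receive path.
        using var stream = new MemoryStream(Json);
        return await JsonSerializer.DeserializeAsync<Payload>(stream);
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<DeserializationBenchmarks>();
}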