dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.2k stars 9.94k forks source link

Feature request: User-oriented WebSocket server and client #17144

Open JesperTreetop opened 4 years ago

JesperTreetop commented 4 years ago

The .NET BCL includes a low-end, raw ClientWebSocket class dealing at the frame level with the protocol. You have to pass it ArraySegment<byte>, you have to assemble frames into messages, manage, pipeline and pump both sides of the connection. Many platforms have a blessed WebSocket client (and sometimes server) that abstracts over receiving frames and pipelining ongoing requests. For example, the client inside the HTML5 standard is able to work on a more user-oriented layer, with events for messages which pieces together frames for you.

As David points out, implementing this in a correct, speedy and secure way is hard. It has been done today in ASP.NET Core and exists as part of the implementation of SignalR. It would be a good opportunity to provide a well-implemented WebSocket client and server that is as robust, and that can be used out of the box, for the times where using WebSocket directly is required. It seems to me that maybe it could be possible to layer this such that the SignalR implementation would be able to use the WebSocket client/server as is, and to mirror previous efforts to extract publicly reusable components from the internal infrastructure.

There are ways in which implementing a WebSocket server (especially) could require tuning, like for how many frames to allow buffering - this is the case for the WebSocket implementation as part of SignalR or the HTTP or binary protocols provided by Kestrel just as well, where secure defaults have been set, judgments have been made and some knobs provided.

analogrelay commented 4 years ago

@davidfowl thoughts? Since you opened this box ;P.

It's certainly a reasonable idea. Do you have a thought on what this API would look like? We have the ConnectionHandler abstraction already, but that flattens things out into a stream rather than preserving framing.

JesperTreetop commented 4 years ago

I don't have a cooked idea about the shape of the API, but I realize it needs somewhat more of a model than "here's a bunch of bytes". Off the top of my head, System.Net.WebSockets is a pretty good likeness, but surfacing messages instead of frames as events or callbacks is the main difference, and of course the server side needs to be focused on being able to create/accept sockets for connecting clients. Obviously the data types should all be well chosen to make things easy for pipelines, Kestrel and Bedrock. A rough napkin sketch for the server, many details about Task vs ValueTask and the like not considered:


// register this with something to set up the handling
public interface IWebSocketChannelHandler {
    Task<IWebSocketConnectionHandler> AcceptAsync(
        IWebSocketConnectionHandlerContext context,
        CancellationToken token);
    // return a handler constructed with the context to
    // accept the connection; null to reject
    // (not very DI, there's probably a better way)

    // possibly something to retrieve state about connected handlers
}

// implemented by the user's handler
public interface IWebSocketConnectionHandler {
    Task OnReceiveBinaryMessageAsync(
        ReadOnlySequence<byte> binaryContents);
    Task OnReceiveTextMessageAsync(
        ReadOnlySequence<byte> utf8Contents); // or string or Utf8String
    Task<bool> OnReceivePingMessageAsync(
        ReadOnlySpan<byte> payload);
    // return value is whether to reply with a Pong;
    // Pong messages are silently handled and not received directly

    Task OnClosedAsync(
        System.Net.WebSockets.WebSocketCloseStatus status,
        string statusDescription);
    // "OnError" somehow?

    string SubProtocol { get; }
    // here, or somewhere else, to tell the channel handler
    // which subprotocol was chosen since returning a
    // constructed handler signals acceptance
    // - clear design wart! I just don't have a better idea now
}

public interface IWebSocketConnectionHandlerContext {
    EndPoint EndPoint { get; }
    // ...or one or more things that represents "the connection"
    // or "the other side" in Bedrock-like APIs
    CancellationToken ConnectionLifetimeCancellationToken { get; }
    // various information from the upgrade request,
    // including Certificate and SubProtocol,
    // something like System.Net.WebSockets.WebSocketContext

    // used to send when the connection has been accepted
    Task SendBinaryMessageAsync(Sequence<byte> binaryContents);
    Task SendTextMessageAsync(Sequence<byte> utf8Contents);
    Task SendTextMessageAsync(Utf8String contents);
    Task SendTextMessageAsync(string contents);

    Task SendPingAsync(Span<byte> binaryContents);
        // can only be up to 125 bytes

    Task CloseAsync(
        System.Net.WebSockets.WebSocketCloseStatus status,
        string statusDescription);
    Task AbortAsync(); // not nice, but exists in System.Net.WebSockets
}
JesperTreetop commented 4 years ago

I realize the more specific and comfortable to write against the handler surface is, the less like the other "here's a bunch of bytes delivered to a pipeline" transports it could become, so there's a tension. Then again, it looks like the WebSocket transport is being specially handled in the code base as it is.

analogrelay commented 4 years ago

surfacing messages instead of frames as events or callbacks is the main difference

Why is this preferable? It takes control away from the caller to handle scheduling. We're generally moving away from callbacks for I/O because await has the same effect and is generally more ergonomic. I much prefer a fire-and-forget task calling ReceiveAsync in a loop to a set of event handlers.

Obviously the data types should all be well chosen to make things easy for pipelines, Kestrel and Bedrock.

Pipelines doesn't really apply here since we're talking about a datagram protocol rather than a stream. The ConnectionHandler abstraction already has us well covered for using WebSockets in a mode where you don't care about message boundaries. If we're building something new, it would be something that is focused on use cases where the message boundaries do matter to the app. In that case, Pipelines isn't really appropriate since it only exposes a flat stream and doesn't provide a way to extract a single "message".

There are two major differences between the .NET API and the Browser API (as far as I can tell):

  1. The Browser API is event based (see above for commentary on that), the .NET API is async based.
  2. The Browser API coalesces and provides complete messages, the .NET API fills a buffer you provide

For the second one, the Browser API does that because ECMAScript was missing raw buffer types at the time WebSocket was implemented (whereas now we have Uint8Array). It also seems like the second point could be done in .NET with a simple extension method on WebSocket:

public static class WebSocketExtensions
{
    public static ValueTask<WebSocketMessage> ReceiveEntireMessageAsync(this WebSocket socket, ...);
}

public struct WebSocketMessage
{
    public ReadOnlyMemory<byte> Body { get; }
    public WebSocketMessageType Type { get; }
    // ...
}
JesperTreetop commented 4 years ago

Those are fair points. I meant for ease of consumption - similar to how MVC provides model binding, actions and controllers and isolates you from the repetitive rigors of parsing the connection. I don't mind writing a simple read-loop if it's uncomplicated. But if it's going to be the exact same code in each application, I wouldn't mind it being absorbed by something either.

I used Pipelines and ReadOnlySequence<byte> and all of that as stand-ins for whichever constructions are needed to pass the message from the parsing layer through to the handler code without copying.

The reason I went with events/callbacks in particular in my example (and it could also be an IAsyncEnumerable<WebSocketMessage> now, with the same effects) is basically the tricky bits mentioned by David, which noticeably surfaces as the "only one outstanding operation for each of Receive and Send" in System.Net.WebSockets. There, you'd want something to handle the pumping and coordination and do it in a safe and clean way, and then a chance to access those messages. (And similarly if you do await SendAsync(...) and something else is currently sending, waiting for that to finish is implicit.)

I imagine it as being called because that's what MVC or even using a routing endpoint does - it calls you, saying here's a request. I'm sure it could easily be safer to pull from it - but the layer doing the pumping/coordination could still apply backpressure and just not call you again until you're done handling it, and not read outstanding connections until you've made forward progress. It seems to me like this is what Kestrel and the HTTP stack does all the time - but I'm not an expert at this, though, so I could easily be missing something.