API Proposal: SocketsHttpConnectionContext.ConnectAsynchronously

stephentoub commented 3 years ago

Background and Motivation

In .NET 5, we added SocketsHttpHandler.ConnectCallback, which enables a developer to supply a delegate that'll be used to establish connections. In .NET 5, the callback is only usable for asynchronous operations, but we also added sync support to SocketsHttpHandler (and HttpClient), and if you try to both use the ConnectCallback and sync APIs, you get an exception. We can trivially remove this restriction simply by telling the ConnectCallback in what mode it should operate.

Proposed API

namespace System.Net.Http
{
    public sealed class SocketsHttpConnectionContext
    {
+        public bool ConnectAsynchronously { get; } // true for SendAsync, false for Send
         ...
     }
}

Usage Examples

socketsHttpHandler.ConnectCallback = async (ctx, cancellationToken) =>
{
    var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
    try
    {
        if (context.ConnectAsynchronously)
        {
            await socket.ConnectAsync(context.DnsEndPoint, cancellationToken).ConfigureAwait(false);
        }
        else
        {
            using (cancellationToken.UnsafeRegister(static s => ((Socket)s!).Dispose(), socket))
                socket.Connect(context.DnsEndPoint);
        }
    }
    catch
    {
        socket.Dispose();
        throw;
    }

    return new NetworkStream(socket, ownsSocket: true);
};

ConnectAsynchronously will be true for all existing uses that work today, because today the synchronous Send will throw an exception if a callback is provided. If once this API exists the ConnectCallback hasn't been updated to respect a false ConnectAsynchronously and a synchronous Send is issued, the caller will simply block waiting for the operation to complete (alternatives are also possible, like throwing an exception).

This also helps to consolidate/simplify logic in SocketsHttpHandler, which no longer needs to maintain two separate connection code paths, to special-case ConnectCallback, etc.

Example implementation: https://github.com/stephentoub/runtime/commit/3674c8d82d6f9b26f5c4dd99dc4de13c995e7b91

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details

Description:	## Background and Motivation In .NET 5, we added SocketsHttpHandler.ConnectCallback, which enables a developer to supply a delegate that'll be used to establish connections. In .NET 5, the callback is only usable for asynchronous operations, but we also added sync support to SocketsHttpHandler (and HttpClient), and if you try to both use the ConnectCallback and sync APIs, you get an exception. We can trivially remove this restriction simply by telling the ConnectCallback in what mode it should operate. ## Proposed API ```diff namespace System.Net.Http { public sealed class SocketsHttpConnectionContext { + public bool ConnectAsynchronously { get; } // true for SendAsync, false for Send ... } } ``` ## Usage Examples ```C# socketsHttpHandler.ConnectCallback = async (ctx, cancellationToken) => { var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true }; try { if (context.ConnectAsynchronously) { await socket.ConnectAsync(context.DnsEndPoint, cancellationToken).ConfigureAwait(false); } else { using (cancellationToken.UnsafeRegister(static s => ((Socket)s!).Dispose(), socket)) socket.Connect(context.DnsEndPoint); } } catch { socket.Dispose(); throw; } return new NetworkStream(socket, ownsSocket: true); }; ``` ConnectAsynchronously will be true for all existing uses that work today, because today the synchronous Send will throw an exception if a callback is provided. If once this API exists the ConnectCallback hasn't been updated to respect a false ConnectAsynchronously and a synchronous Send is issued, the caller will simply block waiting for the operation to complete (alternatives are also possible, like throwing an exception). This also helps to consolidate/simplify logic in SocketsHttpHandler, which no longer needs to maintain two separate connection code paths, to special-case ConnectCallback, etc. Example implementation: https://github.com/stephentoub/runtime/commit/3674c8d82d6f9b26f5c4dd99dc4de13c995e7b91
Author:	stephentoub
Assignees:	-
Labels:	`api-suggestion`, `area-System.Net.Http`
Milestone:	[object Object]

geoffkizer commented 3 years ago

Another option would be to simply always do the connect async and do sync-over-async for the sync path.

We are presumably already blocking the calling thread in other cases here, e.g. when the connection limit is hit.

stephentoub commented 3 years ago

We could. But there is no connection limit by default. And on Linux, that would put the sockets into a state that makes all sync operations more expensive for the lifetime of the connection.

geoffkizer commented 3 years ago

And on Linux, that would put the sockets into a state that makes all sync operations more expensive for the lifetime of the connection.

Have we actually measured this?

Operations that can complete immediately will still complete immediately. The handling is basically the same here regardless of path -- there may be a little bit more overhead in going through the SocketAsyncContext path, but it's probably dwarfed by kernel call overhead.

Operations that can't complete immediately will still block. The blocking happens differently, and is certainly more involved in the SocketAsyncContext path -- but once you block, it's not clear to me that any of that matters all that much.

geoffkizer commented 3 years ago

I also wonder what our plan here is for HTTP2, HTTP3, etc where there is no good way to avoid sync-over-async.

stephentoub commented 3 years ago

Have we actually measured this?

Yes.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Running;
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    private NetworkStream _client, _server;

    [Params(false, true)]
    public bool EmulatedSync { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
        listener.Listen();

        client.Connect(listener.LocalEndPoint);
        Socket server = listener.Accept();

        _client = new NetworkStream(client, ownsSocket: true);
        _server = new NetworkStream(server, ownsSocket: true);

        if (EmulatedSync)
        {
            var t1 = _client.ReadAsync(new byte[1]).AsTask();
            var t2 = _server.ReadAsync(new byte[1]).AsTask();
            _client.WriteAsync(new byte[1]).AsTask().Wait();
            _server.WriteAsync(new byte[1]).AsTask().Wait();
            t1.Wait();
            t2.Wait();
        }

        Task.Run(() =>
        {
            while (true)
            {
                _client.ReadByte();
                _client.WriteByte(42);
            }
        });
    }

    [Benchmark]
    public void ReadWrite()
    {
        for (int i = 0; i < 1000; i++)
        {
            _server.WriteByte(42);
            _server.ReadByte();
        }
    }
}

Method	EmulatedSync	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ReadWrite	False	57.17 ms	1.029 ms	0.963 ms	-	-	-	-
ReadWrite	True	105.53 ms	2.063 ms	1.930 ms	-	-	-	368792 B

geoffkizer commented 3 years ago

Nevermind, I see now.

geoffkizer commented 3 years ago

That said, I'm still skeptical.

Your data shows that the existing sync-over-async logic in SocketAsyncContext etc is more expensive than a direct sync call. But I don't think we have focused on the perf of this path in any meaningful way. If we cared about it, I expect we could improve this in non-trivial ways.

stephentoub commented 3 years ago

But I don't think we have focused on the perf of this path in any meaningful way. If we cared about it, I expect we could improve this in non-trivial ways.

If we can without negatively impacting true sync or true async, we should.

This is still a red herring, though. They main issue is sync-over-async on a path we otherwise wouldn't have it. If you're doing HTTP/1.1, and you're not explicitly setting MaxConnectionsPerServer, this should be able to be sync all the way.

geoffkizer commented 3 years ago

If you're doing HTTP/1.1, and you're not explicitly setting MaxConnectionsPerServer, this should be able to be sync all the way.

Unless/until we implement https://github.com/dotnet/runtime/issues/44818.

stephentoub commented 3 years ago

Unless/until we implement #44818.

Even if we implement that, it's very dialable: there's no reason sync requests would need to participate, and could continue waiting for any connection they initiate, which is also the easiest thing to do since they'd be synchronously opening.

stephentoub commented 3 years ago

For now, I suggest we simply change the code to allow making sync requests even if a connect callback is provided. Someone who knows all requests on their HttpClient will be sync can create a sync callback, and if the callback otherwise didn't complete synchronously, well, we block. But at least we've made it possible. The next step is then exposing this one additional Boolean property on an advanced type, should we desire.

geoffkizer commented 3 years ago

For now, I suggest we simply change the code to allow making sync requests even if a connect callback is provided. Someone who knows all requests on their HttpClient will be sync can create a sync callback, and if the callback otherwise didn't complete synchronously, well, we block.

Seems reasonable to me.

scalablecory commented 3 years ago

@stephentoub do we have a customer who is doing both sync and async with one handler?

I would generally expect the user to just do the right thing in the callback based on their usage patterns here, and never need to check this API.

(Unless they are doing both sync and async)

stephentoub commented 3 years ago

do we have a customer who is doing both sync and async with one handler?

Ourselves, ideally: we can use this to address HttpWebRequest.ReadWriteTimeout, with multiple HWRs sharing a single handier instance. The workaround now is hacky and more expensive (and requires all consumers to have direct knowledge of the callback), putting a known property into all requests' options bags for the callback to inspect.

In all other cases where custom logic can be plugged in (e.g. handler.Send/SendAsync), the provider of the custom implementation is told whether it's sync or async. We dropped the ball on this one (the two callbacks), and we can fix it with a single bool property on each context type.

karelz commented 3 years ago

Triage: We should discuss it with @stephentoub in the room.

karelz commented 3 years ago

Triage: The only/biggest value is to make HttpWebRequest.ReadWriteTimeout work properly - that is tracked in #43520.

dotnet / runtime