## Issue
Establishing new client connections fails intermittently when the library connects to an AMQPS endpoint over IPv6 on .NET 6. The errors no longer happen if we switch the client code back to .NET Core 2.1 or connect to an IPv4 broker.
## Repro steps
Start the local test broker on an IPv6 amqps address, then run the following client code using .NET 6 (it connects to `amqps://[::1]:10196/`):
```c#
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Amqp;
using Microsoft.Azure.Amqp.Transport;

namespace MyApp
{
    internal class Program
    {
        // Unused by the repro itself; kept from the original test program.
        static HashSet<int> ints = new HashSet<int>();

        static void Main(string[] args)
        {
            Run().Wait();
        }

        static async Task Run()
        {
            // IPv6 loopback address of the local test broker.
            Uri uri = new Uri("amqps://[::1]:10196/");
            for (int i = 0; i < 100; ++i)
            {
                AmqpConnectionFactory factory = new AmqpConnectionFactory();
                factory.Settings.TransportProviders.Add(new TlsTransportProvider(new TlsTransportSettings()
                {
                    // Accept any certificate: the broker is a local test instance.
                    CertificateValidationCallback = (a, b, c, d) => true,
                    CheckCertificateRevocation = false,
                    Protocols = System.Security.Authentication.SslProtocols.Tls12
                }));
                await factory.OpenConnectionAsync(uri, TimeSpan.FromSeconds(30));
                Console.WriteLine("Success");
            }
        }
    }
}
```
This code simply opens a connection 100 times. On .NET Core 2.1 it works fine, but on .NET 6 it eventually fails after a few iterations with the following exception:
```
System.IO.IOException : Transport 'tls4' is valid for write operations.
---- System.InvalidOperationException : This operation is only allowed using a successfully authenticated context.
```
## Investigation
After a lengthy investigation, we traced the root cause to a race condition triggered by the following call at
[TcpTransportInitiator.cs:44](https://github.com/Azure/azure-amqp/blob/master/src/Transport/TcpTransportInitiator.cs#L44):
```c#
bool connectResult = Socket.ConnectAsync(SocketType.Stream, ProtocolType.Tcp, connectEventArgs);
```
When this call returns `true`, everything works, which seems to always be the case on .NET Core 2.1 or when connecting to an IPv4 broker. However, when it returns `false`, indicating that the connection completed synchronously, the library breaks. On .NET 6, the call seems to return `false` from time to time for IPv6 sockets.
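For context, the documented contract of `Socket.ConnectAsync` is that the `Completed` event is raised only when the call returns `true`; when it returns `false`, the caller has to process the result itself, exactly once. The following is a minimal sketch of that pattern, not the library's code: `ConnectOnce` and `Complete` are hypothetical names introduced for illustration.

```c#
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper: wraps the static Socket.ConnectAsync in a Task.
static class ConnectOnce
{
    public static Task<Socket> ConnectAsync(EndPoint endpoint)
    {
        var tcs = new TaskCompletionSource<Socket>(TaskCreationOptions.RunContinuationsAsynchronously);
        var args = new SocketAsyncEventArgs { RemoteEndPoint = endpoint };

        // Raised only if the operation is pending, i.e. ConnectAsync returned true.
        args.Completed += (sender, e) => Complete(e, tcs);

        bool pending = Socket.ConnectAsync(SocketType.Stream, ProtocolType.Tcp, args);
        if (!pending)
        {
            // Completed synchronously: the Completed event will not fire,
            // so the result is handled here, on the calling thread.
            Complete(args, tcs);
        }

        return tcs.Task;
    }

    // Runs once per connect attempt, from exactly one of the two paths above.
    static void Complete(SocketAsyncEventArgs e, TaskCompletionSource<Socket> tcs)
    {
        if (e.SocketError == SocketError.Success)
        {
            tcs.TrySetResult(e.ConnectSocket);
        }
        else
        {
            tcs.TrySetException(new SocketException((int)e.SocketError));
        }
        e.Dispose();
    }
}
```

A caller would use it as `using Socket s = await ConnectOnce.ConnectAsync(new IPEndPoint(IPAddress.IPv6Loopback, 10196));`. The point of the sketch is that the synchronous and asynchronous paths are mutually exclusive, so the completion logic must not be reachable from both for the same attempt.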
More specifically, when `Socket.ConnectAsync` returns `false`, the following path at AmqpTransportInitiator.cs:367 is executed twice:
```c#
if (!thisPtr.CompleteSelf(args.CompletedSynchronously, args.Exception))
{
    if (args.Transport != null)
    {
        // completed by timer
        args.Transport.Abort();
    }
}
```
The first pass completes the operation. On the second pass the operation is already completed, so `Transport.Abort()` is called, which disposes the connection and causes the failures shown above.
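The effect of running that path twice can be reproduced in isolation. The sketch below is a simplified model, not the library's implementation: `FakeTransport`, `FakeConnectOperation` and `DoubleCompletionDemo` are hypothetical stand-ins, and the complete-once check is assumed to behave like an atomic "first caller wins" flag.

```c#
using System;
using System.Threading;

// Hypothetical stand-in for the transport handed to the caller.
sealed class FakeTransport
{
    public bool Aborted { get; private set; }
    public void Abort() => Aborted = true;
}

// Hypothetical stand-in for the async result guarding the connect attempt.
sealed class FakeConnectOperation
{
    int completed; // 0 = pending, 1 = completed

    // Returns true only for the first caller (assumed complete-once semantics).
    bool CompleteSelf() => Interlocked.Exchange(ref completed, 1) == 0;

    // Same shape as the snippet above: losing the completion race is
    // interpreted as "the timer already completed us", so the transport is aborted.
    public void OnTransportReady(FakeTransport transport)
    {
        if (!CompleteSelf())
        {
            transport?.Abort();
        }
    }
}

static class DoubleCompletionDemo
{
    // Returns whether the transport ended up aborted after two completions.
    public static bool Run()
    {
        var transport = new FakeTransport();
        var operation = new FakeConnectOperation();

        operation.OnTransportReady(transport); // first pass: completes the operation
        operation.OnTransportReady(transport); // second pass: aborts the same, already delivered transport

        return transport.Aborted;
    }
}
```

In this model `DoubleCompletionDemo.Run()` reports the transport as aborted even though the first pass already handed it to the caller, which matches the symptom described above: a later TLS write hits a disposed connection.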