dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.44k stars 10.01k forks source link

ConnectionResetException flood #53591

Open verdie-g opened 9 months ago

verdie-g commented 9 months ago

Is there an existing issue for this?

Describe the bug

Our ASP.NET Core 6 services, run behind HAProxy, are flooded with ConnectionResetException.

In this board you can see a very high number of exceptions. image By logging the first chance exceptions it seems to come from

Microsoft.AspNetCore.Connections.ConnectionResetException: Connection reset by peer
---> System.Net.Sockets.SocketException (104): Connection reset by peer
at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive()
--- End of inner exception stack trace ---
at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
at System.IO.Pipelines.Pipe.GetReadAsyncResult()
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[TContext](IHttpApplication`1 application)

By collecting the exceptions from the event pipe we have similar results: image

We believe that issue appeared after this HAProxy change: https://github.com/haproxy/haproxy/commit/4d1ff11f05691aa6820a985c31e72811cf9ef95d which introduced a new way to kill idle connections by sending a RST instead of a FIN ACK.

This create two problems on our side:

  1. We can't monitor the rate of exceptions anymore
  2. These ConnectionResetException also seem to induce lock contention which impact the performance on of service image

Expected Behavior

Clients should not have a way to flood a server with exceptions.

Steps To Reproduce

using System.Net;
using System.Net.Sockets;
using System.Text;

var builder = WebApplication.CreateBuilder(args);

var app = builder.Build();

AppDomain.CurrentDomain.FirstChanceException += (sender, eventArgs) =>
{
    Console.WriteLine(eventArgs.Exception.ToString());
};

Task.Run(async () =>
{
    await Task.Delay(2000);
    Console.WriteLine("Opening");
    using Socket socket = new(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    await socket.ConnectAsync(new DnsEndPoint("localhost", 5233));
    await Task.Delay(2000);
    string reqLine = "POST / HTTP/1.1\r\nHost: example.org\r\nContent-Length: 10000000\r\n\r\n";
    socket.Send(Encoding.ASCII.GetBytes(reqLine));

    socket.Send(new byte[100_000]);
    socket.Send(new byte[100_000]);
    socket.Send(new byte[100_000]);

    await Task.Delay(2000);

    Console.WriteLine("Closing");
    socket.Close();
});

app.Run();

With this code I'm not getting the exact same exception but something similar. It is reproducible on both .NET 6 and 8.

Microsoft.AspNetCore.Connections.ConnectionResetException: An existing connection was forcibly closed by the remote host.
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

Exceptions (if any)

No response

.NET Version

6.0.25

Anything else?

No response

amcasey commented 9 months ago

@verdie-g I'd like to understand your request a little better. It sounds like your server is receiving a larger number of TCP RST frames and that this is expected. And your concern is that, when many connections are reset at the same time, the server sees many SocketExceptions (which it then wraps in ConnectionResetExceptions, for consistency), which might hurt performance (and, therefore, DoS resistance)?

Do you have a preference for what should happen differently? I'm not sure Socket has an alternative way to signal that a connection has been reset. A FIN can be represented by a zero-byte read (i.e. end of incoming data), but a reset can during any socket operation.

verdie-g commented 9 months ago

Ideally I would like that no exceptions to be thrown so that we can have alerting when the number of exceptions is high. That's what we do for our services that are not behind HAProxy and I think it's valid to consider that the service is in a bad state when its number of exceptions is high.

The contention is concerning too but I can't say for sure how much it impacts the service.

amcasey commented 9 months ago

Interesting, in that case, I imagine you must be filtering out things like OperationCanceledException and other first-chance exceptions that are widely used for flow control in .NET. If you already have filtering capability, is filtering out SocketException an option? If you're not interested in knowing about connection resets, I'm not sure when it would be useful to you.

Based on the stack in your initial post, the contention seems to be from PipeWriter (or possibly its subtype ConcurrentPipeWriter), which needs to ensure writes to the pipe don't get jumbled together. There may be room for refinement, but I doubt that that synchronization can be eliminated.

We'll think about whether there are improvements we can make in this area but, to be frank, I'm skeptical.

ghost commented 9 months ago

Thanks for contacting us.

We're moving this issue to the .NET 9 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

verdie-g commented 9 months ago

I imagine you must be filtering out things

We're using exception-count from the System.Runtime event source so there is no filtering. We have another metric that is based on data read from an event pipe that gives the name of the exception but it's kind of costly so I would prefer only relying on the counter. Anyway, here it can give you an idea of our amount of OperationCanceledException image If you ignore the awkward ArgumentException and IndexOutOfRangeException we only have ~5 OperationCanceledException per second.

We'll think about whether there are improvements we can make in this area but, to be frank, I'm skeptical.

Thanks.

verdie-g commented 8 months ago

I get the exact same problem using the Gatling load testing. image How much can I trust the results of a load test when the contention explodes :/