dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.58k stars 10.06k forks source link

Logs filled with critical exceptions from Kestrel/Quic #45105

Open mrgleba opened 2 years ago

mrgleba commented 2 years ago

Is there an existing issue for this?

Describe the bug

I run an app in a linux docker container (tag 7.0) with http/3 enabled. The libmsquic version in 2.1.4 (from ms repo). The server logs are being filled with the following message:

Microsoft.AspNetCore.Server.Kestrel Critical Unexpected exception in HttpConnection.ProcessRequestsAsync. 
An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER
  at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicStreamContext.Abort(ConnectionAbortedException abortReason)
  at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3FrameWriter.Abort(ConnectionAbortedException error)
  at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3ControlStream.Abort(ConnectionAbortedException abortReason, Http3ErrorCode errorCode)
  at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
  at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)

This doesn't influence the operations as far as we see. The server responds normally despite the messages.

Expected Behavior

No messages in logs

Steps To Reproduce

No response

Exceptions (if any)

No response

.NET Version

7.0 runtime (no sdk)

Anything else?

No response

adityamandaleeka commented 2 years ago

@mrgleba Do you see this on any app or is there anything interesting about your code?

adityamandaleeka commented 2 years ago

@karelz Is there any guidance on which libmsquic versions are compatible with .NET 7 on Linux?

We have a note in the docs for .NET 6 here: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel/http3?view=aspnetcore-7.0#linux

mrgleba commented 2 years ago

@adityamandaleeka Nothing I can point out. Just upgraded to 7.0 and enabled http/3.

adityamandaleeka commented 2 years ago

@mrgleba Just out of curiosity, can you try the 2.1.1 version of msquic and see if you have the same issue?

cc: @nibanks

mrgleba commented 2 years ago

I can confirm it's the same with 2.1.1

adityamandaleeka commented 2 years ago

Thanks @mrgleba

ezar commented 1 year ago

Same error using Windows 2022

ladeak commented 1 year ago

I can find the very same message on Windows. It appears sometimes after the client has completed the request and the client process has completed.

crit: Microsoft.AspNetCore.Server.Kestrel[0]
      Unexpected exception in HttpConnection.ProcessRequestsAsync.
      System.Net.Quic.QuicException: An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER
         at System.Net.Quic.ThrowHelper.ThrowIfMsQuicError(Int32 status, String message)
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicStreamContext.Abort(ConnectionAbortedException abortReason)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3FrameWriter.Abort(ConnectionAbortedException error)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3ControlStream.Abort(ConnectionAbortedException abortReason, Http3ErrorCode errorCode)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)
dotnet --info
.NET SDK:
 Version:   7.0.200-preview.22628.1
 Commit:    76bbaffad6

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.22621
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\7.0.200-preview.22628.1\

Host:
  Version:      7.0.0
  Architecture: x64
  Commit:       d099f075e4
amcasey commented 1 year ago

I might be missing something, but this looks very similar to https://github.com/dotnet/runtime/issues/73688.

amcasey commented 1 year ago

@rzikm @CarnaViire Am I mistaken in thinking this should have been fixed by https://github.com/dotnet/runtime/pull/75179?

Edit: I see that we have a lot of our own locking code in this area, so there may be an additional race.

amcasey commented 1 year ago

@mrgleba @ladeak One thing I notice looking through the frames in the call stack is that there's a whole bunch of logging around what was aborted and why. Are you seeing that in your logs? You might need LogLevel.Debug.

wfurt commented 1 year ago

yes, this looks similar to https://github.com/dotnet/runtime/issues/73688. We should probably collect System.Net.Quic logs and possibly rotaries from MsQuic as well https://github.com/microsoft/msquic/blob/main/docs/Diagnostics.md

Note that the fix was specifically around ODE. As far as I know MsQuic could return the QUIC_STATUS_INVALID_PARAMETER in other cases when it feels connection/stream is closed and done. The logs may yield some useful information.

Or somebody should come up with simple repro we can investigate.

ladeak commented 1 year ago

I have this test webapplication I employ in a repo: https://github.com/ladeak/Http3Repl/tree/master/tests/TestWebApplication

There is really nothing special about it with the 3 endpoints. Invoke any with H3 request, wait a minute or two, and the error will show. I typically encounter the issue simply just using the MapGet for "/".

@amcasey I will increase the log level to further investigate.

wfurt commented 1 year ago

How do you drive it? From HttpClient e.g C# & MsQuic? I'm wondering if the peer aborts the stream and then it is not open at the time Kestrel tries to do abort....

ladeak commented 1 year ago

Yes, I use HttpClient (which I understand uses MsQuic).

amcasey commented 1 year ago

I wrote a trivial HttpClient driver and have yet to see a repro (though it seems like a race, so that's hardly definitive).

Did you find anything interesting in the logs @ladeak?

ladeak commented 1 year ago

Nothing so far, but also something apparent on .net8 preview 2.

mrgleba commented 1 year ago

I've tried upgrading libmsquic to 2.2.1 but the problem persists

ManickaP commented 1 year ago

To investigate this, we need Private.InternalDiagnostics.System.Net.Quic internal logging as well as msquic logs: https://github.com/microsoft/msquic/blob/main/docs/Diagnostics.md. I tried to reproduce this in the past without any success, so getting the logs from you is probably the best bet now.

BTW, do you kill the client process immediately after sending the request? Do you dispose the client? If not, this could be related to https://github.com/dotnet/runtime/issues/71927 (TL;DR: MsQuic is a user space library so when you close the client app and the process disappears from the memory, anything that was buffered but not yet send by MsQuic gets lost. In the linked issues, it was the last ACK that got lost and server waiting on it timed out).

To confirm if this is it, you should dispose the HttpClient after the testing request and keep the process alive for a bit after that. If it doesn't reproduce then, it probably is https://github.com/dotnet/runtime/issues/71927.

mrgleba commented 1 year ago

Just to clarify. The problem occurs on a prodction server (the logs are from customers' connections). The messages are quite abundant (a few per minute).

I'll see if I can set up the logs, but it won't be easy in production :(

ManickaP commented 1 year ago

Private.InternalDiagnostics.System.Net.Quic logs are not suitable for production, they are very verbose and might negatively impact perf. I thought that you were able to reproduce it locally with the repro shared above. But that was probably just @ladeak, maybe they can collect and share the logs.

ladeak commented 1 year ago

This is my client console app (I am using .NET 8 Preview 4 at this time):

        using var client = new HttpClient();
        var request = new HttpRequestMessage(HttpMethod.Get, "https://localhost:5001/");
        request.Version = HttpVersion.Version30;
        request.VersionPolicy = HttpVersionPolicy.RequestVersionExact;
        var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        Console.WriteLine(await response.Content.ReadAsStringAsync());
        await Task.Delay(10000);

(Not clear if you prefer the timeout after - or before the dispose).

I see this in the server logs, I was able to reproduce, but the errors typically come ~1 min after the requests:

crit: Microsoft.AspNetCore.Server.Kestrel[0]
      Unexpected exception in HttpConnection.ProcessRequestsAsync.
      System.Net.Quic.QuicException: An internal error has occurred. StreamShutdown failed: QUIC_STATUS_INVALID_PARAMETER
         at System.Net.Quic.ThrowHelper.ThrowIfMsQuicError(Int32 status, String message)
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicStreamContext.Abort(ConnectionAbortedException abortReason)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3FrameWriter.Abort(ConnectionAbortedException error)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3ControlStream.Abort(ConnectionAbortedException abortReason, Http3ErrorCode errorCode)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)

The server:

var builder = WebApplication.CreateBuilder(args);
builder.WebHost.UseKestrel(kestrel =>
{
    kestrel.ListenAnyIP(5001, options =>
    {
        options.UseHttps();
        options.Protocols = HttpProtocols.Http1AndHttp2AndHttp3;
    });
});
var app = builder. Build();

app.MapGet("/", async context =>
{
    await context.Response.WriteAsync("Hello World " + context.Request.Protocol.ToString());
});

I have collected the logs as linked using the netsh command. The file is 6.5MB and seems binary, how can I share it @ManickaP ?

netsh.exe trace start overwrite=yes report=dis correlation=dis traceFile=quic.etl provider={ff15e657-4f26-570e-88ab-0796b258d11c} level=0x5 keywords=0xffffffff

ManickaP commented 1 year ago

I don't know the limits but you can attach it here, or somewhere else and share a link. Or you can send it to my email mapichov at microsoft dot com.

The etl file can be opened with PerfView: https://github.com/microsoft/perfview

ladeak commented 1 year ago

I have sent it in an email. If it does not work, I'll share it through OneDrive.

ManickaP commented 1 year ago

So I found 2 occurrences of [ api] Exit 2147942487 (e.g. INVALID_PARAMETER) in the logs. Both were for an incoming unidirectional stream (control | QPack) and both were due to the layer above (Kestrel) calling Abort on an already shutdown stream. What happens is that connection idles out and part of that process is also shutdown of all active streams (with the same cause "Connection Idle"). As control streams are not meant to be closed ever, they are alive at this point and they receive this event. What I suspect causes the error to happen here is that processing of this stream event that eventually removes the stream from _streams collection: https://github.com/dotnet/aspnetcore/blob/b441d4e816ef3bc43b72e91ba8793f252ada2c5a/src/Servers/Kestrel/Core/src/Internal/Http3/Http3ControlStream.cs#L244 races with connection closure and in some cases the connection is faster and calls abort: https://github.com/dotnet/aspnetcore/blob/b441d4e816ef3bc43b72e91ba8793f252ada2c5a/src/Servers/Kestrel/Core/src/Internal/Http3/Http3Connection.cs#L499-L506

@JamesNK Could you look into what I found out? If I'm right, this is not a problem of S.N.Quic.

Details from msquic logs:

Error case:
2023-05-20T06:41:35.813851,6155,5,Microsoft-Quic,[strm][0x17783af6e00] Indicating QUIC_STREAM_EVENT_SHUTDOWN_COMPLETE [ConnectionShutdown=1, ConnectionShutdownByApp=0, ConnectionClosedRemotely=0, Co
2023-05-20T06:41:35.815034,11,5,Microsoft-Quic,[ api] Enter STREAM_SHUTDOWN (0x17783af6e00).
2023-05-20T06:41:35.815036,13,5,Microsoft-Quic,[ api] Exit 2147942487

Correct case:
2023-05-20T06:41:49.119196,6155,5,Microsoft-Quic,[strm][0x17783b92870] Indicating QUIC_STREAM_EVENT_SHUTDOWN_COMPLETE [ConnectionShutdown=1, ConnectionShutdownByApp=0, ConnectionClosedRemotely=0, Co
amcasey commented 1 year ago

@ManickaP Thanks for that analysis! I can take next steps (which will probably be following up with @JamesNK 😆).

mrgleba commented 1 year ago

This still occurs with net8 & libmsquic.2.2.4 😢

Beblia commented 2 months ago

Hello everyone. Any updates on this issue? I have something similar as well. 100+ of these errors every single day. This happens on both .NET 8 and 9

2024-09-12 20:52:56.705 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync. System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode') Actual value was -1. at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode) at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication) 2024-09-12 20:52:59.058 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync. System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode') Actual value was -1. at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode) at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication) 2024-09-12 20:53:06.262 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync. System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode') Actual value was -1. at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode) at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication) 2024-09-12 20:53:07.864 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync. System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode') Actual value was -1. at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode) at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication) 2024-09-12 20:53:09.085 -07:00 [FTL] Unexpected exception in HttpConnection.ProcessRequestsAsync. System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode') Actual value was -1. at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode) at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication1 application) at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication1 httpApplication)

amcasey commented 2 months ago

Here's another place the stream could (race to?) be aborted https://github.com/dotnet/aspnetcore/blob/ffd5a38ecc618cac8c424a78ea8f452779120276/src/Servers/Kestrel/Core/src/Internal/Http3/Http3Connection.cs#L298 It looks like it might have the wrong error though, so probably a red herring.

amcasey commented 2 months ago

Toy repro: launch server, launch client, wait ~20 seconds

Server

using Microsoft.AspNetCore.Server.Kestrel.Core;

var builder = WebApplication.CreateBuilder(args);

builder.WebHost.ConfigureKestrel(options =>
{
    options.ListenLocalhost(5001, listenOptions =>
    {
        listenOptions.Protocols = HttpProtocols.Http3;
        listenOptions.UseHttps();
    });

    options.Limits.KeepAliveTimeout = TimeSpan.FromSeconds(5);
});

var app = builder.Build();

app.MapGet("/", () => "Hello, world!");

app.Run();

Client

using var client = new HttpClient()
{
    BaseAddress = new Uri("https://localhost:5001"),
    DefaultRequestVersion = new Version(3, 0),
    DefaultVersionPolicy = HttpVersionPolicy.RequestVersionExact,
};

try
{
    var response = await client.GetAsync("/");
    response.EnsureSuccessStatusCode();

    var data = await response.Content.ReadAsStringAsync();
    Console.WriteLine(data);
}
catch (HttpRequestException e)
{
    Console.WriteLine($"Request error: {e.Message}");
}
amcasey commented 2 months ago

Oops, my repro is for a slightly different (but probably related?) error:

crit: Microsoft.AspNetCore.Server.Kestrel[0]
      Unexpected exception in HttpConnection.ProcessRequestsAsync.
      System.ArgumentOutOfRangeException: A value between 0x0 and 0x3fffffffffffffff is required. (Parameter 'errorCode')
      Actual value was -1.
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.QuicTransportOptions.ValidateErrorCode(Int64 errorCode)
         at Microsoft.AspNetCore.Server.Kestrel.Transport.Quic.Internal.QuicConnectionContext.set_Error(Int64 value)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.Abort(ConnectionAbortedException ex, Http3ErrorCode errorCode, ConnectionEndReason reason)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)

This validation was added in https://github.com/dotnet/aspnetcore/pull/55282.

Edit: filed #57933 for the new issue.