Kestrel: Ability to capture HTTP2 errors

cyberfirst-developer commented 9 months ago

Is there an existing issue for this?

[X] I have searched the existing issues

Is your feature request related to a problem? Please describe the problem.

I am using Kestrel with YARP in place of nginx, and I implemented a basic ban with ipset for bad requests. At one point I found many records like this in the logs:

crit: Microsoft.AspNetCore.Server.Kestrel.Http2[60]
      Connection id "0HN0HCNL3FVF3" exceeded the output operations maximum queue size.
      Microsoft.AspNetCore.Connections.ConnectionAbortedException: HTTP/2 connection exceeded the output operations maximum queue size.

CPU load was 100% at that time, and the proxy was nearly unavailable.

I checked the sources and can't find a way to capture such errors (most of them are written to the log and passed as an argument to FrameWriter, but not exposed anywhere). I want to capture such errors and ban addresses that generate too many of them.

Describe the solution you'd like

One option I can think of is to add and expose an IConnectionAbortReasonFeature on the connection, which could be examined in connection builder middleware (after the call to next in the chain, of course), with some kind of enum, as YARP does with its ForwarderError enum.

Additional context

Currently I have found a possible way to capture the exception: wrap the ConnectionContext in my own one and override the Abort method (all other properties and methods just proxy to the old context). But I can't see this as a good solution.

amcasey commented 9 months ago

As you probably already know (but, for the sake of others reading this later), that message indicates that a client is behaving as if it's performing a Rapid Reset attack. One of the reasons we decided to close the connection but not flag the particular client was because the immediate client could be a proxy representing many different users. That reasoning may not apply to your particular server, but it's worth considering.

Another thing to note is that the limit is configurable. If you find that the default value is not providing adequate protection (i.e. your CPU is hitting 100% and your proxy is nearly unavailable), you can try reducing Microsoft.AspNetCore.Server.Kestrel.Http2.MaxConnectionFlowControlQueueSize. Of course, this will only affect individual connections - if you receive many of these apparent attacks simultaneously, the limit won't help.
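The comment above names the setting but not how to apply it. The following is a minimal sketch, assuming the value is read from AppContext data once, early in Kestrel's startup; the helper class, its method name, and the sanity check are hypothetical and only illustrate where such a call could go.

```csharp
using System;

// Hypothetical helper; only the setting name comes from the thread above.
public static class KestrelHttp2Limits
{
    public const string QueueSizeSetting =
        "Microsoft.AspNetCore.Server.Kestrel.Http2.MaxConnectionFlowControlQueueSize";

    // Assumption: Kestrel reads this value from AppContext data, so it must be
    // set before the web host is built and any connection is accepted.
    public static void SetMaxFlowControlQueueSize(int size)
    {
        // Sanity check added by this sketch, not something the thread specifies.
        if (size <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(size));
        }

        // On .NET Core and later this stores the value in the AppContext data store.
        AppDomain.CurrentDomain.SetData(QueueSizeSetting, size);
    }
}
```

Calling KestrelHttp2Limits.SetMaxFlowControlQueueSize(...) as the first statement of the program is one way to try it; whether a given value gives adequate protection would have to be measured under load.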

amcasey commented 9 months ago

When you see that log message, the corresponding connection will see a TCP RST frame sent from Kestrel. Is that a useful signal when determining whether to ban addresses?

cyberfirst-developer commented 9 months ago

> When you see that log message, the corresponding connection will see a TCP RST frame sent from Kestrel. Is that a useful signal when determining whether to ban addresses?

The error happens while the connection is still considered alive and we have responses to write to it. I have never seen the kernel send such an RST packet; it just closes the socket, AFAIK.

> One of the reasons we decided to close the connection but not flag the particular client was because the immediate client could be a proxy representing many different users. That reasoning may not apply to your particular server, but it's worth considering.

Correct, and Kestrel actually can't do that itself (there could only be some interface to implement for marking clients, or really just their source IP, since the connection is already broken). I don't plan to mark a client for a single error; this error can happen even when there is no attack, but we usually saw it once per hour at most. I plan to count such errors and, if the count passes some threshold, limit that particular source for some time. I know that with NAT this can be many clients, but in most cases such IPs were from data centers, not from ISPs. And the safety of the server comes first: better to lose several clients than to be unresponsive altogether.
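The counting scheme described above could be sketched as a per-source sliding window. This is only an illustration: the class name, the threshold, and the window length are hypothetical, and the actual ban (for example via ipset) is left to the caller.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper: counts errors per source IP inside a sliding time window.
public sealed class AbortRateTracker
{
    private readonly int _threshold;
    private readonly TimeSpan _window;
    private readonly ConcurrentDictionary<IPAddress, Queue<DateTimeOffset>> _hits = new();

    public AbortRateTracker(int threshold, TimeSpan window)
    {
        _threshold = threshold;
        _window = window;
    }

    // Records one error and returns true when the source has reached the
    // threshold within the window, i.e. when the caller should limit it.
    public bool RecordError(IPAddress source, DateTimeOffset now)
    {
        var queue = _hits.GetOrAdd(source, _ => new Queue<DateTimeOffset>());
        lock (queue)
        {
            // Drop entries that have fallen out of the window.
            while (queue.Count > 0 && now - queue.Peek() > _window)
            {
                queue.Dequeue();
            }

            queue.Enqueue(now);
            return queue.Count >= _threshold;
        }
    }
}
```

With a threshold of 3 and a one-minute window, the third error from the same address within a minute returns true, while a single error per hour never does. The sketch never evicts idle addresses, so a long-running proxy would also need periodic cleanup.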

So what I want is an easier way to access such things. I have found that with context replacement in ListenOptions.Use I can capture the error, but I have not tested it yet.
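A minimal, untested sketch of the context replacement mentioned above might look like the following. The wrapper class and its callback are hypothetical names, and it assumes Kestrel calls Abort on the context instance handed to the next delegate, which the thread itself has not confirmed.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO.Pipelines;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Connections;
using Microsoft.AspNetCore.Http.Features;

// Hypothetical wrapper: proxies everything to the inner context and reports aborts.
public sealed class AbortObservingConnectionContext : ConnectionContext
{
    private readonly ConnectionContext _inner;
    private readonly Action<EndPoint?, ConnectionAbortedException> _onAbort;

    public AbortObservingConnectionContext(
        ConnectionContext inner,
        Action<EndPoint?, ConnectionAbortedException> onAbort)
    {
        _inner = inner;
        _onAbort = onAbort;
    }

    public override string ConnectionId { get => _inner.ConnectionId; set => _inner.ConnectionId = value; }
    public override IFeatureCollection Features => _inner.Features;
    public override IDictionary<object, object?> Items { get => _inner.Items; set => _inner.Items = value; }
    public override IDuplexPipe Transport { get => _inner.Transport; set => _inner.Transport = value; }
    public override CancellationToken ConnectionClosed { get => _inner.ConnectionClosed; set => _inner.ConnectionClosed = value; }
    public override EndPoint? LocalEndPoint { get => _inner.LocalEndPoint; set => _inner.LocalEndPoint = value; }
    public override EndPoint? RemoteEndPoint { get => _inner.RemoteEndPoint; set => _inner.RemoteEndPoint = value; }

    // The interesting part: see the abort reason together with the remote endpoint.
    public override void Abort(ConnectionAbortedException abortReason)
    {
        _onAbort(_inner.RemoteEndPoint, abortReason);
        _inner.Abort(abortReason);
    }

    public override void Abort() => _inner.Abort();
    public override ValueTask DisposeAsync() => _inner.DisposeAsync();
}

// Possible wiring inside a Kestrel Listen callback (untested):
//   listenOptions.Use(next => context =>
//       next(new AbortObservingConnectionContext(context, (remote, reason) => { /* count it */ })));
```

The callback receives the remote endpoint, which is what the logger-based approach lacks, but it fires for every abort reason, so the handler would still need to filter on the exception.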

amcasey commented 9 months ago

What about using something like a logger provider?

using System;
using Microsoft.Extensions.Logging;

// Logger provider that hands the same logger to every category,
// so the handler below sees Kestrel's HTTP/2 log events.
internal sealed class RapidResetLoggerProvider : ILoggerProvider
{
    public static readonly ILoggerProvider Instance = new RapidResetLoggerProvider();
    private RapidResetLoggerProvider() { }
    public ILogger CreateLogger(string _categoryName) => RapidResetLogger.Instance;
    public void Dispose() { }

    private class RapidResetLogger : ILogger
    {
        public static readonly ILogger Instance = new RapidResetLogger();
        private RapidResetLogger() { }
        public IDisposable BeginScope<TState>(TState _state) => DummyDisposable.Instance;
        public bool IsEnabled(LogLevel logLevel) => logLevel >= LogLevel.Debug;

        public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception exception, Func<TState, Exception, string> formatter)
        {
            // Match on the event name rather than on the formatted message text.
            switch (eventId.Name)
            {
                case "Http2FlowControlQueueOperationsExceeded":
                    // Handler goes here
                    break;
            }
        }

        // No-op scope object; this logger does not track scopes.
        private sealed class DummyDisposable : IDisposable
        {
            public static readonly IDisposable Instance = new DummyDisposable();
            private DummyDisposable() { }
            public void Dispose() { }
        }
    }
}

cyberfirst-developer commented 9 months ago

> What about using something like a logger provider?

I would prefer context replacement, since the logger doesn't have access to the context itself, so there I can't access the source IP, for example.
