aspnet / KestrelHttpServer

[Archived] A cross platform web server for ASP.NET Core. Project moved to https://github.com/aspnet/AspNetCore
Apache License 2.0
2.63k stars 528 forks source link

Flaky Test: Test run aborted due to "Missing stream" in OnStreamCompleted #3079

Closed halter73 closed 5 years ago

halter73 commented 5 years ago
13:05:07   The active test run was aborted. Reason: Assertion Failed
13:05:07   Missing stream
13:05:07      at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.IHttp2StreamLifetimeHandler.OnStreamCompleted(Int32 streamId) in /_/src/Kestrel.Core/Internal/Http2/Http2Connection.cs:line 1067
13:05:07      at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Stream.OnRequestProcessingEnded() in /_/src/Kestrel.Core/Internal/Http2/Http2Stream.cs:line 79
13:05:07      at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
13:05:07      at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
13:05:07      at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
13:05:07      at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.<>c__74`1.<StartStream>b__74_0(Object state) in /_/src/Kestrel.Core/Internal/Http2/Http2Connection.cs:line 1002
13:05:07      at System.Threading.ThreadPoolWorkQueue.Dispatch()

https://ci3.dot.net/job/aspnet_KestrelHttpServer/job/release_2.2/job/linux-Configuration_Debug_prtest/382/consoleFull

halter73 commented 5 years ago

I noticed that we are only doing _activeStreamCount--; inside the if (_streams.TryGetValue(... block.

https://github.com/aspnet/KestrelHttpServer/blob/2b87e7be8525ed7ae0aa57cbd94c3d1b522edaa3/src/Kestrel.Core/Internal/Http2/Http2Connection.cs#L1049-L1051

Since, as we just learned, TryGetValue can fail if RemoveDrainingStream has already been called from the request loop, this can put us in an inconsistent state. This could cause ungraceful server shutdowns, prevent keep-alive timeouts, and break the handling of client GOAWAY frames.

To fix this, we could simply move _activeStreamCount--; out of the if block.

@Tratcher @muratg Is this something we should look at fixing for 2.2?

muratg commented 5 years ago

@halter73 bar is much higher on 2.2 now. How much would this impact a realistic customer scenario in your opinion? How often would this happen for example?

My current feeling is that this would be more a patch candidate, but cc'ing @Eilon in case he has thoughts on this as well.

Eilon commented 5 years ago

Does this affect any scenario aside from tests?

Tratcher commented 5 years ago

_activeStreamCount is used for per-connection graceful shutdown. If it's not accurate then shutdown may take longer than needed and only end when timed out.

Tratcher commented 5 years ago

Hmm, this could also constitute a leak that would prevent you from creating new streams. https://github.com/aspnet/KestrelHttpServer/blob/2b87e7be8525ed7ae0aa57cbd94c3d1b522edaa3/src/Kestrel.Core/Internal/Http2/Http2Connection.cs#L982

halter73 commented 5 years ago

There's no reason to assume this issue only affects tests. The race occurs when the client sends an END_STREAM HTTP/2 frame right as the app completes.

Eilon commented 5 years ago

How absolutely confident are we that the suggested fix is the right one?

halter73 commented 5 years ago

I'm 100% sure we should make the suggested fix. Let's see if @Tratcher backs me up.

Eilon commented 5 years ago

If you guys agree, then go ahead and get it in ASAP.

Tratcher commented 5 years ago

I agree, this is the only place _activeStreamCount is decremented and it shouldn't get skipped.

halter73 commented 5 years ago

The fix is now in.

@natemcmaster