Closed halter73 closed 6 years ago
I noticed that we are only doing _activeStreamCount--;
inside the if (_streams.TryGetValue(...
block.
Since, as we just learned, TryGetValue can fail if RemoveDrainingStream has already been called from the request loop, this can put us in an inconsistent state. This could cause ungraceful server shutdowns, prevent keep-alive timeouts, and break the handling of client GOAWAY frames.
To fix this, we could simply move _activeStreamCount--;
out of the if block.
@Tratcher @muratg Is this something we should look at fixing for 2.2?
@halter73 bar is much higher on 2.2 now. How much would this impact a realistic customer scenario in your opinion? How often would this happen for example?
My current feeling is that this would be more a patch candidate, but cc'ing @Eilon in case he has thoughts on this as well.
Does this affect any scenario aside from tests?
_activeStreamCount is used for per-connection graceful shutdown. If it's not accurate then shutdown may take longer than needed and only end when timed out.
Hmm, this could also constitute a leak that would prevent you from creating new streams. https://github.com/aspnet/KestrelHttpServer/blob/2b87e7be8525ed7ae0aa57cbd94c3d1b522edaa3/src/Kestrel.Core/Internal/Http2/Http2Connection.cs#L982
There's no reason to assume this issue only affects tests. The race occurs when the client sends an END_STREAM HTTP/2 frame right as the app completes.
How absolutely confident are we that the suggested fix is the right one?
I'm 100% sure we should make the suggested fix. Let's see if @Tratcher backs me up.
If you guys agree, then go ahead and get it in ASAP.
I agree, this is the only place _activeStreamCount is decremented and it shouldn't get skipped.
The fix is now in.
@natemcmaster
https://ci3.dot.net/job/aspnet_KestrelHttpServer/job/release_2.2/job/linux-Configuration_Debug_prtest/382/consoleFull