davidni opened this issue 3 years ago (status: Open)
Tagging subscribers to this area: @dotnet/ncl. See info in area-owners.md if you want to be subscribed.
Author: | davidni
---|---
Assignees: | -
Labels: | `area-System.Net.Http`, `untriaged`
Milestone: | -
Can you try with a recent .NET 6 build?
There are a bunch of situations that can cause us to try to read the content stream twice. If the content doesn't support this, it can throw and we should fail the request. There's not really anything else we can do at that point.
More info:
Things that can cause the content stream to be read twice include stuff like redirects and authentication. We will handle specific status codes like 302 or 401 and issue another request and try to send the content again. Expect: 100-continue can be used to try to avoid having to read the content twice in cases like this, because hopefully we will get the 302 or 401 before proceeding to send content; but obviously this only will work if the expect timeout doesn't expire.
Are you using redirects or authentication here?
We do have logic to retry requests on connection shutdown -- that's what HttpConnectionPool.SendWithRetryAsync is about. But this shouldn't cause the content stream to be read twice. Once we start sending any content, the request is no longer retryable.
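As an aside (not from the thread), here is a minimal hedged sketch of what "supporting a second read" means in practice: buffered content can be serialized again if the handler resends the request, which is exactly what a streaming proxy content cannot do. The URL and payload below are placeholders.

```csharp
using System.IO;
using System.Net.Http;

// Minimal sketch, not the code under discussion: buffering the content up front
// (LoadIntoBufferAsync) makes it safe for the handler to serialize it again after a
// redirect, auth challenge, or retry. A streaming proxy cannot buffer like this,
// which is why a second read fails there.
var request = new HttpRequestMessage(HttpMethod.Post, "https://example.com/upload")
{
    Content = new StreamContent(new MemoryStream(new byte[] { 1, 2, 3 }))
};
await request.Content.LoadIntoBufferAsync(); // drains the source stream once; later sends reuse the buffer
```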
Thanks Geoff.
1: Unfortunately I do not have a minimal repro of this at the moment, and I am unable to verify on .NET 6 (we will not transition our production workloads before GA).
2: There are no redirects or authentication at play here. Connection shutdown is likely the trigger -- during the incident where we had this happen, the destination service was overwhelmed and couldn't handle the load. This meant many of our requests were hitting the 1-second timeout for Expect: 100-continue, and in at least a few cases this did cause the content stream to be read twice. We have unambiguous evidence of this based on a few occurrences of the error I mentioned in the issue above ("Stream was already consumed"). We have indirect evidence of this based on Kestrel logs with the following stack trace, indicating two concurrent calls to read the request content (look for StreamCopyHttpContent.SerializeToStreamAsync below):
System.NotSupportedException: The ReadAsync method cannot be called when another read operation is pending.
at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
at System.IO.Pipelines.StreamPipeReader.ReadAsync(CancellationToken cancellationToken)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1ContentLengthMessageBody.ReadAsyncInternal(CancellationToken cancellationToken)
at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken)
at IslandGateway.Core.Service.Proxy.StreamCopier.CopyAsync(Stream source, Stream destination, CancellationToken cancellation)
at IslandGateway.Core.Service.Proxy.StreamCopyHttpContent.SerializeToStreamAsync(Stream stream, TransportContext context)
at System.Net.Http.HttpContent.CopyToAsyncCore(ValueTask copyTask)
at System.Net.Http.Http2Connection.Http2Stream.SendRequestBodyAsync(CancellationToken cancellationToken)
at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at IslandGateway.Core.Service.Proxy.HttpProxy.<>c__DisplayClass13_0.<<ExecuteRequestAsync>b__0>d.MoveNext()
Ahhhh, this is HTTP2. I had assumed HTTP/1.1.
Yes, with HTTP2, there are additional scenarios. When we receive a GOAWAY (server-initiated connection shutdown), the server tells us which streams will be processed and which won't; the ones that haven't are retryable, even if we already started sending the content. So in this case you will definitely see us try to read the content twice.
That said, this is expected behavior. If you can't support reading the content again, then you should throw and the request should be failed.
@Tratcher for your take re. YARP. If it's confirmed this is acceptable, we would at least want to make sure we are handling it correctly and emitting the right data to telemetry, e.g. perhaps this should count as a Transport error(?)
You can also consider increasing the Expect100ContinueTimeout so that you'll hit this less frequently.
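For reference, a hedged sketch of that mitigation (the endpoint and timeout value are illustrative; the default Expect100ContinueTimeout is 1 second):

```csharp
using System;
using System.Net.Http;

// Sketch of the suggested mitigation: give the upstream more time to answer the
// Expect: 100-continue handshake before the handler starts sending the body.
var handler = new SocketsHttpHandler
{
    Expect100ContinueTimeout = TimeSpan.FromSeconds(5) // default is 1 second
};
var client = new HttpClient(handler);

var request = new HttpRequestMessage(HttpMethod.Post, "https://example.com/upload");
request.Headers.ExpectContinue = true; // ask for a 100 (or an early 4xx/5xx) before the body is sent
```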
We have indirect evidence of this based on Kestrel logs with the following stack trace, indicating two concurrent calls to read the request content
This part's concerning: it means SerializeToStreamAsync is called concurrently on two threads and the Started check in YARP isn't thread safe. The two concurrent threads make sense if the retry is happening at the same time as the 100-continue expires. @geoffkizer that sounds like a potential race in the client logic. If the 100-continue expired and is sending content, then the retry should wait for that to abort.
This could be mitigated by making the Started check in YARP thread safe. I'll file an issue for that (https://github.com/microsoft/reverse-proxy/issues/1068).
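For illustration, a hedged sketch of such a thread-safe guard (this is not the actual YARP change; the class and field names are made up):

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hedged sketch of the kind of thread-safe "already started" check discussed above.
// Not the YARP code; ConsumeOnceContent and _started are illustrative names.
internal sealed class ConsumeOnceContent : HttpContent
{
    private readonly Stream _source;
    private int _started; // 0 = not started, 1 = started

    public ConsumeOnceContent(Stream source) => _source = source;

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        // Interlocked.Exchange makes the check atomic, so a second (possibly concurrent)
        // call fails deterministically instead of racing the first copy.
        if (Interlocked.Exchange(ref _started, 1) == 1)
            throw new InvalidOperationException("Stream was already consumed.");

        await _source.CopyToAsync(stream).ConfigureAwait(false);
    }

    protected override bool TryComputeLength(out long length)
    {
        length = 0;
        return false; // length unknown for a streaming body
    }
}
```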
As for the retry scenario in general, yes that's expected to fail in YARP and it should be reported accordingly. What type of failure is it currently reported as? It might depend on if it sent the whole request body before retrying, or if it was aborted part way through.
Raising the 100-continue timeout is a valid mitigation and shouldn't affect responsive servers.
@geoffkizer that sounds like a potential race in the client logic. If the 100-continue expired and is sending content then the retry should wait for that to abort.
Looking at the code, we should be waiting for SerializeToStreamAsync to complete before we actually fail the request. So I don't see how we could end up with concurrent SerializeToStreamAsync calls here. (I'm looking at 6.0 code, but I don't think this has changed much since 3.1.)
Note that in cases like this, where we get a GOAWAY or similar and need to cancel sending the request body, we will do two things:
(1) Cancel the cancellation token passed to SerializeToStreamAsync
(2) Throw on any attempt to use the request body stream passed to SerializeToStreamAsync
(1) Cancel the cancellation token passed to SerializeToStreamAsync
@geoffkizer there's no cancellation parameter here; can you clarify what you're referring to? This may be the issue at hand. We really are making no effort to stop the first SerializeToStreamAsync, because there doesn't seem to be any signal we should be reacting to.
```csharp
public abstract class HttpContent : IDisposable
{
    // ...
    // Note: this .NET Core 3.1 signature has no CancellationToken parameter.
    protected abstract Task SerializeToStreamAsync(Stream stream, TransportContext context);
}
```
(2) Throw on any attempt to use the request body stream passed to SerializeToStreamAsync
The failure we are experiencing happens before we attempt to write to the request body stream. We were still waiting to read bytes from the other side before proxying to the destination.
SerializeToStreamAsync added a new cancellable overload in 5.0. YARP hasn't implemented it yet since we're still supporting 3.1. https://github.com/dotnet/runtime/blob/c14ac488cc5f7d2ad71e7f81d6a05a1da8c76237/src/libraries/System.Net.Http/src/System/Net/Http/HttpContent.cs#L354
Wouldn't we throw it away in YARP anyway? Right now we have a comment in the code about it:
Note we do not need to implement the SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken token) overload because the token it provides is the short request token, not the long body token passed from the constructor. Using the short token would break bidirectional streaming scenarios.
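For completeness, a hedged sketch (assuming a .NET 5+ target; the class and field names are illustrative, this is not YARP's content class) of what implementing the cancellable overload looks like, with the 3.1-era overload forwarding to it:

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hedged sketch of the .NET 5+ cancellable overload mentioned above; a single code
// path handles both overloads by forwarding with CancellationToken.None.
internal sealed class StreamingContent : HttpContent
{
    private readonly Stream _source;

    public StreamingContent(Stream source) => _source = source;

    // .NET Core 3.1-era required override.
    protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context)
        => SerializeToStreamAsync(stream, context, CancellationToken.None);

    // Added in .NET 5: the handler can signal cancellation (e.g. when it abandons the body).
    protected override Task SerializeToStreamAsync(
        Stream stream, TransportContext? context, CancellationToken cancellationToken)
        => _source.CopyToAsync(stream, cancellationToken);

    protected override bool TryComputeLength(out long length)
    {
        length = 0;
        return false;
    }
}
```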
Note that in cases like this, where we get a GOAWAY or similar and need to cancel sending the request body, we will do two things:
(1) Cancel the cancellation token passed to SerializeToStreamAsync
@geoffkizer @MihaZupan is this a linked token? What are the possible triggers for it?
@MihaZupan did you get a chance to look at it? It is not clear why the untriaged label was removed.
From reading the code, it looks like it is possible that the request content will be read multiple times concurrently on HTTP 1:
1. The request is sent with Expect: 100-continue and the expect100Timer is started.
2. Thread A reads an EOF from the server, sees _startedSendingRequestBody == false, sets _canRetry = true and throws an IO exception.
3. Timer wakes up thread B, signals allowExpect100ToContinue with true and starts reading the request content.
4. Thread A hits the catch, sees sendRequestContentTask hasn't completed, but doesn't await it.

This could cause the exception seen in https://github.com/dotnet/aspnetcore/issues/33409#issuecomment-859829081.
Judging from the stack trace seen by David, I suspect a similar race exists for HTTP/2.
@geoffkizer If we're saying that the content may be read multiple times, are we also saying that those reads may happen concurrently?
If so, we should fix cases like StreamContent that aren't using a thread-safe check.
If not, we should fix races in Http*Connection.
@geoffkizer If we're saying that the content may be read multiple times, are we also saying that those reads may happen concurrently?
No, the read should never happen concurrently.
Your analysis seems correct to me.
Seems like we need to possibly do a couple things here.
The primary one is, we should wait for the sendRequestContentTask to complete before we propagate the exception upward. That should ensure we don't read concurrently from the request content.
However, doing that may introduce some new issues. In order to ensure the sendRequestContentTask completes promptly (or indeed at all), we need to cancel it. Unlike HTTP2, we don't seem to have logic to do this for HTTP1.1; I think we should.
Judging from the stack trace seen by David, I suspect a similar race exists for HTTP/2.
It's certainly possible, but the HTTP2 code is pretty different, so we should investigate and confirm.
One thing I noticed from the stack trace you linked is that the derived HttpContent class doesn't implement the SerializeToStreamAsync overload with a cancellation token. That means that even if we try to cancel sending the request body, it won't necessarily be cancelled promptly. I'm not sure whether this matters in terms of reproducing the issue -- I can't see how it would matter, but maybe I'm missing something.
As an aside, this all makes me wish we had common Expect: 100-Continue logic across different HTTP versions. We now have this implemented in three different places, unfortunately.
Triage: There is a workaround (a lock in the custom HttpContent), so this is not critical for 6.0. But we should fix it -- i.e. make sure we don't start the 2nd read before fully cancelling the 1st one.
Actually I think there's an even more basic issue here:
Thread A reads an EOF from the server, sees _startedSendingRequestBody == false, sets _canRetry = true and throws an IO exception
Timer wakes up thread B, signals allowExpect100ToContinue with true and starts reading the request content
We shouldn't be both setting _canRetry = true and actually starting to send the body. We either need to prevent the body from sending when _canRetry is set to true, or if we can't (i.e. we lose the race), then don't set _canRetry to true.
Edit: And if we did this, we would prevent the concurrent read because we'd never retry if we started sending the content.
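A hedged sketch of the mutual exclusion described above (not the runtime's actual code; the type and field names are invented): "mark retryable" and "start sending the body" compete for the same state slot, so whichever transition wins atomically blocks the other.

```csharp
using System.Threading;

// Illustrative only: both paths race for _state, and exactly one can win.
internal sealed class RequestBodyState
{
    private const int Idle = 0, SendingBody = 1, MarkedRetryable = 2;
    private int _state;

    // The body-send path: only proceeds if no one has marked the request retryable.
    public bool TryStartSendingBody()
        => Interlocked.CompareExchange(ref _state, SendingBody, Idle) == Idle;

    // The failure path: only marks the request retryable if the body never started sending.
    public bool TryMarkRetryable()
        => Interlocked.CompareExchange(ref _state, MarkedRetryable, Idle) == Idle;
}
```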
We recently started seeing this issue on HTTP/2 even without Expect: 100-continue. Interestingly, it happened for both GETs and POSTs. It seems likely that this could be related to timing issues between the HttpClient stack and the HTTP/2 server that it is talking to. See e.g. https://github.com/microsoft/reverse-proxy/issues/711#issuecomment-801103475
An example call stack from the recent occurrences is below (we had >400 hits of this in a few hours):
System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.InvalidOperationException: Stream was already consumed.
at IslandGateway.Core.Service.Proxy.StreamCopyHttpContent.SerializeToStreamAsync(Stream stream, TransportContext context)
at System.Net.Http.HttpContent.CopyToAsyncCore(ValueTask copyTask)
at System.Net.Http.Http2Connection.Http2Stream.SendRequestBodyAsync(CancellationToken cancellationToken)
at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at Microsoft.PowerApps.CoreFramework.Communication.Http.HttpClientCorrelationHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at IslandGateway.Core.Service.Proxy.HttpProxy.<>c__DisplayClass13_0.<<ExecuteRequestAsync>b__0>d.MoveNext()
As I mentioned above -- there are various scenarios where we will try to send the content twice, and if the content doesn't support this, it will fail.
It should never try to send the content concurrently, but as long as it's not, then trying to send the content twice is expected in the scenarios discussed above.
BTW I assume this is still .NET Core 3.1? I.e. no code has changed on the client here?
Correct, still .NET Core 3.1. Likely the only thing that changed was request timing patterns as well as the destination services we are routing to.
Hi @geoffkizer, @Tratcher, we are hitting the exact same issue. We are using .NET Core 3.1 and HTTP/2. Is there any recommended action to mitigate this issue? And what kind of error is it recommended to return to the user in this case?
Triage: This problem exists already in 6.0, so it is not a regression. Only 1 customer report, which has a workaround in YARP. We should look at it from a reliability perspective, but given the low impact on customers, moving to 8.0.
Triage: still limited customer impact, moving to future.
I think I figured out how concurrent calls to SerializeToStreamAsync can occur with HTTP/2 as well:
1. Http2Connection.SendAsync calls Http2Stream.SendRequestBodyAsync, storing the requestBodyTask.
2. SendRequestBodyAsync starts waiting on WaitFor100ContinueAsync and yields.
3. The Expect100ContinueTimeout timer in WaitFor100ContinueAsync fires. SendRequestBodyAsync sees that the request cancellation hasn't been requested yet and is about to call HttpContent.SerializeToStreamAsync in the background.
4. The server resets the stream with a REFUSED_STREAM RST. Http2Connection decides that the request is retryable and calls Http2Stream.OnReset with canRetry: true.
5. OnReset() sets Http2Stream._canRetry to true and calls Http2Stream.Cancel(), which signals the waiter.
6. This wakes the ReadResponseHeadersAsync task, which sees that the stream was reset and throws the retryable exception.
7. Http2Connection.SendAsync, which was waiting on the request body or response headers to complete, now wakes up and sees that the response headers task completed first. It doesn't wait for requestBodyTask to complete, but sets up a continuation that will observe any exceptions.
8. The retryable exception from responseHeadersTask is propagated out of SendAsync and into HttpConnectionPool.SendWithVersionDetectionAndRetryAsync, which will now attempt to send the request again.
9. On the retry, HttpContent.SerializeToStreamAsync is called. The SendRequestBodyAsync task (nr. 3) may just now be calling into SerializeToStreamAsync at the same time.

The gist is that we aren't waiting for requestBodyTask to complete before returning a retryable exception from Http2Connection.SendAsync, allowing SerializeToStreamAsync to be called multiple times concurrently.
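A hedged sketch of that gist (not the actual Http2Connection code; the task parameters and isRetryable delegate are stand-ins): the retryable failure must not reach the connection pool while the first body task may still be calling SerializeToStreamAsync.

```csharp
using System;
using System.Threading.Tasks;

// Illustrative only: before surfacing a retryable failure, await the in-flight body
// task so the content cannot still be mid-SerializeToStreamAsync when the retry starts.
static async Task SendCoreAsync(Task requestBodyTask, Task responseHeadersTask, Func<Exception, bool> isRetryable)
{
    try
    {
        await responseHeadersTask.ConfigureAwait(false);
    }
    catch (Exception ex) when (isRetryable(ex))
    {
        // Wait for the first SerializeToStreamAsync call to fully unwind before the
        // exception propagates to the connection pool and triggers a resend.
        try { await requestBodyTask.ConfigureAwait(false); } catch { /* first attempt is already failing */ }
        throw;
    }
}
```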
Triage: This problem exists already in 6.0, so it is not a regression. Only 1 customer report, which has a workaround in YARP. We should look at it from a reliability perspective, but given the low impact on customers, moving to 8.0.
@karelz we seem to observe the same behavior using YARP 2.2 and .NET 8. You were mentioning that 1 customer has a workaround; what is this workaround and how can we add it to our system?
Thanks!
@sberube I take it you're seeing the "Stream was already consumed." error message?
See https://github.com/microsoft/reverse-proxy/issues/1683
That may happen during normal proxy operation if HttpClient decides to retry a request, something that YARP doesn't support as we don't buffer the request body.
This issue is tracking a bug in SocketsHttpHandler where those reads may happen concurrently. The workaround mentioned is mitigating just that aspect, and it's being used by YARP internally.
Description
Under very specific conditions, SocketsHttpHandler.SendAsync when using HTTP/2 attempts to copy the HttpRequestMessage.Content twice.
In Island Gateway (AKA what eventually became YARP), we have code similar to the following, and the invariant behind this best-effort check is being violated:
https://github.com/microsoft/reverse-proxy/blob/961608d01adc9ab07d4cd3f5adf7d4d3dfff3f0f/src/ReverseProxy/Proxy/StreamCopyHttpContent.cs#L118-L125
The following criteria seem to be necessary to trigger this bug, though it may be broader than this:
- The request is sent with Expect: 100-continue
- The destination does not produce a 100 response within the first 1 second

It seems like the SocketsHttpHandler machinery may be deciding it is safe to retry because the destination service provably did not start processing the first attempt, but it makes the wrong assumption that it is okay to read the HttpRequestMessage.Content again on the next attempt. This is not a valid assumption, and it leads to issues in applications such as YARP where the Content is a wrapper around Kestrel's incoming request body stream, which cannot be read twice.
Configuration
Other information
Filing this as a result of an offline discussion with @halter73 and @davidfowl. Thanks folks for helping investigate.
@karelz FYI.