Closed: MihaZupan closed this issue 2 years ago.
Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.
Author: MihaZupan
Assignees: -
Labels: `bug`, `area-System.Net.Http`
Milestone: -
Presumably the fix here is to make sure we capture the User-Agent from the original request before we actually put the request in the queue (and make it available for other connections to grab), is that right?
Yes, that's right.
Somewhat related: we are forcing the User-Agent header to be parsed here because we use `TryGetValues`. We should avoid this by using the `NonValidated` view instead.
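For illustration, this is roughly what the `NonValidated` approach looks like (a minimal sketch, assuming .NET 6+; `HttpHeaders.NonValidated` returns the stored values as-is without running the header parsers):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com");
request.Headers.TryAddWithoutValidation("User-Agent", "my-app/1.0");

// NonValidated reads the raw stored values without forcing them through
// the UserAgent header parser, so no parsed state is mutated.
if (request.Headers.NonValidated.TryGetValues("User-Agent", out HeaderStringValues values))
{
    Console.WriteLine(string.Join(" ", values));
}
```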
(I thought there was an issue on this but I couldn't find it...)
Triage: Impact on reliability with proxies, we should address it. Hopefully not too complex.
This has been mitigated by #68115 in 7.0 - you should only hit this issue if the request message's headers are modified after it was sent.
Moving to future.
What's the workaround in .NET 6? The workaround with the `CustomHandler` below did not work:
```csharp
public sealed class CustomHandler : DelegatingHandler
{
    public CustomHandler(HttpMessageHandler innerHandler) : base(innerHandler)
    {
    }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _ = request.Headers.TryGetValues("User-Agent", out _); // Force parsing of the User-Agent header
        return base.SendAsync(request, cancellationToken);
    }
}
```
We are getting the IndexOutOfRangeException you mentioned in your original post: https://github.com/dotnet/runtime/issues/61798#issuecomment-1040227700. After spending two days as a team trying to solve it, we wrapped everything in a try/catch with retries...
@Edgaras91 can you please post the stack trace where you are seeing these exceptions?
- Are you setting `SocketsHttpHandler.ConnectCallback`?
- Are you reusing and/or modifying the `HttpRequestMessage` after you send it?

The workaround should be 100% effective as long as you answer "no" to both of the above.
Stack trace:

```
System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Net.Http.Headers.HttpHeaders.ReadStoreValues[T](Span`1 values, Object storeValue, HttpHeaderParser parser, Int32& currentIndex)
   at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringOrStringArray(HeaderDescriptor descriptor, Object sourceValues, String& singleValue, String[]& multiValue)
   at System.Net.Http.Headers.HttpHeaders.GetEnumeratorCore()+MoveNext()
   at System.Linq.Enumerable.SelectManySingleSelectorIterator`2.MoveNext()
   at System.Linq.Enumerable.SelectEnumerableIterator`2.ToList()
   at RestSharp.RestResponse.<>c__DisplayClass0_0.<<FromHttpResponse>g__GetDefaultResponse|0>d.MoveNext()
```
No ConnectCallback, and we are not reusing the HttpRequestMessage. We are, however, using a third-party wrapper, RestSharp, but your workaround is hitting the breakpoint at `Headers.TryGetValues`.
I will check whether RestSharp is doing either of the two things you mentioned.
> but your workaround is hitting the breakpoint at Headers.TryGetValues.
What do you mean by this?
You are definitely hitting #61798, but if the workaround does not work 100%, you are hitting something other than this specific issue (#65379).
Are you using NewRelic? Using an outdated version (either as a direct dependency, or running on the host machine) was a common cause we saw for #61798.
I am not using NewRelic. The workaround is not working.
By hitting breakpoint I meant I did implement a workaround successfully - but it doesn't work.
Because there is no other solution for this, please reconsider backporting the related fixes to .NET 6. Our services can't make HTTP calls reliably.
It may be worth allowing comments on the other issue (https://github.com/dotnet/runtime/issues/61798), as there may be more people experiencing this.
I can also confirm that RestSharp 107.3.0 (which I'm using) creates a new HttpRequestMessage every time, and there are no signs of `SocketsHttpHandler.ConnectCallback` in its source code.
So it seems like the workaround just "doesn't work".
In your example, is the line below mandatory? We didn't have it in our attempt to use the workaround:

```csharp
var socketsHttpHandler = new SocketsHttpHandler();
```
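For reference, the handler chain from earlier in the thread is typically wired up like this (a sketch; the explicit `SocketsHttpHandler` here is simply the innermost handler, and `CustomHandler` refers to the `DelegatingHandler` shown above):

```csharp
using System.Net.Http;

var socketsHttpHandler = new SocketsHttpHandler();

// CustomHandler forces User-Agent parsing before each request is handed
// to the underlying SocketsHttpHandler and its connection pool.
var client = new HttpClient(new CustomHandler(socketsHttpHandler));
```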
I can say that we are only experiencing this issue on a server (release build), not on local machines; we tried both debug and release builds. Is this linked with IIS somehow?
@Edgaras91 can you please open a new issue and share how you are making HTTP calls? Is this only occurring when using a proxy? If you have a reproducible example, that would also help a ton in diagnosing the issue. IIS does not impact `HttpClient`'s behavior.
Seeing this sort of exception means multiple threads are accessing the header collection concurrently. There are a few known cases (NewRelic, CONNECT tunnels ...), but it's possible something else is happening in your scenario.
Will this fix be backported to .NET 6?
@madelson can you please give us some numbers on how often it happens to you? How much it impacts your service? Potentially any business impact on your company or your customers. We will need the info to justify backport into 6.0.
Triage: Given that we might need to backport the fix to 6.0 (per discussion above - pending impact details), we should fix it in 7.0 (although the impact on 7.0 is much lower than on 6.0) to have the fix ready for potential backport to 6.0.x and to avoid servicing 7.0 if we decide in future to service 6.0.
@madelson can you also please confirm whether the workaround mentioned in the top issue (accessing the `User-Agent` header before sending the request) fully resolves this issue for you (we expect it should)?
@MihaZupan thanks for following up. We've deployed the workaround and are actively monitoring. The error was transient, so we want to wait a week or so before declaring victory.
@MihaZupan I still have to do some more digging to confirm early next week, but so far it looks like the workaround may not have fixed the issue for us.
Can you share how you implemented the workaround? We would expect that it is a 100% reliable fix for this specific issue.
If it's not, it's possible you are running into a different header issue (e.g. using outdated NewRelic) that results in similar issues.
Notably, you shouldn't see `HttpConnectionPool.EstablishProxyTunnelAsync` in the stack trace at all after the workaround.
@MihaZupan ok I think we have a false alarm; the error was in an environment that hadn't received the fix yet. I'll continue to monitor.
Here's what our implementation of the workaround looks like:
I also just want to confirm that this is a flavor of the error that should be fixed here:
The workaround looks good. And yes, that stack trace should be the same issue.
EDIT: thought we had a conclusion here but still looking.
In .NET 6, the creation of a connection has been decoupled from the initiating request. That is, a connection attempt may still be ongoing even after the initiating request was canceled or was served by a different connection that became available.

In the case of `HttpConnectionKind.SslProxyTunnel` (CONNECT tunnel), we copy the original request's `User-Agent` header to the tunnel request here. The problem is that we access the header too late. At that point, the original request's header collection could be enumerated on a different thread (served by a different connection). Or, the `SendAsync` call associated with the request may have already returned, seemingly returning ownership back to the user. The user may then attempt to enumerate/inspect the headers, which would again race with the read from `EstablishProxyTunnelAsync`.

This is a possible explanation for an issue a user hit in https://github.com/dotnet/runtime/issues/61798#issuecomment-1040227700, based on the stack trace.
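A conceptual sketch of the underlying race (a hypothetical repro, not the runtime's actual code): `HttpHeaders` is not thread-safe, so reading a request's headers on one thread while another thread reads them (as the tunnel-establishment code did) can corrupt the lazily-parsed header store:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com");
request.Headers.TryAddWithoutValidation("User-Agent", "agent/1.0 (extra tokens)");

// Two unsynchronized readers, mimicking the user enumerating headers while
// EstablishProxyTunnelAsync reads User-Agent for the tunnel request.
// Both reads mutate the lazily-parsed store, so on affected runtimes this
// can intermittently throw IndexOutOfRangeException.
Task userThread = Task.Run(() => { foreach (var _ in request.Headers) { } });
Task poolThread = Task.Run(() => request.Headers.TryGetValues("User-Agent", out _));
await Task.WhenAll(userThread, poolThread);
```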
cc: @Strepto
This specific issue can be worked around by manually forcing the parsing of the `User-Agent` header:
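A minimal sketch of that workaround (the URL and User-Agent value are placeholders; the point is to force the parse once, on the sending thread, before any connection-pool code can touch the header):

```csharp
using System.Net.Http;

using var client = new HttpClient(); // assumed: proxy configured via environment or handler
var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com");
request.Headers.TryAddWithoutValidation("User-Agent", "my-app/1.0");

// Touching the header via TryGetValues forces it through the parser now,
// so the later read in EstablishProxyTunnelAsync no longer races with lazy parsing.
_ = request.Headers.TryGetValues("User-Agent", out _);

HttpResponseMessage response = await client.SendAsync(request);
```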