dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.91k stars 4.63k forks source link

Possible regression/behavior difference with legacy HTTP (Curl) handler on Ubuntu 18.04 between .NET Core 2.0 and 2.1 #27436

Closed juliobbv closed 4 years ago

juliobbv commented 5 years ago

We found another possible regression during the testing of our Azure Pipelines agent after upgrading it from .NET Core 2.0 to 2.1, this time on Ubuntu 18.04 and the legacy Curl handler.

In this particular scenario, the agent (i.e. the client) is attempting to connect to a local server hosted in IIS on Windows. The agent is using "Negotiate" authentication to connect to the server, which connects via HTTPS with self-signed root and server certificates.

The error message looks as follows:

[2018-09-18 17:56:46Z ERR  LocationServer] System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.Http.CurlException: Stream error in the HTTP/2 framing layer
   at System.Net.Http.CurlHandler.ThrowIfCURLEError(CURLcode error)
   at System.Net.Http.CurlHandler.MultiAgent.FinishRequest(StrongToWeakReference`1 easyWrapper, CURLcode messageResult)
   --- End of inner exception stack trace ---
   at Microsoft.VisualStudio.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
   at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
   at Microsoft.VisualStudio.Services.Location.Client.LocationHttpClient.GetConnectionDataAsync(ConnectOptions connectOptions, Int64 lastChangeId, CancellationToken cancellationToken, Object userState)
   at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.ConnectAsync(ConnectOptions connectOptions, CancellationToken cancellationToken)
   at Microsoft.VisualStudio.Services.Agent.LocationServer.ConnectAsync(VssConnection jobConnection) in /home/julio/vsts-***/src/Microsoft.VisualStudio.Services.Agent/LocationServer.cs:line 31

The scenario above works fine with an agent built against .NET Core 2.0 (with all other variables being the same, including using the Curl handler). Both agents (2.0, 2.1) are run from the same Ubuntu 18.04 machine, with the latest compatible libcurl3 version available at this moment in time (7.58.0-2ubuntu2).

I'm attaching a pair of HTTP trace logs for reference: corefx2.zip. One interesting difference between the logs is that the client with .NET Core 2.0 prefers to use HTTP 1.1 to talk to the server, whereas .NET Core 2.1 always uses HTTP 2 instead.

Let @TingluoHuang and I know if you have any follow-up questions. Thanks in advance! 😃

Edit: we also first tried to use the new managed SocketsHttpHandler in Linux, but we unfortunately hit the issue that davidsh describes in dotnet/runtime#25827 (i.e. "No support for channel binding on operating systems other than Windows.").

karelz commented 5 years ago

Are you able to debug it (I assume you have isolated repro) on 2.1 vs. 2.0? In general, we want to phase out Curl support (blocked on missing HTTP/2). If there is however a clear regression and SocketsHttpHandler cannot be used (or easily fixed), we would take a fix for the regression.

How much does it impact your product? Is it important scenario for you?

stephentoub commented 5 years ago

Are you able to debug it (I assume you have isolated repro) on 2.1 vs. 2.0?

Sounds to me like the issue is what @juliobbv called out here: "One interesting difference between the logs is that the client with .NET Core 2.0 prefers to use HTTP 1.1 to talk to the server, whereas .NET Core 2.1 always uses HTTP 2 instead."

.NET Core 2.1 changed the default message version to be 2.0 instead of 1.1 (which I think was a mistake for us to do, and we should consider undoing that for 2.2 or even 2.1.x servicing). This means that an app that was using HTTP/1.1 will now use HTTP/2 by default if using WinHttpHandler/CurlHandler and the server it's talking to supports it, and if there's a bug on either side of the communication, the client will start failing.

@juliobbv, are you able to try with 2.0 setting the HttpRequestMessage.Version to 2.0 explicitly and seeing if that yields a similar error? My guess is it will, all else being equal (e.g. same machine configuration). And conversely, on 2.1 are you able to try setting the HttpRequestMessage.Version explicitly to 1.1, in order to override the 2.0 default, and see if that fixes the error?

juliobbv commented 5 years ago

Hi @stephentoub, I forced the HttpRequestMessage version of my 2.1 .NET Core build to 1.1, and I can confirm that it fixed the error. And vice versa, setting the HTTP version to 2.0 on my 2.0 build caused the 'Stream error in the HTTP/2 framing layer' to show up.

karelz commented 5 years ago

Ah, I missed the HTTP/2 reference during my previous reply, sorry for that. It seems that the behavior is the same on both .NET Core 2.0 and 2.1 - probably due to buggy HTTP/2 server? Switching to Http/1.1 protocol seems to work - so I guess we can close the issue?

juliobbv commented 5 years ago

@karelz, to answer your questions:

How much does it impact your product? Is it important scenario for you?

We have a few customers that have Azure Pipeline agents (i.e. our product) configured against an on-prem TFS server with HTTPS self-signed certificates.

Proportionally, there are many more customers that use regular HTTP. for on-prem TFS instances. Our team decided to not block on a new agent release with .NET Core 2.1, because the benefits (ARM Linux support), overweighed the disadvantages (i.e. this bug). But still, it's under our best interests for this regression to be fixed. 😄

stephentoub commented 5 years ago

probably due to buggy HTTP/2 server?

My guess would be more likely due to a libcurl bug, but yeah, it could be either. @juliobbv, you could see if building libcurl from the latest source and installing that resolves the issue; maybe it's a known issue with whatever libcurl is on that system.

Switching to Http/1.1 protocol seems to work - so I guess we can close the issue?

@karelz, this highlights the dangers of our having switched the default from 1.1 to 2.0. I think we should re-examine that.

karelz commented 5 years ago

@juliobbv are you able to work around the problem by forcing use of Http11 protocol?

this highlights the dangers of our having switched the default from 1.1 to 2.0. I think we should re-examine that.

Understood - we took it as calculated risk. Let's see first if there are no workarounds (fixing SocketsHttpHandler or using Http11).

juliobbv commented 5 years ago

@juliobbv are you able to work around the problem by forcing use of Http11 protocol?

This is the path that we ended up taking. The PR referenced above makes the changes to use HTTP 1.1 on operating systems that use Curl as the backend. So we should be good now from our team.

caesar-chen commented 5 years ago

The remaining work is tracked in dotnet/runtime#25827, since we will not investigate Curl specific HTTP2 issue as it will be replaced by SocketsHttpHandler, which will have HTTP2 support.

I think this issue can be closed.

davidsh commented 5 years ago

The remaining work is tracked in dotnet/runtime#25827, since we will not investigate Curl specific HTTP2 issue as it will be replaced by SocketsHttpHandler, which will have HTTP2 support. I think this issue can be closed.

dotnet/runtime#25827 is an issue about NTLM authentication and I don't think is related at all to this issue.

However, I agree that Curl specific HTTP/2 issues aren't something we are focusing on since SocketsHttpHandler is the new default HTTP stack. And since there is a workaround for this issue (as described above), we can close this issue now.