dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Questions about requirements for the new SocketsHttpHandler #58814

Closed DenisRumyantsev closed 2 years ago

DenisRumyantsev commented 3 years ago

Hi, we maintain the Azure DevOps pipeline agent, and we frequently encounter SSL issues that can be resolved by switching from the new SocketsHttpHandler to the previous HttpClientHandler, which we do by setting System.Net.Http.UseSocketsHttpHandler to false in the agent configuration. Here is an example of such an issue on Linux:

System.Net.Http.HttpRequestException:
The SSL connection could not be established, see inner exception.

System.Security.Authentication.AuthenticationException:
The remote certificate is invalid according to the validation procedure.

Our questions: is there any difference in requirements between the new and legacy HTTP handlers? Are there specific requirements for the new SocketsHttpHandler? Is there documentation anywhere on the differences between these handlers?

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
Author: DenisRumyantsev
Assignees: -
Labels: `area-System.Net.Http`, `untriaged`
Milestone: -

karelz commented 3 years ago

@DenisRumyantsev there is no documentation. The above would be bugs, which we would have to track down and root-cause. They may be fixed in the latest version of .NET Core. Are you in a position to try .NET 6? (It will have a go-live RC1 published in a week.)

Note that the above opt-out is not available on .NET 5: https://docs.microsoft.com/en-us/dotnet/api/system.net.http.socketshttphandler?view=net-5.0#remarks. Which version of .NET Core do you run on? .NET Core 3.1?
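For readers following along, the opt-out mentioned above can also be set in code. This is a minimal sketch, assuming .NET Core 2.1 to 3.1 (as noted above, the switch is not available on .NET 5 and later). The class and method names are hypothetical; only the switch name comes from this thread.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper; only the switch name is taken from the issue.
public static class LegacyHandlerOptOut
{
    public const string SwitchName = "System.Net.Http.UseSocketsHttpHandler";

    // Call once at startup, before the first handler or HttpClient is created.
    public static HttpClient CreateClient()
    {
        // false = ask HttpClientHandler to use the legacy platform handler
        // (honored on .NET Core 2.1-3.1 only).
        AppContext.SetSwitch(SwitchName, false);
        return new HttpClient(new HttpClientHandler());
    }
}
```

The same opt-out can be expressed without code changes through the `DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER` environment variable (set to `0` or `false`), which is convenient for a one-off comparison between the two handlers.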

wfurt commented 3 years ago

Note that the validation path is HttpClient -> SslStream -> X509Chain. The validation calls OS functions, so by default it trusts whatever the OS trusts. I strongly suggest using a custom validation callback, at least to dump the errors, if any, and report the specific issues.

On a side note, there were also some notes in the Agent about authentication. It is in a similar situation, and reliance on the old platform handlers will not work. If there are specific problems, they should be reported so we can look at them.
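The kind of diagnostic callback suggested above might look like the following. It is a sketch only: `CertDiagnostics` and its members are hypothetical names, and the callback keeps the default accept/reject decision while logging why validation failed.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper that dumps certificate validation errors.
public static class CertDiagnostics
{
    // Matches the RemoteCertificateValidationCallback delegate signature.
    public static bool LogAndValidate(object sender, X509Certificate? certificate,
                                      X509Chain? chain, SslPolicyErrors errors)
    {
        if (errors == SslPolicyErrors.None)
        {
            return true; // the OS trust store accepted the certificate
        }

        string subject = certificate?.Subject ?? "<no certificate>";
        Console.Error.WriteLine($"TLS validation failed: {errors}; subject: {subject}");

        if (chain != null)
        {
            // Each entry explains one chain problem (untrusted root, expired, ...).
            foreach (X509ChainStatus status in chain.ChainStatus)
            {
                Console.Error.WriteLine($"  {status.Status}: {status.StatusInformation.Trim()}");
            }
        }

        return false; // keep rejecting; this callback only adds logging
    }

    public static HttpClient CreateClient()
    {
        var handler = new SocketsHttpHandler
        {
            SslOptions = new SslClientAuthenticationOptions
            {
                RemoteCertificateValidationCallback = LogAndValidate
            }
        };
        return new HttpClient(handler);
    }
}
```

Returning `false` on any error keeps the behavior identical to the default validation, so the callback is safe to leave enabled while collecting the details for a bug report.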

karelz commented 3 years ago

@DenisRumyantsev any update?

DenisRumyantsev commented 3 years ago

@karelz We have more than one issue that can be solved by switching from SocketsHttpHandler to HttpClientHandler. After testing one of them, I can confirm that the error reproduces equally on the pipeline agent with the current version of .NET Core (runtime version 3.1.0) and with .NET 6 (runtime version 6.0.0-preview.7.21377.19).

wfurt commented 3 years ago

Can you file an issue with details, @DenisRumyantsev?

DenisRumyantsev commented 3 years ago

@wfurt, the error occurs here, where the Azure DevOps pipeline agent tries to authenticate (while being configured on macOS) when it connects to the server (deployed on another machine in the local network) with credentials (login and password). The error can be reproduced on the agent with .NET Core 3 (the currently used version) as well as with .NET 6. As a workaround, it can be fixed by switching from the new SocketsHttpHandler to the legacy HttpClientHandler, since we still use .NET Core 3. But if we update the .NET version in the future, there will be no way to work around this issue, since the switch will no longer be available. Here is the stack trace of the error:

Microsoft.VisualStudio.Services.Common.VssUnauthorizedException: VS30063: You are not authorized to access http://{server_address}.
   at Microsoft.VisualStudio.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.VisualStudio.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
   at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
   at Microsoft.VisualStudio.Services.Location.Client.LocationHttpClient.GetConnectionDataAsync(ConnectOptions connectOptions, Int64 lastChangeId, CancellationToken cancellationToken, Object userState)
   at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.GetConnectionDataAsync(ConnectOptions connectOptions, Int32 lastChangeId, CancellationToken cancellationToken)
   at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.ConnectAsync(ConnectOptions connectOptions, CancellationToken cancellationToken)
   at Microsoft.VisualStudio.Services.Agent.LocationServer.ConnectAsync(VssConnection jobConnection)

wfurt commented 2 years ago

Can you get packet traces, @DenisRumyantsev? Does it use NTLM, Kerberos, or something else?

wfurt commented 2 years ago

BTW what version of .NET did you try?

DenisRumyantsev commented 2 years ago

@wfurt the issue persists on .NET Core 3 (runtime version 3.1.0) and on .NET 6 (runtime version 6.0.0-preview.7.21377.19). NTLM is used. We tried to use Fiddler to track the requests, and it looks like they are not reaching the server. Here are the agent logs from a connection attempt:

[2021-09-23 10:02:49Z INFO VisualStudioServices] Starting operation Location.GetConnectionData
[2021-09-23 10:02:49Z WARN VisualStudioServices] Authentication failed with status code 401.
X-TFS-ProcessId: 9e66cb5f-eab8-46ce-a705-f69babc11819
ActivityId: c04f7ac0-fedd-4cd4-ab78-024c8702613e
X-TFS-Session: d8a99759-6466-4fb1-8943-f70f5dda215c
X-VSS-E2EID: 094224ee-6837-4f42-b863-527327e39dcd
WWW-Authenticate: Bearer, Basic realm="http://{server_address}/", Negotiate, NTLM
X-Powered-By: ASP.NET
P3P: CP="CAO DSP COR ADMa DEV CONo TELo CUR PSA PSD TAI IVDo OUR SAMi BUS DEM NAV STA UNI COM INT PHY ONL FIN PUR LOC CNT"
Lfs-Authenticate: NTLM
X-Content-Type-Options: nosniff
Date: Thu, 23 Sep 2021 10:02:49 GMT

[2021-09-23 10:02:49Z WARN VisualStudioServices] Windows issued token provider instance 50197559 requires an interactive prompt which is not allowed by the current settings
[2021-09-23 10:02:49Z ERR  VisualStudioServices] GET request to http://{server_address}/DefaultCollection/_apis/connectionData?connectOptions=0&lastChangeId=24577&lastChangeId64=24577 is not authorized. Details: VS30063: You are not authorized to access http://{server_address}.
[2021-09-23 10:02:49Z INFO VisualStudioServices] Finished operation Location.GetConnectionData

There is no such error with HttpClientHandler.

DenisRumyantsev commented 2 years ago

There are no differences between the requests that the two handlers make, except that if we call the _connection.ConnectAsync() method repeatedly, the legacy HttpClientHandler does not make a new request, since the first one has already succeeded. With the new SocketsHttpHandler, by contrast, the method fails with the same error every time; retries do not help.

wfurt commented 2 years ago

I don't understand the answer, @DenisRumyantsev. Are you suggesting that the handler is failing to connect, e.g. failing to create the TCP connection or the SslStream? It is also not clear to me how Fiddler relates to macOS. Also, from the log it is not clear whether NTLM is actually used; it is the last option on the list.
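One way to remove the ambiguity about which scheme is used is to register the credential for NTLM only. The sketch below assumes that the handler looks up credentials per authentication scheme, so a `CredentialCache` holding only an `"NTLM"` entry should leave Negotiate without a credential. `NtlmOnlyClient` and its members are hypothetical names, and the server URI, user, and password are placeholders.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper; not part of the pipeline agent.
public static class NtlmOnlyClient
{
    // Registers the credential for the "NTLM" scheme only.
    public static CredentialCache BuildCache(Uri server, string user, string password)
    {
        var cache = new CredentialCache();
        cache.Add(server, "NTLM", new NetworkCredential(user, password));
        return cache;
    }

    public static HttpClient CreateClient(Uri server, string user, string password)
    {
        var handler = new SocketsHttpHandler
        {
            Credentials = BuildCache(server, user, password)
        };
        return new HttpClient(handler) { BaseAddress = server };
    }
}
```

With a client built this way, a packet trace of the failing request should show whether an `Authorization: NTLM ...` header is ever sent after the 401 response.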

DenisRumyantsev commented 2 years ago

@wfurt we will continue to investigate this issue and will provide more info when we have it.

wfurt commented 2 years ago

OK, let me know when ready, @DenisRumyantsev. In general, we are trying to hunt down and fix differences like that. For macOS, stay on 6.0; NTLM did not work properly there in previous releases.

DenisRumyantsev commented 2 years ago

@wfurt here are the results of the investigation:

There is a difference in the Negotiate Flags between the failed (.NET 5) and successful (.NET 6) agent requests: the Negotiate Always Sign and Negotiate Sign flags are missing from the failed one. Could this be the cause of the error? The NTLM versions used are the same. I can provide you with the captured requests if needed.

Is it possible to make a patch for .NET 5 to fix the issue?
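The flag comparison described above can be repeated on a captured request by decoding the token from the `Authorization: NTLM <base64>` header. The sketch below assumes the NTLM NEGOTIATE (type 1) message layout from the MS-NLMP specification: an 8-byte `NTLMSSP\0` signature, a 4-byte message type, then the 4-byte little-endian flags. The type and member names are hypothetical, and only the two flags named in this thread are listed.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Subset of the NTLM negotiate flags (bit values per MS-NLMP).
[Flags]
public enum NtlmFlags : uint
{
    NegotiateSign = 0x00000010,
    NegotiateAlwaysSign = 0x00008000,
}

// Hypothetical helper for inspecting captured tokens.
public static class NtlmNegotiateInspector
{
    // base64Token is the part after "NTLM " in the Authorization header.
    public static NtlmFlags ReadNegotiateFlags(string base64Token)
    {
        byte[] message = Convert.FromBase64String(base64Token);
        if (message.Length < 16 ||
            Encoding.ASCII.GetString(message, 0, 8) != "NTLMSSP\0")
        {
            throw new FormatException("Not an NTLM message.");
        }

        uint messageType = BinaryPrimitives.ReadUInt32LittleEndian(message.AsSpan(8, 4));
        if (messageType != 1)
        {
            throw new FormatException("Expected an NTLM NEGOTIATE (type 1) message.");
        }

        // Flags sit directly after the message type in a type 1 message.
        return (NtlmFlags)BinaryPrimitives.ReadUInt32LittleEndian(message.AsSpan(12, 4));
    }
}
```

Calling `ReadNegotiateFlags(token).HasFlag(NtlmFlags.NegotiateSign)` on the tokens from the failing and the working run gives a quick yes/no answer for each flag without reading the packet capture by hand.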

karelz commented 2 years ago

@DenisRumyantsev for patching .NET 5 we would have to fully understand the root cause and know which fix actually addressed the problem. Given that 6.0 is in RC1 now, with RC2 coming out soon (both are go-live licenses), would it be an option for you to use .NET 6? If not, why?

wfurt commented 2 years ago

Yes, you are right, @DenisRumyantsev. This is https://github.com/dotnet/runtime/pull/54101, which fixes the very old #887. If needed, this would be pretty safe for servicing, @karelz (assuming we can justify the business impact).

DenisRumyantsev commented 2 years ago

@wfurt @karelz nice to hear you found this fix. It would be great if you could provide a patch for .NET 5, since we plan to migrate gradually.

karelz commented 2 years ago

@DenisRumyantsev can you please confirm that the latest .NET 6 builds (RC1+) work fine? They should contain the fix mentioned above, and we should first confirm that it really addresses your problem.

Regarding a .NET 5 patch, can you please explain what the business impact is on you and your customers? How much does the issue hurt your product? Is an earlier migration to .NET 6 an option for you? If not, why? Note: I assume you have a migration planned in the next ~6 months to align with the support timeline of .NET 5: https://github.com/dotnet/core/tree/main/release-notes

DenisRumyantsev commented 2 years ago

@karelz @wfurt I confirm that .NET 6 RC1 works fine, without this issue. We have reached a decision that we will be moving towards migration to .NET 6. Thank you for clarifying this issue, I am closing it.