dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Very weird SSL error in .NET: The specified data could not be decrypted only for a specific URL #43682

Closed RobSiklos closed 3 years ago

RobSiklos commented 3 years ago

I'm using .NET to download data from a URL. For most URLs it works no problem, but for one specific URL, I am getting a very weird error when I try to make the connection. Furthermore, the error only happens on the 2nd (and subsequent) attempts to make the request. The first time always seems to work.

The problem is specific to Windows. On Linux it all seems to work fine.

Here is some sample code which demonstrates the problem:

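using System;
using System.Net;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

string url = "https://health-infobase.canada.ca/src/data/covidLive/covid19.csv";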

for (int i = 1; i <= 10; i++)
{
    var req = (HttpWebRequest)WebRequest.Create(url);

    // Just in case, rule these out as being related to the issue.
    req.AllowAutoRedirect = false;
    req.ServerCertificateValidationCallback = (object s, X509Certificate certificate, X509Chain chain, SslPolicyErrors sslPolicyErrors) => true;

    try
    {
        // This line throws the exception.
        using (req.GetResponse()) { }
    }
    catch (Exception ex) {
        Console.WriteLine(ex.ToString());
        Console.WriteLine($"Failed on attempt {i}.");
        return;
    }
}

Notes:

The stack trace when I run in .NET Core looks like this:

System.Net.WebException: The SSL connection could not be established, see inner exception. Authentication failed, see inner exception.
 ---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
 ---> System.Security.Authentication.AuthenticationException: Authentication failed, see inner exception.
 ---> System.ComponentModel.Win32Exception (0x80090330): The specified data could not be decrypted.
   --- End of inner exception stack trace ---
   at System.Net.Security.SslStream.StartSendAuthResetSignal(ProtocolToken message, AsyncProtocolRequest asyncRequest, ExceptionDispatchInfo exception)
   at System.Net.Security.SslStream.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.PartialFrameCallback(AsyncProtocolRequest asyncRequest)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Security.SslStream.ThrowIfExceptional()
   at System.Net.Security.SslStream.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslStream.EndProcessAuthentication(IAsyncResult result)
   at System.Net.Security.SslStream.EndAuthenticateAsClient(IAsyncResult asyncResult)
   at System.Net.Security.SslStream.<>c.<AuthenticateAsClientAsync>b__65_1(IAsyncResult iar)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at System.Net.HttpWebRequest.SendRequest()
   at System.Net.HttpWebRequest.GetResponse()
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.GetResponse()
   at UserQuery.Main() in C:\Users\robs\AppData\Local\Temp\LINQPad6\_gifldqtg\xltrxu\LINQPadQuery:line 12

On .NET Framework, the stack trace seems to be much less useful:

System.Net.WebException: The request was aborted: Could not create SSL/TLS secure channel.
   at System.Net.HttpWebRequest.GetResponse()
   at UserQuery.Main() in C:\Users\robs\AppData\Local\Temp\LINQPad5\_psduzptv\dcrjhq\LINQPadQuery.cs:line 48

HttpClient

We also tried with HttpClient, and the results are similar. The first HttpClient works, but any subsequent ones fail:

using System.Net.Http;

string url = "https://health-infobase.canada.ca/src/data/covidLive/covid19.csv";

using (var client = new HttpClient())
{
    // Works
    string response = client.GetAsync(url).Result.Content.ReadAsStringAsync().Result;
}

using (var client = new HttpClient())
{
    // Fails
    string response = client.GetAsync(url).Result.Content.ReadAsStringAsync().Result;
}

Note: originally posted at https://stackoverflow.com/q/64283848/270348

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Clockwork-Muse commented 3 years ago

  • If I use an intermediary HTTPS sniffer (like Fiddler), then the problem goes away.

... even just capturing the raw trace?

In .NET Core however, the error happens for any URL with a path under https://health-infobase.canada.ca/src/.

I feel like this is pointing to a misbehaving CRM/server or something

RobSiklos commented 3 years ago

@Clockwork-Muse

  • If I use an intermediary HTTPS sniffer (like Fiddler), then the problem goes away.

... even just capturing the raw trace?

It's weird. I'm using Fiddler4 btw. If I start Fiddler without decrypting HTTPS, then it works like 95% of the time, but there's still the odd failure. If I enable decrypting HTTPS in Fiddler, then it works all the time.

In .NET Core however, the error happens for any URL with a path under https://health-infobase.canada.ca/src/.

I feel like this is pointing to a misbehaving CRM/server or something

If it was a legitimate problem with the web server, then why does the issue only happen on Windows, but not Linux?

wfurt commented 3 years ago

A packet capture would probably be useful. I could not reproduce it running on Windows 10 with .NET Core 3.1. There was a section with reordered packets, but that should not impact the handshake IMHO: it should be dealt with at the TCP layer, and if it were fatal, we should get a socket exception.

The per-URL behavior seems strange, as SSL should not care about the path. The only exception I'm aware of is renegotiation. One could configure IIS to require (or ask for) a client certificate for a particular part of the site, and accessing that part would cause renegotiation. That feature is somewhat weird on Windows and not broadly used.

cc: @vcsjones in case he has some other ideas.

vcsjones commented 3 years ago

Another customer ran into something with similar behavior when they had a TLS/SSL inspection device on their network that all HTTPS traffic was being proxied through.

Could there be anything like that on your network, or any other proxy? That might explain why using Fiddler makes the problem go away, if Fiddler is now your proxy and not the TLS inspection device.

Clockwork-Muse commented 3 years ago

If it was a legitimate problem with the web server, then why does the issue only happen on Windows, but not Linux?

Because Windows and Linux use different underlying libraries to handle SSL, and a (slightly) misbehaving server could cause a problem to crop up on one and not the other. In this case, I'm more speculating that there's something about the destination network routing/server setup that OpenSSL is handling better. A static url part would be a simple/common place to do static routing off of.

Could there be anything like that on your network, or any other proxy? That might explain why using Fiddler makes the problem go away, if Fiddler is now your proxy and not the TLS inspection device.

Inspection by Fiddler would "erase" the problem since it would have to decrypt/reencrypt the packets, thus sanitizing whatever was causing the error in the first place (especially if it uses OpenSSL), regardless of whether it's network config on OP's end or the destination end.

Which is why a raw capture from a failing request would be interesting.

wfurt commented 3 years ago

Not that I would expect a big difference right now, but it would also be interesting if you could give it a try with the daily builds: https://github.com/dotnet/installer There were some fixes to cover some less common Schannel behaviors. The stack trace in the exception should also be more useful, as the underlying implementation is now based on Task rather than APM.

ericlaw1979 commented 3 years ago

Fiddler, by default, chains to upstream proxy servers, so if you had an inspection device upstream, it'll still be used when you run Fiddler, typically.

One variable is that because Fiddler does decryption, it can change which TLS versions are used between Client->Fiddler->Server. What versions do you have enabled in Tools > Fiddler Options > HTTPS?

This server only uses TLS/1.2: https://www.ssllabs.com/ssltest/analyze.html?d=health-infobase.canada.ca

See https://www.telerik.com/blogs/help!-running-fiddler-fixes-my-app- for other places where there are behavioral deltas (e.g. Fiddler doesn't do HTTP2, Fiddler changes connection reuse, etc.)

In particular, Fiddler's going to keep the upstream connection alive for reuse (if there wasn't a Connection:close) and thus not try to create a new connection (potentially with a session ticket) for a second request.

What is StartSendAuthResetSignal used for? (See e.g. https://community.developer.authorize.net/t5/Integration-and-Testing/A-call-to-SSPI-failed-function-requested-not-supported-in-WPF/td-p/60028 where enabling TLS/1.2 made an error in that area go away)
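
The connection-reuse point above can also be tried from the client side. The following is a minimal sketch, assuming that a single long-lived HttpClient keeps its pooled connection open between requests, so the second request does not need a new TLS handshake (and therefore no session resumption). The names SharedClientSketch and DownloadTwiceAsync are hypothetical, and this has not been verified against the failing server.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class SharedClientSketch
{
    // One HttpClient for the lifetime of the process, so its connection
    // pool can reuse an already established TLS connection.
    public static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: downloads the same URL twice through the shared
    // client and returns the total number of characters received.
    public static async Task<int> DownloadTwiceAsync(string url)
    {
        int total = 0;
        for (int i = 1; i <= 2; i++)
        {
            // If the server closes the connection between requests,
            // a new handshake will still happen here.
            string body = await Client.GetStringAsync(url);
            total += body.Length;
        }
        return total;
    }
}
```

This only sidesteps a second handshake while the pooled connection stays alive; it does not address the underlying handshake failure.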

RobSiklos commented 3 years ago

Could there be anything like that on your network, or any other proxy? That might explain why using Fiddler makes the problem go away, if Fiddler is now your proxy and not the TLS inspection device.

I tried it from two different networks (home and office, no VPN) with the same result.

Which is why a raw capture from a failing request would be interesting.

@Clockwork-Muse Attached. I used Wireshark for the capture. There are two requests that I sent from my .NET app, and they are 5 seconds apart. You can see some weirdness in the 2nd set at the 5-second mark: health-infobase.capture.pcapng.zip

wfurt commented 3 years ago

The second request is trying to use session resumption, and it seems like the client fails immediately after receiving the server response. Together with the error message, it feels like some cache mismatch. You can try disabling the cache to see if that makes a difference. As for the platform difference: on Linux, TLS resumption is currently not supported with .NET.

Set HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\MaximumCacheSize to 0.

RobSiklos commented 3 years ago

@wfurt I set MaximumCacheSize to 0 (dword) and rebooted, but the issue remains the same.
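
For anyone repeating this step, the registry change can be sketched in C# as follows. This is only an illustration: the class and method names are hypothetical, the key path is the one quoted above, writing under HKEY_LOCAL_MACHINE requires an elevated process, and the Registry API works on Windows only. As in the comment above, a reboot was used before retesting.

```csharp
using System;
using Microsoft.Win32;

static class SchannelCacheSketch
{
    // Key path taken from the comment above, with the full hive name
    // that the static Registry helpers expect.
    public const string SchannelKey =
        @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL";

    // Sets MaximumCacheSize to 0 (DWORD) to turn off the Schannel session cache.
    // Requires administrator rights; throws on non-Windows platforms.
    public static void DisableSessionCache()
    {
        Registry.SetValue(SchannelKey, "MaximumCacheSize", 0, RegistryValueKind.DWord);
    }

    // Returns the current value, or null when the key or value is absent.
    public static object ReadCacheSize()
    {
        return Registry.GetValue(SchannelKey, "MaximumCacheSize", null);
    }
}
```

Calling SchannelCacheSketch.ReadCacheSize() before and after the change is a quick way to confirm that the value was actually written.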

wfurt commented 3 years ago

Do you see a full handshake then? In the first capture you posted, the server sends back its certificate in packet 8, but then in packet 552 it does not send a certificate and finishes the handshake with CipherChange/Resume.

wfurt commented 3 years ago

I was able to get a repro on a different machine (different network and OS version). I will take a look.

wfurt commented 3 years ago

https://support.microsoft.com/en-us/help/4528489/transport-layer-security-tls-connections-might-fail-or-timeout-when-co

Setting DisableClientExtendedMasterSecret to 1 fixes the issue for me. That is the path the article does NOT recommend. Fiddling with the cipher suites seems tedious. However if the remote site is running buggy SSL I don't know what else to do.

abelykh0 commented 3 years ago

However if the remote site is running buggy SSL I don't know what else to do.

Contact Government of Canada so they fix the issue? Good luck with that :)

wfurt commented 3 years ago

I have not given up yet. We should verify that this is the issue. Unfortunately, the Windows implementation does not support CipherSuitesPolicy, so it is not possible to manipulate the cipher list from managed code.
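
To illustrate the limitation, here is a minimal sketch of how a cipher list is set from managed code on platforms that support it. The helper name CreateHandler and the single suite chosen are assumptions for illustration only; the thread does not say which suites are relevant. On Windows, the CipherSuitesPolicy constructor throws PlatformNotSupportedException, so the sketch falls back to the OS defaults there.

```csharp
using System;
using System.Net.Http;
using System.Net.Security;

static class CipherPolicySketch
{
    // Hypothetical helper: returns a handler restricted to one cipher suite
    // where the platform allows it, or an unrestricted handler otherwise.
    public static SocketsHttpHandler CreateHandler()
    {
        var handler = new SocketsHttpHandler();
        try
        {
            // The suite below is an arbitrary example, not a recommendation.
            handler.SslOptions.CipherSuitesPolicy = new CipherSuitesPolicy(new[]
            {
                TlsCipherSuite.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
            });
        }
        catch (PlatformNotSupportedException)
        {
            // Windows (Schannel): the cipher list cannot be changed from
            // managed code, so the OS-level configuration stays in effect.
        }
        return handler;
    }
}
```

The handler would then be passed to new HttpClient(handler). On Windows the cipher list has to be changed through OS configuration instead, which is what the linked article describes.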

ericlaw1979 commented 3 years ago

Contact Government of Canada so they fix the issue? Good luck with that :)

Maybe point out that their version of OpenSSL has some serious issues leading to an "F" grade?

abelykh0 commented 3 years ago

Contact Government of Canada so they fix the issue? Good luck with that :)

Maybe point out that their version of OpenSSL has some serious issues leading to an "F" grade?

That might actually work.

RobSiklos commented 3 years ago

I sent an email to the maintainer of that site.

However, if there are web sites out there which behave like this, and OpenSSL handles it gracefully, shouldn't .NET/Windows do so as well?

wfurt commented 3 years ago

It should, no doubt, @RobSiklos. Did you try the steps from the linked article to see if they make it work for you? The fix may come from an OS update instead of .NET itself.

RobSiklos commented 3 years ago

Yes - setting DisableClientExtendedMasterSecret to 1 fixes the issue for me as well, however as you say, this is not a recommended approach. On the client side, I am up to date in terms of O/S (Windows 10). On the server side, I have no idea because I'm not affiliated with the government, but either way, asking them to update their O/S may not be practical (government == slow).

wfurt commented 3 years ago

You can go down the recommended path and fiddle with the cipher suite list. I'll ping the Schannel team tomorrow to confirm and maybe provide more guidance.

karelz commented 3 years ago

Triage: Looks like OS bug - waiting on confirmation.

RobSiklos commented 3 years ago

FYI, the maintainer of the web site responded. They fixed some stuff, and now receive an A+ rating on SSL Labs. However, the issue still happens.

ghost commented 3 years ago

This issue has been automatically marked as no recent activity because it has been marked as needing more info but has not had any activity for 14 days. It will be closed if no further activity occurs within 7 more days.

stephentoub commented 3 years ago

@danmosemsft, another example.

wfurt commented 3 years ago

Hello @RobSiklos, can you check this again? I put it on hold for a little while and now I cannot reproduce it anymore. Either the server changed or I got updates from Windows. (2004 build 20279.1)

RobSiklos commented 3 years ago

@wfurt Just tried it again, and unfortunately I CAN still reproduce the problem.

wfurt commented 3 years ago

@aik-jahoda will take another look. We should figure out if there is a fix in newer Windows. This still looks like an OS issue.

aik-jahoda commented 3 years ago

Works without an issue on .NET Core 3.1. I'm on Windows Insider build 21313. @RobSiklos which OS version are you using?

RobSiklos commented 3 years ago

@aik-jahoda Unfortunately, I can no longer reproduce the problem. However, I did upgrade my OS to "Windows 10, version 20H2" (OS Build 19042.746) since I tested previously. So either it's fixed on the client OS side, or the maintainer of the web site did something which prevents the issue from happening in the first place.

wfurt commented 3 years ago

I think this is a Windows bug, so getting the fix through Windows Update makes sense to me. As this is really not actionable in .NET, I'm inclined to close this issue, @RobSiklos.

RobSiklos commented 3 years ago

@wfurt Agreed. Thanks for your help.