Closed RobSiklos closed 3 years ago
Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.
@Clockwork-Muse
- If I use an intermediary HTTPS sniffer (like Fiddler), then the problem goes away.
... even just capturing the raw trace?
It's weird. I'm using Fiddler4 btw. If I start Fiddler without decrypting HTTPS, then it works like 95% of the time, but there's still the odd failure. If I enable decrypting HTTPS in Fiddler, then it works all the time.
In .NET Core however, the error happens for any URL with a path under https://health-infobase.canada.ca/src/.
I feel like this is pointing to a misbehaving CRM/server or something
If it was a legitimate problem with the web server, then why does the issue only happen on Windows, but not Linux?
A packet capture would probably be useful. I could not reproduce it running on Windows 10 with .NET Core 3.1. There was a section with reordered packets, but that should not impact the handshake IMHO: it should be dealt with at the TCP layer, and if it were fatal, we should get a socket exception.
The per-URL behavior seems strange, as SSL should not care about the path. The only exception I'm aware of is renegotiation. One could configure IIS to require (or ask for) a client certificate for a particular part of the site, and accessing that part would cause renegotiation. That feature is somewhat weird on Windows and not broadly used.
cc: @vcsjones in case he has some other ideas.
Another customer ran into something with similar behavior when they had a TLS/SSL inspection device on their network that all HTTPS traffic was being proxied through.
Could there be anything like that on your network, or any other proxy? That might make sense of why using Fiddler makes the problem go away, if Fiddler is now your proxy and not the TLS inspection device.
If it was a legitimate problem with the web server, then why does the issue only happen on Windows, but not Linux?
Because Windows and Linux use different underlying libraries to handle SSL, and a (slightly) misbehaving server could cause a problem to crop up on one and not the other. In this case, I'm more speculating that there's something about the destination network routing/server setup that OpenSSL is handling better. A static url part would be a simple/common place to do static routing off of.
Could there be anything like that on your network, or any other proxy? That might make sense of why using Fiddler makes the problem go away, if Fiddler is now your proxy and not the TLS inspection device.
Inspection by Fiddler would "erase" the problem, since it has to decrypt and re-encrypt the packets, thus sanitizing whatever was causing the error in the first place (especially if it uses OpenSSL), regardless of whether the cause is network config on OP's end or on the destination end.
Which is why a raw capture from a failing request would be interesting.
Not that I would expect a big difference right now, but it would also be interesting if you could give it a try with daily builds: https://github.com/dotnet/installer There were some fixes to cover some less common Schannel behaviors. Also, the stack trace in the exception should be more useful, as the underlying implementation is now based on Task rather than APM.
Fiddler, by default, chains to upstream proxy servers, so if you had an inspection device upstream, it'll still be used when you run Fiddler, typically.
One variable is that, because Fiddler does decryption, it can change which TLS versions are used between Client->Fiddler->Server. What versions do you have enabled in Tools > Fiddler Options > HTTPS?
This server only uses TLS/1.2: https://www.ssllabs.com/ssltest/analyze.html?d=health-infobase.canada.ca
See https://www.telerik.com/blogs/help!-running-fiddler-fixes-my-app- for other places where there are behavioral deltas (e.g. Fiddler doesn't do HTTP2, Fiddler changes connection reuse, etc.)
In particular, Fiddler's going to keep the upstream connection alive for reuse (if there wasn't a Connection:close) and thus not try to create a new connection (potentially with a session ticket) for a second request.
What is StartSendAuthResetSignal used for? (See e.g. https://community.developer.authorize.net/t5/Integration-and-Testing/A-call-to-SSPI-failed-function-requested-not-supported-in-WPF/td-p/60028 where enabling TLS/1.2 made an error in that area go away)
Could there be anything like that on your network, or any other proxy? That might make sense of why using Fiddler makes the problem go away, if Fiddler is now your proxy and not the TLS inspection device.
I tried it from two different networks (home and office, no VPN) with the same result.
Which is why a raw capture from a failing request would be interesting.
@Clockwork-Muse Attached. I used Wireshark for the capture. There are two requests that I sent from my .NET app, and they are 5 seconds apart. You can see some weirdness in the 2nd set at the 5-second mark: health-infobase.capture.pcapng.zip
The second request is trying to use session resumption, and it seems like the client fails immediately after receiving the server response. Together with the error message, it feels like some cache mismatch. You can try to disable the cache to see if that makes a difference. Also, on the platform difference: on Linux, TLS resumption is currently not supported with .NET.
To disable the cache, set HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\MaximumCacheSize to 0
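As a sketch, the cache setting named above could be written as a .reg fragment. The key path and value name are taken from the comment itself; the DWORD type is what the follow-up reply reports using. It needs administrator rights and, as the follow-up notes, a reboot.

```reg
Windows Registry Editor Version 5.00

; Disables the Schannel session cache, so no TLS session resumption is attempted.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL]
"MaximumCacheSize"=dword:00000000
```

Deleting the value again restores the default cache behavior.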
@wfurt I set MaximumCacheSize to 0 (dword) and rebooted, but the issue remains the same.
Do you see a full handshake then? In the first capture you posted, the server sends back its certificate in packet 8, but then in packet 552 it does not send a certificate and finishes the handshake with CipherChange/Resume.
I was able to get a repro on a different machine (different network and OS version). I will take a look.
Setting DisableClientExtendedMasterSecret to 1 fixes the issue for me. That is the path the article does NOT recommend. Fiddling with the cipher suites seems tedious. However if the remote site is running buggy SSL I don't know what else to do.
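For reference, the workaround above could be expressed as a .reg fragment. The key path is an assumption: the thread gives only the value name, and this sketch places it under the same Schannel key used for MaximumCacheSize. The DWORD type is likewise assumed. As noted, the referenced article does not recommend this route.

```reg
Windows Registry Editor Version 5.00

; Assumption: key path and DWORD type are not stated in the thread.
; Turns off the Extended Master Secret extension for TLS clients on this machine.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL]
"DisableClientExtendedMasterSecret"=dword:00000001
```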
However if the remote site is running buggy SSL I don't know what else to do.
Contact Government of Canada so they fix the issue? Good luck with that :)
I have not given up yet. We should verify that this is the issue. Unfortunately, the Windows implementation does not support CipherSuitesPolicy, so it is not possible to manipulate the cipher list from managed code.
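To illustrate the limitation, here is a minimal sketch of how a cipher list would be restricted from managed code on platforms that support it. The chosen suite is a hypothetical example, not one identified in this thread. On Windows, constructing CipherSuitesPolicy is expected to throw PlatformNotSupportedException, which the sketch catches.

```csharp
using System;
using System.Net.Http;
using System.Net.Security;

class CipherPolicySketch
{
    static void Main()
    {
        try
        {
            var handler = new SocketsHttpHandler
            {
                SslOptions = new SslClientAuthenticationOptions
                {
                    // Hypothetical suite, for illustration only.
                    CipherSuitesPolicy = new CipherSuitesPolicy(new[]
                    {
                        TlsCipherSuite.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
                    })
                }
            };

            using (var client = new HttpClient(handler))
            {
                Console.WriteLine("Cipher suite policy applied.");
            }
        }
        catch (PlatformNotSupportedException ex)
        {
            // Expected on Windows, where Schannel controls the cipher list.
            Console.WriteLine($"CipherSuitesPolicy unavailable: {ex.Message}");
        }
    }
}
```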
Contact Government of Canada so they fix the issue? Good luck with that :)
Maybe point out that their version of OpenSSL has some serious issues leading to an "F" grade?
Contact Government of Canada so they fix the issue? Good luck with that :)
Maybe point out that their version of OpenSSL has some serious issues leading to an "F" grade?
That might actually work.
I sent an email to the maintainer of that site.
However, if there are web sites out there which behave like this, and OpenSSL handles it gracefully, shouldn't .NET/Windows do so as well?
It should, no doubt, @RobSiklos. Did you check whether the steps from the linked article make it work for you? The fix may come from an OS update rather than from .NET itself.
Yes - setting DisableClientExtendedMasterSecret to 1 fixes the issue for me as well, however as you say, this is not a recommended approach. On the client side, I am up to date in terms of O/S (Windows 10). On the server side, I have no idea because I'm not affiliated with the government, but either way, asking them to update their O/S may not be practical (government == slow).
You can go down the recommended path and fiddle with the cipher suite list. I'll ping the Schannel team tomorrow to confirm and maybe provide more guidance.
Triage: Looks like an OS bug - waiting on confirmation.
FYI, the maintainer of the web site responded. They fixed some stuff, and now receive an A+ rating on SSL Labs. However, the issue still happens.
This issue has been automatically marked as no recent activity because it has been marked as needing more info but has not had any activity for 14 days. It will be closed if no further activity occurs within 7 more days.
@danmosemsft, another example.
Hello @RobSiklos, can you check this again? I put it on hold for a little while and now I cannot reproduce it anymore. Either the server changed or I got updates from Windows. (2004 build 20279.1)
@wfurt Just tried it again, and unfortunately I CAN still reproduce the problem.
@aik-jahoda will take another look. We should figure out if there is a fix in newer Windows. This still looks like an OS issue.
Works without an issue on .NET Core 3.1. I'm on Windows Insider build 21313. @RobSiklos what OS version do you use?
@aik-jahoda Unfortunately, I can no longer reproduce the problem. However, I did upgrade my OS to "Windows 10, version 20H2" (OS Build 19042.746) since I tested previously. So either it's fixed on the client OS side, or the maintainer of the web site did something which prevents the issue from happening in the first place.
I think this is a Windows bug, so getting the fix through Windows Update makes sense to me. As this is really not actionable in .NET, I'm inclined to close this issue, @RobSiklos.
@wfurt Agreed. Thanks for your help.
I'm using .NET to download data from a URL. For most URLs it works no problem, but for one specific URL, I am getting a very weird error when I try to make the connection. Furthermore, the error only happens on the 2nd (and subsequent) attempts to make the request. The first time always seems to work.
The problem is specific to Windows. On Linux it all seems to work fine.
Here is some sample code which demonstrates the problem:
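A minimal sketch of the scenario described, not the original sample: two sequential downloads with WebClient, each on a fresh client. The URL is an assumption. The thread only says that any path under https://health-infobase.canada.ca/src/ is affected, so the base path is used here.

```csharp
using System;
using System.Net;

class WebClientRepro
{
    static void Main()
    {
        // Assumption: the exact resource is not given in the thread.
        const string url = "https://health-infobase.canada.ca/src/";

        for (int attempt = 1; attempt <= 2; attempt++)
        {
            try
            {
                // A fresh WebClient per attempt, mirroring the report that
                // the first request works and later ones fail.
                using (var client = new WebClient())
                {
                    byte[] data = client.DownloadData(url);
                    Console.WriteLine($"Attempt {attempt}: {data.Length} bytes");
                }
            }
            catch (WebException ex)
            {
                // HTTP error statuses also surface here; ex.Status helps tell
                // a protocol error apart from a TLS/connection failure.
                Console.WriteLine($"Attempt {attempt} failed: {ex.Status}: {ex.Message}");
            }
        }
    }
}
```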
Notes:
- WebRequest and WebClient
- WebClient.DownloadData(), but it still occurs when using WebClient.OpenRead()
- ServicePointManager.ServerCertificateValidationCallback (always returns true) does not help.
The stack trace when I run in .NET Core looks like this:
On .NET Framework, the stack trace seems to be much less useful:
HttpClient
We also tried with HttpClient, and the results are similar. The first HttpClient works, but any subsequent ones fail:
Note: originally posted at https://stackoverflow.com/q/64283848/270348
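The HttpClient variant mentioned above can be sketched the same way. This is an illustrative reconstruction under the same URL assumption (only the /src/ base path is known from the thread), creating a new HttpClient for each attempt so that each request opens a new connection.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class HttpClientRepro
{
    static async Task Main()
    {
        // Assumption: the exact resource is not given in the thread.
        const string url = "https://health-infobase.canada.ca/src/";

        for (int attempt = 1; attempt <= 2; attempt++)
        {
            // A new client (and therefore a new TLS handshake) per attempt.
            using (var client = new HttpClient())
            {
                try
                {
                    using (HttpResponseMessage response = await client.GetAsync(url))
                    {
                        Console.WriteLine($"Attempt {attempt}: HTTP {(int)response.StatusCode}");
                    }
                }
                catch (HttpRequestException ex)
                {
                    // TLS failures are usually carried in the inner exception.
                    Console.WriteLine($"Attempt {attempt} failed: {ex.InnerException?.Message ?? ex.Message}");
                }
            }
        }
    }
}
```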