Closed stephentoub closed 4 years ago
Related to https://github.com/aspnet/AspNetCore/issues/7081.
dotnet/runtime#15113 is going to change this code a bit; looking at perf here probably doesn't make sense until after that's done.
Do we know if dotnet/runtime#15113 is really funded (it is marked as 3.0, is someone working on it?)?
Note that the difference in perf is a ~700X. Do we have any reason to believe the Linux implementation is that much worse? My guess is that there is some caching that happens on Windows that is not happening on Linux.
It would be good to at least understand the rough architectural blocks involved. My guess is that dotnet/runtime#15113 (which is about status checking an X509 certificate) is probably independent (or maybe actually makes caching harder). Ideally we confirm or deny my guess above, and if we determine that it is caching and roughly independent of dotnet/runtime#15113 (and relatively easy), we should go to some trouble to do that first if dotnet/runtime#15113 is in any danger of being cut.
> Do we know if dotnet/runtime#15113 is really funded (it is marked as 3.0, is someone working on it?)?
Essentially it's needed because Let's Encrypt is popular and only does OCSP; so there'll be a big kerfuffle with security signoff if it doesn't get done. So... yes, me :).
> Note that the difference in perf is a ~700X.
~75x. But on different machines; and I'm guessing the Linux one is a VM.
> Do we have any reason to believe the Linux implementation is that much worse?
For an average cert chain Windows has to do ~5 P/Invoke transitions; on Linux we do ~500, mostly because we have to piece the data back together into the shape that the API forces us to return.
> My guess is that dotnet/runtime#15113 (which is about status checking an X509 certificate) is probably independent (or maybe actually makes caching harder).
There's an inversion of control needed to bring the feature online. Taking the new needs into consideration along with the perf delta, an optimization for the successful chain case will likely bring the OpenSSL version down to ~5 P/Invokes.
So really the comment is "I'm already going to be doing a lot of work in that space, no one should touch it until I'm done".
> I'm guessing the Linux one is a VM
It is, in Azure. It's a "Standard D4s v3 (4 vcpus, 16 GB memory)".
Thanks @bartonjs, you put my mind at ease. My main concern was that we were making this work item dependent on a work item that did not clearly have an owner. You confirmed you are the owner, understand why the current implementation is slow, and have strong reasons to believe that the work you will do will also fix this. I am happy...
Thanks
Is this still on your radar for .NET Core 3? I wanted to track the progress for https://github.com/aspnet/AspNetCore/issues/7081, but can't find the right dashboard metric in Power BI.
Yes, seeing if it can be improved further (some ideas come to mind) is still in scope for 3.0.
Today's investigation says that if LM\Root, LM\CA, CU\Root, CU\CA, and CU\My were all built into permanently cached STACK_OF(X509*) values (SafeX509StackHandle), tight-loop chain builds move from ~30ms to ~0.95ms on my test machine (1001 iterations, discounting the first one from the sample set) when the chain is valid and no revocation checks are performed.
Ideally the cache invalidation logic won't add too much back to that.
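A minimal sketch of that kind of measurement (a tight loop of chain builds with revocation disabled, leaving the first cold build out of the sample set) could look like the following. The class and method names are hypothetical, and the self-signed certificate is only a stand-in so the example is self-contained: it will not chain to a trusted root, so it exercises the failure path rather than the valid-chain case measured above. Swap in a real certificate to reproduce that scenario.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

static class ChainBuildTiming
{
    // Builds a chain `iterations` times and returns the mean time per build in
    // milliseconds, leaving the first (cold) build out of the sample set.
    public static double MeanWarmBuildMs(X509Certificate2 cert, int iterations)
    {
        if (iterations < 2) throw new ArgumentOutOfRangeException(nameof(iterations));

        var sw = new Stopwatch();
        double totalMs = 0;
        for (int i = 0; i < iterations; i++)
        {
            using (var chain = new X509Chain())
            {
                // No revocation checks, as in the scenario described above.
                chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
                sw.Restart();
                chain.Build(cert); // result ignored: only the cost is measured
                sw.Stop();
            }
            if (i > 0) totalMs += sw.Elapsed.TotalMilliseconds;
        }
        return totalMs / (iterations - 1);
    }

    // Hypothetical stand-in certificate (untrusted, self-signed).
    public static X509Certificate2 CreateSelfSignedCert()
    {
        using (RSA key = RSA.Create(2048))
        {
            var request = new CertificateRequest(
                "CN=chain-timing.example", key,
                HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
            DateTimeOffset now = DateTimeOffset.UtcNow;
            return request.CreateSelfSigned(now.AddDays(-1), now.AddDays(30));
        }
    }

    static void Main()
    {
        using (X509Certificate2 cert = CreateSelfSignedCert())
        {
            Console.WriteLine($"{MeanWarmBuildMs(cert, 1001):F3} ms per build");
        }
    }
}
```

Timing each build separately, rather than the whole loop, is what makes it possible to drop the first sample.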
@sebastienros / @halter73 : FYI, a big perf improvement just went in for X509Chain.Build on Linux. Using one of @stephentoub's SslStream tests we saw about 4 minutes reduction on 10,000 handshakes, ~24ms per. Hopefully your TLS benchmarks agree when you get a build with this change.
If you can share the built assets I can give you a before/after comparison.
FYI current graph comparing Windows to Linux
@sebastienros From your graph, it looks like X509Chain.Build on Linux still has a big performance gap compared with Windows. My project has suffered from this issue; how can I verify the fix in my local environment? Should I try a .NET Core nightly build?
I ran @stephentoub's test above ("X509Chain build") with the .NET Core 3.0 nightly build 3.0.100-preview6-012026 and see a huge improvement.
My question is do you have a plan to backport to dotnet core 2.1, 2.2?
After the fix:
1000 iterations: 00:00:00.0974405
1000 iterations: 00:00:00.0484670
1000 iterations: 00:00:00.0442556
1000 iterations: 00:00:00.0457606
1000 iterations: 00:00:00.0490085
Before the fix:
1000 iterations: 00:00:03.4504450
1000 iterations: 00:00:03.3203837
1000 iterations: 00:00:03.3208841
1000 iterations: 00:00:03.3144381
1000 iterations: 00:00:03.3357802
> do you have a plan to backport to dotnet core 2.1, 2.2?
There is no plan to do so.
Thanks for such a huge improvement!
@ccic the nightly builds and docker images should have this change by now
@stephentoub @bartonjs When I use the .NET Core 3.0 preview on another Ubuntu VM (on Azure), I cannot reproduce the improvement. It takes about 3 seconds for every 1000 iterations. I found that it connects to "apps.digsigtrust.com" while @stephentoub's test "X509Chain build" is running.
But my local VM, which shows the huge improvement, never connects to "apps.digsigtrust.com".
Is there anything wrong with my environment?
Those two VMs are:
Ubuntu 16.04.4 LTS
dotnet core version: 3.0.100-preview6-012026
OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
@ccic If you're seeing activity on EVERY request, that suggests that either
a) The Azure VM doesn't trust the root, so ends up not caching data.
b) The disk is not writable (full, readonly, etc) so caching fails.
The only "intermittent" thing I can envision is OCSP/CRL expiry, if revocation is enabled on your tests. And the only thing I could think that would be different is that the Azure VM ends up hitting a different physical endpoint (due to routing rules) and receives a different response than your faster/successful machine.
@bartonjs I wondered why the caching fails, so I ran more tests and found that it may be related to the certificate: only the chain build for the Let's Encrypt certificate is very slow. My previous comparison was not valid, because I ran the Let's Encrypt certificate on the Azure VM but @stephentoub's embedded test certificate on the local VM.
I have 2 certificates to check: @stephentoub's embedded test certificate (embedded in the source code above), and the Let's Encrypt certificate.
When I run the chain build perf test on the Let's Encrypt certificate, it takes >3 seconds for 1000 iterations and connects to "apps.digsigtrust.com", but for @stephentoub's embedded test certificate, it takes ~0.05 seconds for 1000 iterations.
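One way to check whether the extra time comes from revocation-related network traffic is to time the same loop under X509RevocationMode.NoCheck and X509RevocationMode.Online. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and the certificate path is taken from the first command-line argument.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography.X509Certificates;

static class RevocationModeComparison
{
    // Total wall-clock time for `iterations` chain builds under one revocation mode.
    public static TimeSpan TimeBuilds(
        X509Certificate2 cert, X509RevocationMode mode, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            using (var chain = new X509Chain())
            {
                chain.ChainPolicy.RevocationMode = mode;
                chain.Build(cert); // validity is not the point here, only the cost
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main(string[] args)
    {
        // args[0]: hypothetical path to the certificate under test.
        using (var cert = new X509Certificate2(args[0]))
        {
            var modes = new[] { X509RevocationMode.NoCheck, X509RevocationMode.Online };
            foreach (X509RevocationMode mode in modes)
            {
                Console.WriteLine($"{mode}, 1000 iterations: {TimeBuilds(cert, mode, 1000)}");
            }
        }
    }
}
```

If the two modes take about the same time, the slowdown is probably not in the revocation check itself, and something else in the chain build (for example fetching a missing issuer certificate) is the more likely place to look.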
Experiments:
Apart from Chain build perf test, I added another function to check the certificate status information.
// Requires: using System; using System.Diagnostics;
//           using System.Security.Cryptography.X509Certificates;
static void FunctionTest1(X509Certificate2 originalCert)
{
    var sw = new Stopwatch();
    sw.Restart();
    using (var chain = new X509Chain())
    using (var cert = new X509Certificate2(originalCert))
    {
        // Print the default chain policy that Build will use.
        var policy = chain.ChainPolicy;
        Console.WriteLine(policy.RevocationFlag);
        Console.WriteLine(policy.RevocationMode);
        Console.WriteLine(policy.UrlRetrievalTimeout);
        Console.WriteLine(policy.VerificationFlags);
        Console.WriteLine(policy.VerificationTime);

        // Build the chain and report the overall result.
        if (!chain.Build(cert))
        {
            Console.WriteLine("Certificate is invalid");
        }
        else
        {
            Console.WriteLine("Certificate is valid");
        }

        // Print the status messages for the chain as a whole.
        foreach (var stat in chain.ChainStatus)
        {
            Console.WriteLine(stat.StatusInformation);
        }

        // Verify() performs its own basic chain validation.
        if (!cert.Verify())
        {
            Console.WriteLine("Basic certificate verify failed");
        }
        else
        {
            Console.WriteLine("Basic certificate verify successfully");
        }
        Console.WriteLine(cert.ToString());
    }
    sw.Stop();
    // Elapsed time covers both validations and the console output.
    Console.WriteLine(sw.Elapsed);
}
I ran all the certificate checks on a Windows 10 machine, an Ubuntu 16.04 local VM (openssl 1.1.1), and an Azure Ubuntu 18.04 VM (openssl 1.1.1). I found that the Let's Encrypt certificate shows different behavior compared with the other certificate.
@stephentoub's embedded test certificate and my service certificate are invalid on Windows 10, and the revocation function does not work properly:
ExcludeRoot
Online
00:00:00
NoFlag
5/30/2019 10:32:21 AM
Certificate is invalid
A certificate chain could not be built to a trusted root authority
The revocation function was unable to check revocation for the certificate
The revocation function was unable to check revocation because the revocation server was offline
Basic certificate verify failed
[Subject]
CN=testservereku.contoso.com
[Issuer]
CN=NDX Test Root CA
[Serial Number]
54000000320F036BF59292AA2F000000000032
[Not Before]
12/30/2016 4:43:49 AM
[Not After]
12/30/2036 4:53:49 AM
[Thumbprint]
3D68C9DB575E6FDB56A81AE43B014E129349E024
00:00:00.1004160
The Let's Encrypt certificate on Windows 10 is valid, and the revocation check runs:
ExcludeRoot
Online
00:00:00
NoFlag
5/30/2019 10:36:13 AM
Certificate is valid
Basic certificate verify successfully
[Subject]
CN=tlsperf.signalr.pro
[Issuer]
CN=Let's Encrypt Authority X3, O=Let's Encrypt, C=US
[Serial Number]
03D96E3BEEFD35BBF2DEF59B16FD49BA943C
[Not Before]
5/29/2019 10:53:52 AM
[Not After]
8/27/2019 10:53:52 AM
[Thumbprint]
84C91D04BA81AFC678E4AC41C38D8F93F53EEA53
00:00:00.0624163
On Ubuntu, all certificates are invalid and report "unable to get certificate CRL".
@stephentoub's embedded test certificate has the same output on Ubuntu 16.04 and 18.04:
ExcludeRoot
Online
00:00:00
NoFlag
5/30/2019 2:38:59 AM
Certificate is invalid
unable to get certificate CRL
unable to get local issuer certificate
Basic certificate verify failed
[Subject]
CN=testservereku.contoso.com
[Issuer]
CN=NDX Test Root CA
[Serial Number]
54000000320F036BF59292AA2F000000000032
[Not Before]
12/29/2016 8:43:49 PM
[Not After]
12/29/2036 8:53:49 PM
[Thumbprint]
3D68C9DB575E6FDB56A81AE43B014E129349E024
00:00:00.0997768
The Let's Encrypt certificate also has the same output on Ubuntu 16.04 and Ubuntu 18.04, on both the local VM and the Azure VM:
ExcludeRoot
Online
00:00:00
NoFlag
5/30/2019 2:37:50 AM
Certificate is invalid
unable to get certificate CRL
unable to get local issuer certificate
Basic certificate verify failed
[Subject]
CN=tlsperf.signalr.pro
[Issuer]
CN=Let's Encrypt Authority X3, O=Let's Encrypt, C=US
[Serial Number]
03D96E3BEEFD35BBF2DEF59B16FD49BA943C
[Not Before]
5/29/2019 2:53:52 AM
[Not After]
8/27/2019 2:53:52 AM
[Thumbprint]
84C91D04BA81AFC678E4AC41C38D8F93F53EEA53
00:00:00.9755910
@ccic If your system isn't considering Let's Encrypt as trusted that'll definitely throw things off.
I can't explain the "apps.digsigtrust.com", since that's not part of the Let's Encrypt infrastructure.
A better printing of the chain is
Console.WriteLine("Chain Element Status:");
foreach (X509ChainElement element in chain.ChainElements)
{
    Console.Write(" ");
    Console.WriteLine(element.Certificate.Subject);
    foreach (X509ChainStatus status in element.ChainElementStatus)
    {
        Console.WriteLine(" {0} ({1})", status.Status, status.StatusInformation);
    }
    Console.WriteLine();
}
Console.WriteLine("Chain Summary:");
foreach (X509ChainStatus status in chain.ChainStatus)
{
    Console.WriteLine(" {0} ({1})", status.Status, status.StatusInformation);
}
At least the part where it shows the elements. Somewhere you're not getting good chains; but that seems more to be system configuration than anything else.
@bartonjs I checked "apps.digsigtrust.com": its IP is 192.35.177.64, and it points to https://www.identrust.com/. I found that Let's Encrypt uses IdenTrust to cross-sign its certificate. See https://letsencrypt.org/certificates.
I checked the certificate chain1.pem (https://letsencrypt.org/certs/lets-encrypt-x3-cross-signed.pem.txt) with the openssl command:
openssl x509 -in chain1.pem -noout -text
It looks like the chain build always wants to connect to IdenTrust to verify the CA.
I'm investigating further why this happens.
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
0a:01:41:42:00:00:01:53:85:73:6a:0b:85:ec:a7:08
Signature Algorithm: sha256WithRSAEncryption
Issuer: O = Digital Signature Trust Co., CN = DST Root CA X3
Validity
Not Before: Mar 17 16:40:46 2016 GMT
Not After : Mar 17 16:40:46 2021 GMT
Subject: C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public-Key: (2048 bit)
Modulus:
00:9c:d3:0c:f0:5a:e5:2e:47:b7:72:5d:37:83:b3:
68:63:30:ea:d7:35:26:19:25:e1:bd:be:35:f1:70:
92:2f:b7:b8:4b:41:05:ab:a9:9e:35:08:58:ec:b1:
2a:c4:68:87:0b:a3:e3:75:e4:e6:f3:a7:62:71:ba:
79:81:60:1f:d7:91:9a:9f:f3:d0:78:67:71:c8:69:
0e:95:91:cf:fe:e6:99:e9:60:3c:48:cc:7e:ca:4d:
77:12:24:9d:47:1b:5a:eb:b9:ec:1e:37:00:1c:9c:
ac:7b:a7:05:ea:ce:4a:eb:bd:41:e5:36:98:b9:cb:
fd:6d:3c:96:68:df:23:2a:42:90:0c:86:74:67:c8:
7f:a5:9a:b8:52:61:14:13:3f:65:e9:82:87:cb:db:
fa:0e:56:f6:86:89:f3:85:3f:97:86:af:b0:dc:1a:
ef:6b:0d:95:16:7d:c4:2b:a0:65:b2:99:04:36:75:
80:6b:ac:4a:f3:1b:90:49:78:2f:a2:96:4f:2a:20:
25:29:04:c6:74:c0:d0:31:cd:8f:31:38:95:16:ba:
a8:33:b8:43:f1:b1:1f:c3:30:7f:a2:79:31:13:3d:
2d:36:f8:e3:fc:f2:33:6a:b9:39:31:c5:af:c4:8d:
0d:1d:64:16:33:aa:fa:84:29:b6:d4:0b:c0:d8:7d:
c3:93
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Basic Constraints: critical
CA:TRUE, pathlen:0
X509v3 Key Usage: critical
Digital Signature, Certificate Sign, CRL Sign
Authority Information Access:
OCSP - URI:http://isrg.trustid.ocsp.identrust.com
CA Issuers - URI:http://apps.identrust.com/roots/dstrootcax3.p7c
X509v3 Authority Key Identifier:
keyid:C4:A7:B1:A4:7B:2C:71:FA:DB:E1:4B:90:75:FF:C4:15:60:85:89:10
X509v3 Certificate Policies:
Policy: 2.23.140.1.2.1
Policy: 1.3.6.1.4.1.44947.1.1.1
CPS: http://cps.root-x1.letsencrypt.org
X509v3 CRL Distribution Points:
Full Name:
URI:http://crl.identrust.com/DSTROOTCAX3CRL.crl
X509v3 Subject Key Identifier:
A8:4A:6A:63:04:7D:DD:BA:E6:D1:39:B7:A6:45:65:EF:F3:A8:EC:A1
Signature Algorithm: sha256WithRSAEncryption
dd:33:d7:11:f3:63:58:38:dd:18:15:fb:09:55:be:76:56:b9:
70:48:a5:69:47:27:7b:c2:24:08:92:f1:5a:1f:4a:12:29:37:
24:74:51:1c:62:68:b8:cd:95:70:67:e5:f7:a4:bc:4e:28:51:
cd:9b:e8:ae:87:9d:ea:d8:ba:5a:a1:01:9a:dc:f0:dd:6a:1d:
6a:d8:3e:57:23:9e:a6:1e:04:62:9a:ff:d7:05:ca:b7:1f:3f:
c0:0a:48:bc:94:b0:b6:65:62:e0:c1:54:e5:a3:2a:ad:20:c4:
e9:e6:bb:dc:c8:f6:b5:c3:32:a3:98:cc:77:a8:e6:79:65:07:
2b:cb:28:fe:3a:16:52:81:ce:52:0c:2e:5f:83:e8:d5:06:33:
fb:77:6c:ce:40:ea:32:9e:1f:92:5c:41:c1:74:6c:5b:5d:0a:
5f:33:cc:4d:9f:ac:38:f0:2f:7b:2c:62:9d:d9:a3:91:6f:25:
1b:2f:90:b1:19:46:3d:f6:7e:1b:a6:7a:87:b9:a3:7a:6d:18:
fa:25:a5:91:87:15:e0:f2:16:2f:58:b0:06:2f:2c:68:26:c6:
4b:98:cd:da:9f:0c:f9:7f:90:ed:43:4a:12:44:4e:6f:73:7a:
28:ea:a4:aa:6e:7b:4c:7d:87:dd:e0:c9:02:44:a7:87:af:c3:
34:5b:b4:42
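The Authority Information Access and CRL Distribution Points extensions in the dump above are the fields that name hosts a chain build may contact. As a rough sketch, the same two extensions can be listed from .NET by filtering on their OIDs (1.3.6.1.5.5.7.1.1 and 2.5.29.31). The class name and the default file name are assumptions for illustration, and the text produced by Format(true) varies by platform and may fall back to hex.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography.X509Certificates;

static class NetworkExtensionDump
{
    const string AuthorityInfoAccessOid = "1.3.6.1.5.5.7.1.1";
    const string CrlDistributionPointsOid = "2.5.29.31";

    // Returns the formatted AIA and CRL Distribution Points extensions, if present.
    public static List<string> Describe(X509Certificate2 cert)
    {
        var lines = new List<string>();
        foreach (X509Extension ext in cert.Extensions)
        {
            string oid = ext.Oid?.Value;
            if (oid == AuthorityInfoAccessOid || oid == CrlDistributionPointsOid)
            {
                lines.Add($"{ext.Oid.FriendlyName ?? oid}:");
                lines.Add(ext.Format(true)); // platform-dependent text
            }
        }
        return lines;
    }

    static void Main(string[] args)
    {
        // Defaults to the chain1.pem file name used with the openssl command above.
        string path = args.Length > 0 ? args[0] : "chain1.pem";
        using (var cert = new X509Certificate2(path))
        {
            Console.WriteLine(cert.Subject);
            foreach (string line in Describe(cert))
            {
                Console.WriteLine(line);
            }
        }
    }
}
```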
The boxes running this are not directly comparable, however the magnitude of the difference is such that it doesn't really matter.
On my Windows 10 machine, I get numbers like this:
On my Ubuntu 18.04 machine, I get numbers like this:
Repro:
cc: @bartonjs