dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.97k stars 4.66k forks source link

High memory consumption using X509Certificate2 GetNameInfo((X509NameType.UpnName, false) #99933

Closed jvinokurovs closed 5 months ago

jvinokurovs commented 6 months ago

Description

I'm having an issue with high RES memory usage for dotnet process doing GetNameInfo((X509NameType.UpnName, false). Reproducing issue just by running same command on the same certificate in cycle I get:

  1. The counters for the process showing high working set for the process (1m iteration and just growing increasing number of iterations):
    Name                                                               Current Value
    [System.Runtime]
    % Time in GC since last GC (%)                                         0
    Allocation Rate (B / 1 sec)                                        8,168
    CPU Usage (%)                                                          0.026
    Exception Count (Count / 1 sec)                                        0
    GC Committed Bytes (MB)                                               21.185
    GC Fragmentation (%)                                                   0.886
    GC Heap Size (MB)                                                      7.898
    Gen 0 GC Count (Count / 1 sec)                                         0
    Gen 0 Size (B)                                                       528
    Gen 1 GC Count (Count / 1 sec)                                         0
    Gen 1 Size (B)                                                   122,008
    Gen 2 GC Count (Count / 1 sec)                                         0
    Gen 2 Size (B)                                                         0
    IL Bytes Jitted (B)                                               41,344
    LOH Size (B)                                                           0
    Monitor Lock Contention Count (Count / 1 sec)                          0
    Number of Active Timers                                                0
    Number of Assemblies Loaded                                           13
    Number of Methods Jitted                                             328
    POH (Pinned Object Heap) Size (B)                                 39,952
    ThreadPool Completed Work Item Count (Count / 1 sec)                   0
    ThreadPool Queue Length                                                0
    ThreadPool Thread Count                                                0
    Time spent in JIT (ms / 1 sec)                                         0
    Working Set (MB)                                                     255.107

    valgrind --leak-check=full ./test (with 1m iterations ):

==21032== 15,337 bytes in 313 blocks are possibly lost in loss record 2,582 of 2,604 ==21032== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==21032== by 0x21239CE9: ASN1_STRING_set (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21241C6F: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x2124271C: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242C05: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242E15: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242517: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242B95: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242EA8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x2124214B: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242AB2: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242EA8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== ==21032== 120,969,113 (15,998,928 direct, 104,970,185 indirect) bytes in 999,933 blocks are definitely lost in loss record 2,604 of 2,604 ==21032== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==21032== by 0x21322C0D: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21244BC8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x212423FA: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242AB2: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242EA8: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242265: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x21242FB1: ASN1_item_ex_d2i (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x2124303E: ASN1_item_d2i (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x213B3614: X509V3_EXT_d2i (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1) ==21032== by 0x210EF556: CryptoNative_GetX509NameInfo (in /usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.17/libSystem.Security.Cryptography.Native.OpenSsl.so) ==21032== by 0x5F1E0A47: ??? (in /usr/share/dotnet/shared/Microsoft.NETCore.App/7.0.17/System.Security.Cryptography.dll) ==21032== ==21032== LEAK SUMMARY: ==21032== definitely lost: 16,008,816 bytes in 999,969 blocks ==21032== indirectly lost: 104,970,185 bytes in 3,999,052 blocks ==21032== possibly lost: 32,849 bytes in 805 blocks ==21032== still reachable: 2,012,263 bytes in 7,944 blocks ==21032== suppressed: 0 bytes in 0 blocks

Memory is not release after disposing certificate, only on process exit.

Reproduction Steps

for (long i =0; i < 1000000; i++) {
var result = cert.GetNameInfo(X509NameType.UpnName, false); }

Expected behavior

Constant memory usage

Actual behavior

Growing RES memory for the dotnet process.

Regression?

No response

Known Workarounds

No response

Configuration

.NET SDKs installed: 7.0.407 [/usr/share/dotnet/sdk] (checked with 8.0.202 - same result) ubuntu Ubuntu 20.04.6 LTS (checked on official .NET image - same result) openssl ii openssl 1.1.1f-1ubuntu2.22 amd64

Also checked on Windows - no issue observed.

Other information

No response

vcsjones commented 6 months ago

/cc @bartonjs @dotnet/area-system-security

janvorli commented 5 months ago

@jvinokurovs I wonder - if you force GC say every 1000 iterations by calling GC.Collect(), does the memory keep growing? It is possible that the memory consumption doesn't reach a level when the GC would feel a memory pressure and kick in.

jvinokurovs commented 5 months ago

It is growing all the way until container is killed with out of memory error (in last test somewhere around 8 Gb with all k8s limits set properly). I haven't tried calling GC.Collect directly, but managed memory seem to be not used really.

@jvinokurovs I wonder - if you force GC say every 1000 iterations by calling GC.Collect(), does the memory keep growing? It is possible that the memory consumption doesn't reach a level when the GC would feel a memory pressure and kick in.

vcsjones commented 5 months ago

@jvinokurovs Do you have an example certificate that reproduces this behavior?

janvorli commented 5 months ago

I haven't tried calling GC.Collect directly, but managed memory seem to be not used really.

My point was that there can be tiny GC heap objects that hold a lot of native memory. So even if there are many of those, the GC heap size would not change in a noticeable way.

vcsjones commented 5 months ago

Something does seem amiss here. I can reproduce this with .NET 9 and OpenSSL 3.0 using the following code

using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

SubjectAlternativeNameBuilder san = new();
san.AddUserPrincipalName("vcsjones@github.com");
san.AddDnsName("github.com");

using ECDsa ec = ECDsa.Create(ECCurve.NamedCurves.nistP256);
CertificateRequest req = new("CN=Kevin Jones", ec, HashAlgorithmName.SHA256);
req.CertificateExtensions.Add(san.Build());
using X509Certificate2 cert = req.CreateSelfSigned(DateTime.Now, DateTime.Now.AddYears(1));

for (int i = 1; ; i = (i + 1) % 10000)
{
    cert.GetNameInfo(X509NameType.UpnName, false);

    if (i == 0)
    {
        GC.Collect();
    }
}

That eventually used 100% of my system memory (32 GB), started swapping, and needed to be killed.

I don't think this is unique to X509NameType.UpnName. I was able to reproduce this with X509NameType.DnsName as well.

It does not reproduce with X509NameType.SimpleName. That stays at a low 100-ish MB of memory. So something seems to be off here:

https://github.com/dotnet/runtime/blob/82a0456a110d5db1e4e306df32bf2a8ce856d7d1/src/native/libs/System.Security.Cryptography.Native/openssl.c#L607

janvorli commented 5 months ago

Just a quick thought, maybe the issue is here: https://github.com/dotnet/runtime/blob/54c5b47acc0003fa82efece80fb784b6da1594f9/src/native/libs/System.Security.Cryptography.Native/openssl.c#L693

Shouldn't it use sk_GENERAL_NAMES_free instead?

vcsjones commented 5 months ago

I think we should be doing what we are doing here

https://github.com/dotnet/runtime/blob/82a0456a110d5db1e4e306df32bf2a8ce856d7d1/src/native/libs/System.Security.Cryptography.Native/apibridge.c#L703

  1. Use GENERAL_NAMES* instead of STACK_OF(GENERAL_NAME)
  2. Use GENERAL_NAMES_free, not sk_GENERAL_NAME_free. The former says:

    TYPE_free() releases the object and all pointers and sub-objects within it.

    Whereas the latter says

    sk_TYPE_free() frees up the sk structure. It does not free up any elements of sk. After this call sk is no longer valid

It's a little annoying because X509_get_ext_d2i does not clearly document the ownership of the returned items. However, OpenSSL does GENERAL_NAMES_free itself e.g. https://github.com/openssl/openssl/blob/4a3e8f08306c64366318e26162ae0a0eb7b1a006/crypto/x509/x509_vfy.c#L686

janvorli commented 5 months ago

cc: @karelz

vcsjones commented 5 months ago

Re-opening for back ports.

jvinokurovs commented 5 months ago

Sorry for me not answering timely on requests for more information in the thread, but looks like it is all being figured out, thank you

bartonjs commented 5 months ago

The backports have been merged, no further action needed.