dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Large number of memfd:doublemapper (deleted) entries #89776

Open ayende opened 1 year ago

ayende commented 1 year ago

Description

When looking into our process, we noticed a large number of entries like this:

7fb3b0895000-7fb3b0896000 rw-s 03ac6000 00:01 2062                       /memfd:doublemapper (deleted)
7fb3b0896000-7fb3b08a0000 ---s 03ac7000 00:01 2062                       /memfd:doublemapper (deleted)
7fb3b08a0000-7fb3b08ab000 rw-s 03ad1000 00:01 2062                       /memfd:doublemapper (deleted)
7fb3b08ab000-7fb3b08b0000 ---s 03adc000 00:01 2062                       /memfd:doublemapper (deleted)
7fb3b08b0000-7fb3b08b9000 rw-s 03ae1000 00:01 2062                       /memfd:doublemapper (deleted)
7fb3b08b9000-7fb3b08c0000 ---s 03aea000 00:01 2062                       /memfd:doublemapper (deleted)
7fb427006000-7fb427007000 rw-s 00000000 00:01 2062                       /memfd:doublemapper (deleted)

The process has been running for about 8 hours, and we have:

sudo cat /proc/10459/maps | grep doublemapper | wc -l
3308

That number is not stable and grows over time, but we are just now loading data into the system, not yet trying to stress it.
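For anyone who wants to track this count from a script rather than with grep, here is a minimal sketch. It assumes the standard `/proc/<pid>/maps` line layout (address range, permissions, offset, device, inode, optional path). The function name `count_doublemapper` is made up for illustration, and the last line of the sample is an anonymous mapping added only to show that it is skipped.

```python
from collections import Counter


def count_doublemapper(maps_text, needle="memfd:doublemapper"):
    """Count maps lines that mention `needle`, grouped by permission string."""
    counts = Counter()
    for line in maps_text.splitlines():
        if needle not in line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1  # second column is the permission string
    return counts


# Sample lines in the same format as the entries quoted above.
SAMPLE = """\
7fb3b0895000-7fb3b0896000 rw-s 03ac6000 00:01 2062                       /memfd:doublemapper (deleted)
7fb3b0896000-7fb3b08a0000 ---s 03ac7000 00:01 2062                       /memfd:doublemapper (deleted)
7fb3b08a0000-7fb3b08ab000 rw-s 03ad1000 00:01 2062                       /memfd:doublemapper (deleted)
7f9a8bb2e000-7f9b08000000 ---p 00000000 00:00 0
"""

by_perms = count_doublemapper(SAMPLE)
print(sum(by_perms.values()), dict(by_perms))
```

Against a live process, the same function can be fed `open(f"/proc/{pid}/maps").read()`. Splitting the count by permissions shows whether the growth is in the `rw-s` views or in the `---s` reserved ranges.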

Looking at the code, I found the source of that here: https://github.com/dotnet/runtime/blob/bd84336e095a882d2794dd08b64814918a010004/src/coreclr/minipal/Unix/doublemapping.cpp#L63

But looking at where this is freed, I see:

https://github.com/dotnet/runtime/blob/bd84336e095a882d2794dd08b64814918a010004/src/coreclr/minipal/Unix/doublemapping.cpp#L99

It looks like this will only actually be freed on macOS, and not on Linux?

FWIW, I couldn't find where this is called.

Is this number of entries expected? Should we monitor this value? I understand that this is related to the way the JIT allocates memory?

Related: we are seeing a large increase in memory usage in some production systems, which is not seen in .NET 6.0 but is very noticeable in .NET 7.0.

I noticed this: https://github.com/dotnet/runtime/issues/80580

And we are investigating whether we do a lot of dynamic assembly generation (so far we don't think so, but can't rule it out).

Reproduction Steps

When I started writing this post, I had:

 sudo cat /proc/10459/maps | grep doublemapper | wc -l
3308

By the time I got here, I had:

sudo cat /proc/10459/maps | grep doublemapper | wc -l
3312

So I certainly think that there is something at work here. Note that at this point, the process in question had been running for hours, basically in a big loop. So there should be no change in behavior, nor would I expect it to run any JIT tiering or some such.

Expected behavior

The runtime should not allocate memory indefinitely.

Actual behavior

We are seeing additional memory mappings over time.

Regression?

Yes, we aren't seeing that in .NET 6.0

Known Workarounds

No response

Configuration

No response

Other information

No response

hoyosjs commented 1 year ago

@ayende that's an ifndef, so the doublemapper only deallocates on Linux. It's called from the destructor of ExecutableAllocator, which takes memory that's normally RX and creates an RW mapping as needed for W^X purposes. Uncached lambdas and reflection can also cause such behavior. cc: @janvorli

ayende commented 1 year ago

I'm sorry, I didn't realize that this was an ifndef; I read it as ifdef. What do you mean by "uncached lambda"?

Is there an expectation that this will grow without limit? What sort of reflection would cause this?

hoyosjs commented 1 year ago

Actually, even in the lambda case of capturing context, I expect allocations of a managed object, but not jitting of new objects. Essentially, I expect tiering, loading, and some debugger operations to cause RX -> RW paging. You can use dotnet-counters to see if the jitted method count increases. I am not sure what's causing the growth in this case. @janvorli, does this count against the real memory usage? Any ideas what might be contributing to this? I thought since the mapping is deleted it becomes free for the process. I do expect it to count against the max_map_count though.

janvorli commented 1 year ago

The shared memory that is visible as /memfd:doublemapper is used for allocating all executable code for the JIT, and also for runtime-generated helpers and data that need to be allocated close to the code that references them. This is the basis of the W^X feature, which ensures that no memory in the process is writeable and executable at the same time. We temporarily double map executable code blocks as writeable memory to write or modify the code.
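To make the double-mapping idea concrete, here is a minimal sketch in Python rather than the runtime's C++. It is an illustration of the technique, not the CoreCLR implementation: the names `open_backing_fd` and `double_map_demo` are invented, the memfd label is made up, and a read-only view stands in for the RX view so the example does not depend on executable mappings being permitted.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def open_backing_fd(size):
    """Return an fd for an anonymous shared file of `size` bytes."""
    if hasattr(os, "memfd_create"):
        fd = os.memfd_create("doublemapper-demo")  # hypothetical label
    else:
        # Fallback where memfd_create is unavailable: an unlinked temp file.
        fd, path = tempfile.mkstemp()
        os.unlink(path)
    os.ftruncate(fd, size)
    return fd


def double_map_demo(payload):
    """Write `payload` through a temporary RW view, read it via a second view."""
    fd = open_backing_fd(PAGE)
    try:
        # Long-lived view; read-only here, RX in the real runtime.
        code_view = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED,
                              prot=mmap.PROT_READ)
        # Short-lived writable view of the same physical pages.
        rw_view = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED,
                            prot=mmap.PROT_READ | mmap.PROT_WRITE)
        rw_view[:len(payload)] = payload
        rw_view.close()  # drop the writable view once the write is done
        seen = bytes(code_view[:len(payload)])
        code_view.close()
        return seen
    finally:
        os.close(fd)


result = double_map_demo(b"\x90\xc3")
print(result)
```

Each `mmap` call adds one entry to `/proc/<pid>/maps`, which is why a process that uses this scheme shows many mappings backed by the same memfd.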

ayende commented 1 year ago

Hi, for reference, we ran the workload with DOTNET_EnableWriteXorExecute=0 and we are seeing 10561 entries in /proc/PID/maps.

It is going up & down a bit but appears to be mostly stable.

Without this flag, we are seeing a lot more mappings, and they are always increasing.

My expectation was that with W^X, we'd stop needing those once the system stabilized, but we saw an overall increase over time even after hours of running.

marcovr commented 1 year ago

We are currently facing a similar issue, where we found runtime crashes like this:

Fatal error. The RW block to unmap was not found
Repeat 2 times:
--------------------------------
   at System.Runtime.CompilerServices.RuntimeHelpers.CompileMethod(System.RuntimeMethodHandleInternal)
--------------------------------
   at System.Reflection.Emit.DynamicMethod.CreateDelegate(System.Type, System.Object)
   at System.Linq.Expressions.Compiler.LambdaCompiler.Compile(System.Linq.Expressions.LambdaExpression)
   ...

This seems to happen because we run into the default maximum number of memory maps:

$ sysctl vm.max_map_count
vm.max_map_count = 65530

I wanted to find out why this is happening, and by examining /proc/1/maps I can see that ~90% of all memory maps are coming from doublemapper:

$  cat /proc/1/maps | grep doublemapper | wc -l
59588
$  cat /proc/1/maps | wc -l
63878
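
The numbers above can be turned into a quick headroom check. This is only a sketch of the arithmetic, using the figures quoted in this comment; the function name `map_budget` is made up for illustration.

```python
def map_budget(doublemapper, total, limit):
    """Summarize how much of the per-process mapping limit is in use."""
    return {
        "doublemapper_share": doublemapper / total,  # fraction of maps that are doublemapper
        "limit_used": total / limit,                 # fraction of vm.max_map_count consumed
        "remaining": limit - total,                  # mappings left before the limit
    }


# Figures from the commands above: 59588 doublemapper entries,
# 63878 total maps, vm.max_map_count = 65530.
report = map_budget(59588, 63878, 65530)
print(f"{report['doublemapper_share']:.1%} of maps are doublemapper, "
      f"{report['limit_used']:.1%} of the limit used, "
      f"{report['remaining']} mappings left")
```

On a live system, the limit can be read from `/proc/sys/vm/max_map_count` and the totals from `/proc/<pid>/maps`.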

ayende commented 1 year ago

We also hit the limit on the # of maps recently in production.

I'm not sure how to point a finger, but it absolutely feels like there is a leak here.

ayende commented 1 year ago

We looked in more detail at /proc/self/maps and we see:

/memfd:doublemapper 7,462 times
Unknown 34,901 times

Those unknown entries look roughly like: 7f9a8bb2e000-7f9b08000000 ---p 00000000 00:00 0
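
A breakdown like the one above can be produced with a short script. The following is a sketch that assumes the usual six-column `/proc/<pid>/maps` layout, where the path column is optional; `group_by_path` is a made-up name, and entries without a path are bucketed as "Unknown" to match the wording here.

```python
from collections import Counter


def group_by_path(maps_text):
    """Count maps entries per backing path; pathless entries become 'Unknown'."""
    groups = Counter()
    for line in maps_text.splitlines():
        # Split into at most 6 fields so paths containing spaces stay intact.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a maps line
        path = fields[5].strip() if len(fields) == 6 else "Unknown"
        groups[path] += 1
    return groups


# Sample lines taken from entries quoted in this thread.
SAMPLE = """\
7fb3b0895000-7fb3b0896000 rw-s 03ac6000 00:01 2062                       /memfd:doublemapper (deleted)
7fb3b0896000-7fb3b08a0000 ---s 03ac7000 00:01 2062                       /memfd:doublemapper (deleted)
7f9a8bb2e000-7f9b08000000 ---p 00000000 00:00 0
"""

groups = group_by_path(SAMPLE)
for path, count in groups.most_common():
    print(f"{path} {count} times")
```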

We run roughly the same process on Windows as well, and looked at the VM Map results.

We have 268GB (!) of Thread Execution Block?
For that matter, you can see that there is a TEB here that is 90MB in size, which seems... really high.

![image](https://github.com/dotnet/runtime/assets/116915/467c944c-3310-476c-8e97-16329632e1d3)

Here is the vmmap data:

[Raven-VMMap.zip](https://github.com/dotnet/runtime/files/12301035/Raven-VMMap.zip)

Any ideas what we are looking at here?

janvorli commented 1 year ago

@ayende I wonder if it would be possible to run your app with and without W^X enabled for about the same time and then share the /proc/{PID}/smaps (smaps have more details than maps) for each of the cases. I'd like to take a look at the mappings to see how they differ between those two cases, as you've mentioned that the number of mappings looked stable with W^X disabled.

ayende commented 1 year ago

AWS: c5.xlarge

$  lsb_release -d
Description:    Ubuntu 22.04.2 LTS

$ uname -a
Linux ip-172-31-16-178 5.15.0-1031-aws #35-Ubuntu SMP Fri Feb 10 02:07:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ wget https://daily-builds.s3.amazonaws.com/RavenDB-5.4.109-linux-x64.tar.bz2

$ sudo apt install bzip2

$ tar -xf RavenDB-5.4.109-linux-x64.tar.bz2

$ ./RavenDB/run.sh

$ cat <<EOF > RavenDB/Server/settings.json
{
    "ServerUrl": "http://127.0.0.1:8080",
    "Setup.Mode": "None",
    "DataDir": "RavenData",
    "License.Eula.Accepted": true
}
EOF

$ ./RavenDB/run.sh

In the browser:

$ curl "http://localhost:8080/databases/test/admin/smuggler/import?url=https://twitter-2020-rvn-dump.s3.us-west-1.amazonaws.com/2023-03-29-07-46-59.ravendb-full-backup"

** Note: That is a very large file.

What happens under the covers is that we import a lot of data into RavenDB. There should be no assembly generation in this process, and pretty much all the code that is involved is basically the same big loop.

$ cat /proc/$(pidof Raven.Server)/smaps | grep memfd | wc -l

We see very rapid growth to ~2,600 memfd items, then slow growth over time. The rapid phase takes a few minutes or so, and after that a few more entries keep being added over time.

[09:07:10] ubuntu@ip-172-31-16-178:~$ cat /proc/$(pidof Raven.Server)/maps | grep memfd | wc -l
2652
[09:07:11] ubuntu@ip-172-31-16-178:~$ cat /proc/$(pidof Raven.Server)/maps | grep memfd | wc -l
2654
[09:07:15] ubuntu@ip-172-31-16-178:~$ cat /proc/$(pidof Raven.Server)/maps | grep memfd | wc -l
2654
[09:07:28] ubuntu@ip-172-31-16-178:~$ cat /proc/$(pidof Raven.Server)/maps | grep memfd | wc -l
2659
[09:07:40] ubuntu@ip-172-31-16-178:~$ cat /proc/$(pidof Raven.Server)/maps | grep memfd | wc -l
2669
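To put a number on the slow growth, the first and last samples above can be turned into a rate. This is a rough sketch: `growth_per_minute` is a made-up helper, it assumes all samples fall on the same day, and a 30-second window is far too short to extrapolate from with any confidence.

```python
from datetime import datetime


def growth_per_minute(samples):
    """Average growth between the first and last (HH:MM:SS, count) samples."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
    if elapsed <= 0:
        raise ValueError("need at least two samples taken at different times")
    return (c1 - c0) * 60 / elapsed


# Timestamps and counts copied from the shell prompts above.
SAMPLES = [
    ("09:07:10", 2652),
    ("09:07:11", 2654),
    ("09:07:15", 2654),
    ("09:07:28", 2659),
    ("09:07:40", 2669),
]

rate = growth_per_minute(SAMPLES)
print(f"{rate:.1f} new memfd mappings per minute")
```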

Here is the count of all maps:

$ cat /proc/$(pidof Raven.Server)/maps | wc -l
3958

Output from smaps:

7fed43c26000-7fed43c27000 r-xs 00167000 00:01 3072                       /memfd:doublemapper (deleted)
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
ProtectionKey:         0
VmFlags: rd ex sh mr mw me ms sd
7fed43c27000-7fed43c28000 rw-s 00168000 00:01 3072                       /memfd:doublemapper (deleted)
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
ProtectionKey:         0
VmFlags: rd wr sh mr mw me ms sd
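Since the earlier question was whether these mappings count against real memory usage, one way to check is to sum the Rss of every doublemapper entry in smaps. The sketch below assumes the smaps layout shown above (a header line per mapping followed by `Field: value kB` lines). `sum_field` is a made-up name, the sample is abbreviated, and its third mapping is hypothetical, added only to show that non-matching entries are skipped.

```python
import re

# A mapping header starts with "<start>-<end> " in lowercase hex.
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def sum_field(smaps_text, needle="memfd:doublemapper", field="Rss"):
    """Sum `field` (in kB) over all mappings whose header contains `needle`."""
    total_kb = 0
    in_match = False
    for line in smaps_text.splitlines():
        if HEADER.match(line):
            in_match = needle in line  # new mapping: does it match?
        elif in_match and line.startswith(field + ":"):
            total_kb += int(line.split()[1])
    return total_kb


SAMPLE = """\
7fed43c26000-7fed43c27000 r-xs 00167000 00:01 3072                       /memfd:doublemapper (deleted)
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
VmFlags: rd ex sh mr mw me ms sd
7fed43c27000-7fed43c28000 rw-s 00168000 00:01 3072                       /memfd:doublemapper (deleted)
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
VmFlags: rd wr sh mr mw me ms sd
7fed43c28000-7fed43c30000 rw-p 00000000 00:00 0
Size:                 32 kB
Rss:                  16 kB
Pss:                  16 kB
VmFlags: rd wr mr mw me ac sd
"""

rss_kb = sum_field(SAMPLE)
print(f"doublemapper Rss: {rss_kb} kB")
```

Because the RW and RX views share the same pages, summing Rss across both views can count a page twice; passing `field="Pss"` may give a fairer picture.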

I tried running: $ sudo strace -kfp $(pidof Raven.Server) -e trace=memfd_create

Gave this output:

ubuntu@ip-172-31-16-178:~$ sudo strace -kfp $(pidof Raven.Server) -e trace=memfd_create
strace: Process 2495 attached with 39 threads
strace: Process 2856 attached
[pid  2527] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=2495, si_uid=1000} ---
 > /memfd:doublemapper (deleted)() [0x2861ac6]
 > /memfd:doublemapper (deleted)() [0x27e4535]
[pid  2527] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=2495, si_uid=1000} ---
 > /memfd:doublemapper (deleted)() [0x2861abb]
 > /memfd:doublemapper (deleted)() [0x27e4535]
[pid  2527] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=2495, si_uid=1000} ---
 > /memfd:doublemapper (deleted)() [0x2861d6b]
 > /memfd:doublemapper (deleted)() [0x27e4535]
[pid  2527] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=2495, si_uid=1000} ---
 > /memfd:doublemapper (deleted)() [0x2861abb]
 > /memfd:doublemapper (deleted)() [0x27e4535]
[pid  2856] +++ exited with 0 +++
[pid  2640] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=2495, si_uid=1000} ---
 > /memfd:doublemapper (deleted)() [0x27ad3c5]
 > /memfd:doublemapper (deleted)() [0x27e66df]
 > /memfd:doublemapper (deleted)() [0x27ee81a]
 > /memfd:doublemapper (deleted)() [0x27f3b57]
 > /memfd:doublemapper (deleted)() [0x27f3aa4]
 > /memfd:doublemapper (deleted)() [0x27ee24e]
 > /memfd:doublemapper (deleted)() [0x27f3947]
 > /memfd:doublemapper (deleted)() [0x27f389e]
 > /memfd:doublemapper (deleted)() [0x27ecbc4]
 > /memfd:doublemapper (deleted)() [0x27f2849]
 > /memfd:doublemapper (deleted)() [0x27f2691]
 > /memfd:doublemapper (deleted)() [0x2833af0]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x28331a6]
 > /memfd:doublemapper (deleted)() [0x27edb9c]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2832f5d]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27ee36f]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2832d2e]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27e8d06]
 > /memfd:doublemapper (deleted)() [0x2741f2d]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2832b0e]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27fe57b]
 > /memfd:doublemapper (deleted)() [0x27fe4a6]
 > /memfd:doublemapper (deleted)() [0x28018a8]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x28328fe]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27fb6a5]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x28326de]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27fe57b]
 > /memfd:doublemapper (deleted)() [0x27fe4a6]
 > /memfd:doublemapper (deleted)() [0x281863f]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x283249e]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x2817aed]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x283226e]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27fe57b]
 > /memfd:doublemapper (deleted)() [0x27fe4a6]
 > /memfd:doublemapper (deleted)() [0x2817487]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2831d1e]
 > /memfd:doublemapper (deleted)() [0x28135c2]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x27fbd3c]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2814b19]
 > /memfd:doublemapper (deleted)() [0x2754c86]
 > /memfd:doublemapper (deleted)() [0x2727c5c]
 > /home/ubuntu/RavenDB/Server/System.Private.CoreLib.dll() [0x228532]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x2bf057) [0x4c4a57]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xee07e) [0x2f3a7e]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x106182) [0x30bb82]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xb699a) [0x2bc39a]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xb6f9d) [0x2bc99d]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x106257) [0x30bc57]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x44c1ce) [0x651bce]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_condattr_setpshared+0x513) [0x94b43]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__xmknodat+0x230) [0x126a00]
[pid  2527] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=2495, si_uid=1000} ---
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__nptl_death_event+0x187) [0x91197]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x211) [0x93ac1]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x43f9db) [0x6453db]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x43f691) [0x645091]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x444072) [0x649a72]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x4442a9) [0x649ca9]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x1d273a) [0x3d813a]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x1d6fe5) [0x3dc9e5]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x15f287) [0x364c87]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x2c00bd) [0x4c5abd]
 > /memfd:doublemapper (deleted)() [0x280fa88]
 > /memfd:doublemapper (deleted)() [0x280f514]
 > /memfd:doublemapper (deleted)() [0x27f6b78]
 > /memfd:doublemapper (deleted)() [0x280cec7]
 > /memfd:doublemapper (deleted)() [0x2845d08]
 > /memfd:doublemapper (deleted)() [0x2843a70]
 > /memfd:doublemapper (deleted)() [0x27b77d8]
 > /memfd:doublemapper (deleted)() [0x22d9c97]
 > /memfd:doublemapper (deleted)() [0x22d992c]
 > /memfd:doublemapper (deleted)() [0x1e676d7]
 > /memfd:doublemapper (deleted)() [0x1e65733]
 > /home/ubuntu/RavenDB/Server/System.Private.CoreLib.dll() [0x216dfb]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x2bf057) [0x4c4a57]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xee07e) [0x2f3a7e]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x106182) [0x30bb82]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xb699a) [0x2bc39a]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xb6f9d) [0x2bc99d]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x106257) [0x30bc57]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x44c1ce) [0x651bce]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_condattr_setpshared+0x513) [0x94b43]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__xmknodat+0x230) [0x126a00]
[pid  2640] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=2495, si_uid=1000} ---
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x41d08d) [0x622a8d]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x42cdc3) [0x6327c3]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x43b0fa) [0x640afa]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x1d73e6) [0x3dcde6]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x1356b9) [0x33b0b9]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x224766) [0x42a166]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x23cce5) [0x4426e5]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x26ec31) [0x474631]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x13ae0f) [0x34080f]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x1399eb) [0x33f3eb]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x1561b4) [0x35bbb4]
 > /memfd:doublemapper (deleted)() [0x27409d7]
 > /memfd:doublemapper (deleted)() [0x27ebcc7]
 > /memfd:doublemapper (deleted)() [0x27eaa2f]
 > /memfd:doublemapper (deleted)() [0x27ea354]
 > /memfd:doublemapper (deleted)() [0x27e531e]
 > /memfd:doublemapper (deleted)() [0x27ee81a]
 > /memfd:doublemapper (deleted)() [0x27f3b57]
 > /memfd:doublemapper (deleted)() [0x27f3aa4]
 > /memfd:doublemapper (deleted)() [0x27ee24e]
 > /memfd:doublemapper (deleted)() [0x27f3947]
 > /memfd:doublemapper (deleted)() [0x27f389e]
 > /memfd:doublemapper (deleted)() [0x27ecbc4]
 > /memfd:doublemapper (deleted)() [0x27f2849]
 > /memfd:doublemapper (deleted)() [0x27f2691]
 > /memfd:doublemapper (deleted)() [0x2833af0]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x28331a6]
 > /memfd:doublemapper (deleted)() [0x27edb9c]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2832f5d]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27ee36f]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2832d2e]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27e8d06]
 > /memfd:doublemapper (deleted)() [0x2741f2d]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2832b0e]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27fe57b]
 > /memfd:doublemapper (deleted)() [0x27fe4a6]
 > /memfd:doublemapper (deleted)() [0x28018a8]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x28328fe]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27fb6a5]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x28326de]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27fe57b]
 > /memfd:doublemapper (deleted)() [0x27fe4a6]
 > /memfd:doublemapper (deleted)() [0x281863f]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x283249e]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x2817aed]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x283226e]
 > /memfd:doublemapper (deleted)() [0x276492b]
 > /memfd:doublemapper (deleted)() [0x2724853]
 > /memfd:doublemapper (deleted)() [0x27fe57b]
 > /memfd:doublemapper (deleted)() [0x27fe4a6]
 > /memfd:doublemapper (deleted)() [0x2817487]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2831d1e]
 > /memfd:doublemapper (deleted)() [0x28135c2]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x27fbd3c]
 > /memfd:doublemapper (deleted)() [0x2724dd9]
 > /memfd:doublemapper (deleted)() [0x2814b19]
 > /memfd:doublemapper (deleted)() [0x2754c86]
 > /memfd:doublemapper (deleted)() [0x2727c5c]
 > /home/ubuntu/RavenDB/Server/System.Private.CoreLib.dll() [0x228532]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x2bf057) [0x4c4a57]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xee07e) [0x2f3a7e]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x106182) [0x30bb82]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xb699a) [0x2bc39a]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xb6f9d) [0x2bc99d]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x106257) [0x30bc57]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x44c1ce) [0x651bce]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_condattr_setpshared+0x513) [0x94b43]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__xmknodat+0x230) [0x126a00]
strace: Process 2857 attached
[pid  2527] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=2495, si_uid=1000} ---
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__nss_database_lookup+0x3784a) [0x1afbba]
 > /memfd:doublemapper (deleted)() [0x2720b31]
 > /memfd:doublemapper (deleted)() [0x2850c63]
 > /memfd:doublemapper (deleted)() [0x2853650]
 > /memfd:doublemapper (deleted)() [0x2852fa1]
 > /memfd:doublemapper (deleted)() [0x27f6bcf]
 > /memfd:doublemapper (deleted)() [0x280cec7]
 > /memfd:doublemapper (deleted)() [0x2845d08]
 > /memfd:doublemapper (deleted)() [0x2843a70]
 > /memfd:doublemapper (deleted)() [0x27b77d8]
 > /memfd:doublemapper (deleted)() [0x22d9c97]
 > /memfd:doublemapper (deleted)() [0x22d992c]
 > /memfd:doublemapper (deleted)() [0x1e676d7]
 > /memfd:doublemapper (deleted)() [0x1e65733]
 > /home/ubuntu/RavenDB/Server/System.Private.CoreLib.dll() [0x216dfb]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x2bf057) [0x4c4a57]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xee07e) [0x2f3a7e]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x106182) [0x30bb82]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xb699a) [0x2bc39a]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0xb6f9d) [0x2bc99d]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x106257) [0x30bc57]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so(GetCLRRuntimeHost+0x44c1ce) [0x651bce]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_condattr_setpshared+0x513) [0x94b43]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(__xmknodat+0x230) [0x126a00]

I then killed the RavenDB process:

$ export DOTNET_EnableWriteXorExecute=0
$ ./RavenDB/run.sh

I deleted and re-created the Test database and then:

$ curl "http://localhost:8080/databases/test/admin/smuggler/import?url=https://twitter-2020-rvn-dump.s3.us-west-1.amazonaws.com/2023-03-29-07-46-59.ravendb-full-backup"

Obviously, there are no memfd items in the maps there, but I tried:

[09:19:47] ubuntu@ip-172-31-16-178:~$ cat /proc/$(pidof Raven.Server)/maps |  wc -l
4123
[09:19:56] ubuntu@ip-172-31-16-178:~$ cat /proc/$(pidof Raven.Server)/maps |  wc -l
4129

I'm adding the full maps from two points in time, so you can see how this changes over time (without W^X). maps-no-w^x.zip

And here is the smaps output for both modes:

smaps.zip

koepalex commented 1 year ago

I'm currently trying to understand why, in our application, the process memory (full memory dump) is way bigger than what we "use" (for details, see: https://stackoverflow.com/questions/77023695/missmatch-between-expected-memory-size-of-an-dotnet-application-and-real-consume )

So today I checked /proc/<pid>/maps and saw that 2452 entries out of 3265 contain /memfd:doublemapper (deleted). @ayende do you also see a mismatch between the expected memory size (heaps, stacks, modules) and the consumed process memory?

ayende commented 1 year ago

Yes, we are also seeing some weirdness around that.

ayende commented 1 year ago

I just tested this on .NET 8.0 RC1, and we are still seeing an increase in the number of memfd files over time.

I can reproduce this quite easily:

I'm getting some results from this:

 strace -f --instruction-pointer --stack-traces -e memfd_create  RavenDB/Server/Raven.Server
[pid  7658] [00007f00a0346c81] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=7565, si_uid=1000} ---
 > /memfd:doublemapper (deleted)() [0x2d67c81]
 > /memfd:doublemapper (deleted)() [0x2da928e]
 > /memfd:doublemapper (deleted)() [0x68ff791]
 > /memfd:doublemapper (deleted)() [0x68ffc2b]
 > /memfd:doublemapper (deleted)() [0x68ff48e]
 > /memfd:doublemapper (deleted)() [0x68c1d16]
 > /memfd:doublemapper (deleted)() [0x68ee05c]
 > /memfd:doublemapper (deleted)() [0x68dd66c]
 > /memfd:doublemapper (deleted)() [0x6821b7f]
 > /memfd:doublemapper (deleted)() [0x636d5a2]
 > /memfd:doublemapper (deleted)() [0x223ef8d]
 > /memfd:doublemapper (deleted)() [0x223c6d9]
 > /memfd:doublemapper (deleted)() [0x2d925b4]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x49b7c7]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2d5df6]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2ebab2]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2a4e05]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2a53bd]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2ebb88]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x612a2e]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_condattr_setpshared+0x513) [0x94b43]
 > unexpected_backtracing_error [0x135e]
[pid  7579] [????????????????] +++ exited with 0 +++
[pid  7658] [00007f00a0389e27] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TKILL, si_pid=7565, si_uid=1000} ---
 > /memfd:doublemapper (deleted)() [0x2daae27]
 > /memfd:doublemapper (deleted)() [0x68ffe9c]
 > /memfd:doublemapper (deleted)() [0x68ff48e]
 > /memfd:doublemapper (deleted)() [0x68c1d16]
 > /memfd:doublemapper (deleted)() [0x68ee05c]
 > /memfd:doublemapper (deleted)() [0x68dd66c]
 > /memfd:doublemapper (deleted)() [0x6821b7f]
 > /memfd:doublemapper (deleted)() [0x636d5a2]
 > /memfd:doublemapper (deleted)() [0x223ef8d]
 > /memfd:doublemapper (deleted)() [0x223c6d9]
 > /memfd:doublemapper (deleted)() [0x2d925b4]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x49b7c7]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2d5df6]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2ebab2]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2a4e05]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2a53bd]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x2ebb88]
 > /home/ubuntu/RavenDB/Server/libcoreclr.so() [0x612a2e]
 > /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_condattr_setpshared+0x513) [0x94b43]
 > unexpected_backtracing_error [0x7ebf2e4eef40]

I can't get symbols from strace, and I can't get lldb (where I do get symbols) to stop on the right location.

Running with: b VMToOSInterface::CreateDoubleMemoryMapper gives the right output, but doesn't actually stop.

hoyosjs commented 1 year ago

@ayende dotnet-symbol should be able to get them for you with the --symbols flag if it's the Microsoft-built runtime: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-symbol

ayende commented 1 year ago

Yes, I tried running that. It didn't seem to matter in terms of strace. It did work with lldb, I think, but I couldn't get the breakpoint to hit.

janvorli commented 1 year ago

it didn't seem to matter in terms of strace

Interestingly, I've seen the symbols both working and not working with strace on the same Ubuntu 22.04 (except that one was in WSL2 - that didn't work, and the other in a docker container - which worked). I am currently trying to figure out what makes it different, since I need to get it working for an investigation I am doing.

theolivenbaum commented 11 months ago

@janvorli we are seeing something similar in our application: over 16k /memfd:doublemapper (deleted) entries after a few days of uptime. Tested on .NET 8, but it is probably the same on .NET 7, as we had strange problems with OOM in the past.

We do generate assemblies at runtime using Microsoft.CodeAnalysis.Scripting.Script, is this known for leaking memory like in #80580?

Possibly related: https://github.com/dotnet/roslyn/issues/52217 and https://github.com/dotnet/roslyn/issues/41722

janvorli commented 11 months ago

@theolivenbaum this is not a leak. The number of regions marked with /memfd:doublemapper is expected to grow as the runtime compiles more and more code. If this code is not inside a collectible AssemblyLoadContext, and thus is not unloadable, this memory is not freed either.

theolivenbaum commented 11 months ago

Thanks! I'm changing our code to use AssemblyLoadContext, I'll check again in a week to see how it behaves