dotnet / runtime

Excessive VM memory consumption on Linux #7740

Open ipinak opened 7 years ago

ipinak commented 7 years ago

I have an ASP.NET app running on Docker (image: microsoft/dotnet:1.1.1-sdk). When we start the application, it allocates ~11G of virtual memory. I have also run the application directly on the host, i.e. without Docker, on both Windows and Ubuntu. On Windows it consumes 30MB of Memory (private working set) and about the same for Memory (shared working set), but on Ubuntu it still consumes 11G.

So, I have the following questions:

  1. What's the cause for this?
  2. Do we need to do something to avoid this?

Here are some data from running ps aux from Ubuntu.

USER   PID  %CPU  %MEM       VSZ    RSS  TTY  STAT  START  TIME  COMMAND
root  8832   0.4   1.3  11896404  66396  ?    SLl   11:41  0:01  dotnet ...

janvorli commented 7 years ago

@ipinak Please note that this is just a reservation of virtual address space, not physical memory. Since each process (on x64) has 128TB of virtual address space for its private use, this should not be a problem. This virtual memory is mostly address space reserved for GC heaps and executable code.
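
The distinction between reserved address space and resident memory can be checked directly on Linux. The following is a minimal sketch, not code from the runtime: `ReadStatusKb` and `ReserveAddressSpace` are hypothetical helper names, and the sketch assumes `/proc/self/status` is readable. It reserves address space with a `PROT_NONE` mapping, which should raise `VmSize` (the VSZ column in `ps`) while leaving `VmRSS` essentially unchanged.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: reads a "<key> <value> kB" field such as "VmSize:" or
// "VmRSS:" from /proc/self/status. Returns -1 if the field cannot be read.
long ReadStatusKb(const char* key)
{
    FILE* f = std::fopen("/proc/self/status", "r");
    if (f == nullptr)
    {
        return -1;
    }

    char line[256];
    long value = -1;
    const size_t keyLen = std::strlen(key);
    while (std::fgets(line, sizeof(line), f) != nullptr)
    {
        if (std::strncmp(line, key, keyLen) == 0)
        {
            // %ld skips the whitespace between the key and the number.
            std::sscanf(line + keyLen, "%ld", &value);
            break;
        }
    }
    std::fclose(f);
    return value;
}

// Hypothetical helper: reserves address space only. The pages are PROT_NONE
// and never touched, so no physical memory is mapped for them.
void* ReserveAddressSpace(size_t size)
{
    assert(size != 0); // mmap rejects zero-length mappings
    void* p = mmap(nullptr, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```

Calling `ReadStatusKb("VmSize:")` and `ReadStatusKb("VmRSS:")` before and after `ReserveAddressSpace(...)` shows the two counters moving independently, which is the effect visible in the `ps aux` output above.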

mikedn commented 7 years ago

Also note that on Windows "working set" means something like the amount of committed memory that the OS thinks the application needs. It's not all the memory that the application has committed, and it's certainly not reserved virtual address space, which is very large on Windows too. For example, I have a running .NET application where VMMap shows 17GB of reserved virtual address space, 400MB of committed memory and a 185MB working set.

ipinak commented 7 years ago

Thank you both @janvorli and @mikedn for the explanation. Is there any way to reduce that? Since I'm running on Docker, I'm thinking of using cgroups, do you have any opinion on that?

janvorli commented 7 years ago

@ipinak why would you want to reduce that? Are you hitting some problem related to that?

ipinak commented 7 years ago

I haven't faced any issue, it's for provisioning planning reasons.

6opuc commented 5 years ago

@janvorli The real problem with huge VM usage is that core dumps created from dotnet processes are huge too. Real example: I'm running a console app which uses less than 100MB of memory, but its virtual memory usage is always near 10GB. When I create a core dump of this app under Linux, the size of the core dump is 10GB too. When I create a process dump of the same app under Windows, the size of the dump is near 100MB.

janvorli commented 5 years ago

I wonder what is your setting in /proc/self/coredump_filter.

6opuc commented 5 years ago

cat /proc/self/coredump_filter
00000033

janvorli commented 5 years ago

Ok, that's the default. Hmm, I wonder if we can do anything with the size of the core dump so that it doesn't include pages that were just reserved (allocated by mmap with PROT_NONE option). I can see that madvise can be used with MADV_DONTDUMP to prevent a range of memory from being included in the core dump, so maybe it could be used in some way.
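
To make the filter value concrete, here is a small sketch that decodes a `coredump_filter` mask into the mapping types it selects. The bit meanings follow the Linux core(5) man page; `ParseCoredumpFilter` and `DecodeCoredumpFilter` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: parses the hexadecimal text printed by
// `cat /proc/self/coredump_filter` (for example "00000033").
uint32_t ParseCoredumpFilter(const std::string& text)
{
    assert(!text.empty());
    return static_cast<uint32_t>(std::stoul(text, nullptr, 16));
}

// Hypothetical helper: lists the mapping types selected by the mask.
// Bit assignments are taken from the core(5) man page.
std::vector<std::string> DecodeCoredumpFilter(uint32_t mask)
{
    static const char* const kBitNames[] = {
        "anonymous private mappings",   // bit 0
        "anonymous shared mappings",    // bit 1
        "file-backed private mappings", // bit 2
        "file-backed shared mappings",  // bit 3
        "ELF headers",                  // bit 4
        "private huge pages",           // bit 5
        "shared huge pages",            // bit 6
        "private DAX pages",            // bit 7
        "shared DAX pages",             // bit 8
    };

    std::vector<std::string> selected;
    for (unsigned bit = 0; bit < sizeof(kBitNames) / sizeof(kBitNames[0]); ++bit)
    {
        if ((mask & (1u << bit)) != 0)
        {
            selected.push_back(kBitNames[bit]);
        }
    }
    return selected;
}
```

For the value in this thread, 0x33 sets bits 0, 1, 4 and 5, so all anonymous mappings are dumped. Reserved `PROT_NONE` regions are anonymous private mappings, which is why the filter alone does not exclude them.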

6opuc commented 5 years ago

How can I help to implement/test this modification? Maybe some directions in source code so I can test it on my side?

janvorli commented 5 years ago

@6opuc that would be awesome! I think that we could add a call to madvise with MADV_DONTDUMP in this function for the range we've reserved: https://github.com/dotnet/coreclr/blob/6958ede8e835048b9d1ee9843d7587cacf527101/src/pal/src/map/virtual.cpp#L958

Then add a call to madvise with MADV_DODUMP for the committed range in https://github.com/dotnet/coreclr/blob/6958ede8e835048b9d1ee9843d7587cacf527101/src/pal/src/map/virtual.cpp#L1041

And it seems we would also need to add madvise with MADV_DONTDUMP / MADV_DODUMP (based on the protection) in https://github.com/dotnet/coreclr/blob/6958ede8e835048b9d1ee9843d7587cacf527101/src/pal/src/map/virtual.cpp#L1621

I think that should be sufficient.

To test your changes, the simplest way that I often use is to publish your testing application for linux-x64 rid (dotnet publish -r linux-x64). After the publishing, it shows you the path to the publish folder that contains all the runtime files. Then I get the commit number from which the libcoreclr.so in that folder was built (strings libcoreclr.so | grep "@(#)"). Then I check out the coreclr repo at that commit, make my changes there, build it (just run the build.sh script in the root, that will get you the debug build with symbols and no optimizations so that you can easily debug your change). After the build completes, it will tell you where it put the resulting binaries (bin/Product/Linux.x64.Debug). Just copy all the files from there into the publish folder, overwriting the existing ones. Then you can debug and test your changes.
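
Outside the PAL, the reserve/commit pairing proposed above can be sketched in isolation as follows. This is an illustrative sketch only: `ReserveExcludedFromDump` and `CommitIncludedInDump` are hypothetical names, not functions from virtual.cpp, and the `#ifdef` guards reflect the assumption that `MADV_DONTDUMP` and `MADV_DODUMP` are Linux-specific.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: reserves address space as PROT_NONE and asks the
// kernel to leave the range out of core dumps.
void* ReserveExcludedFromDump(size_t size)
{
    assert(size != 0);
    void* p = mmap(nullptr, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
    {
        return nullptr;
    }
#ifdef MADV_DONTDUMP
    if (madvise(p, size, MADV_DONTDUMP) != 0)
    {
        // This sketch treats a failed hint as fatal; ignoring it is also defensible.
        munmap(p, size);
        return nullptr;
    }
#endif
    return p;
}

// Hypothetical helper: "commits" a page-aligned sub-range by making it
// readable and writable, then marks it as dumpable again.
bool CommitIncludedInDump(void* start, size_t size)
{
    if (mprotect(start, size, PROT_READ | PROT_WRITE) != 0)
    {
        return false;
    }
#ifdef MADV_DODUMP
    if (madvise(start, size, MADV_DODUMP) != 0)
    {
        return false;
    }
#endif
    return true;
}
```

Both `start` and `size` passed to `CommitIncludedInDump` need to be page-aligned, since `mprotect` and `madvise` operate on whole pages.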

6opuc commented 5 years ago

Thanks a lot! I'll try that today and let you know about the results

6opuc commented 5 years ago

I've made the changes you described to test if core dump will be smaller, but size of core dump is the same :( I'm not sure if I understand you correctly. This is what I did:

diff --git a/src/pal/src/map/virtual.cpp b/src/pal/src/map/virtual.cpp
index a5610ef..55c210d 100644
--- a/src/pal/src/map/virtual.cpp
+++ b/src/pal/src/map/virtual.cpp
@@ -1016,6 +1016,20 @@ static LPVOID ReserveVirtualMemory(
         return nullptr;
     }

+       // Do not include reserved virtual memory in core dumps
+       // TODO: only on linux, because MADV_DONTDUMP is Linux-specific
+       int madviseResult = madvise(pRetVal,
+                                        MemSize,
+                                        MADV_DONTDUMP);
+       if (madviseResult != 0)
+       {
+               //TODO: should we fail and exit here?
+               ERROR("madvise failed!\n");
+               pthrCurrent->SetLastError(ERROR_INVALID_ADDRESS);
+               munmap(pRetVal, MemSize);
+               return nullptr;
+       }
+
 #if MMAP_ANON_IGNORES_PROTECTION
     if (mprotect(pRetVal, MemSize, PROT_NONE) != 0)
     {
@@ -1169,6 +1183,15 @@ VIRTUALCommitMemory(
                 goto error;
             }

+                       // Include committed memory in core dump
+                       // TODO: Linux-specific?
+                       if (madvise((void *) StartBoundary, MemSize, MADV_DODUMP) != 0)
+                       {
+                               //TODO: is it safe to continue here?
+                               ERROR("madvise() failed! Error(%d)=%s\n", errno, strerror(errno));
+                               goto error;
+                       }
+
             VIRTUALSetAllocState(MEM_COMMIT, runStart, runLength, pInformation);

             if (nProtect == (PROT_WRITE | PROT_READ))
@@ -1706,6 +1729,17 @@ VirtualProtect(
         {
             *lpflOldProtect = PAGE_EXECUTE_READWRITE;
         }
+
+               //TODO: Linux-specific?
+               //TODO: Is protection check correct?
+               int advise = flNewProtect == PAGE_NOACCESS ? MADV_DONTDUMP : MADV_DODUMP;
+               if (madvise((LPVOID)StartBoundary, MemSize, advise) != 0)
+               {
+                       //TODO: Is it safe to continue here?
+                       SetLastError( ERROR_INVALID_ADDRESS );
+                       goto ExitVirtualProtect;
+               }
+
         bRetVal = TRUE;
     }
     else

janvorli commented 5 years ago

You've made the changes I would expect. So maybe the memory is committed but not mapped to physical memory yet. Is it possible to share your testing app so that I can debug it locally?

6opuc commented 5 years ago

I've created simple app for this test: https://github.com/6opuc/MemoryTest/blob/master/MemoryTest/Program.cs

VM utilization in that app is not as high as in the original app, but the problem is reproduced:

mem usage: ~150MB
vm usage: ~3GB
createdump -u produces a ~3GB core dump
createdump (minidump with heap) produces a ~200MB core dump

janvorli commented 5 years ago

I've actually found two more places where the madvise with MADV_DONTDUMP should be added. Here, when the VIRTUALMemReset succeeds: https://github.com/dotnet/coreclr/blob/68fad02f41707a5333992cd7701e75aefb4e51c8/src/pal/src/map/virtual.cpp#L1382

And here: https://github.com/dotnet/coreclr/blob/68fad02f41707a5333992cd7701e75aefb4e51c8/src/pal/src/map/virtual.cpp#L1528

janvorli commented 5 years ago

If that doesn't help, I'll look into it on Monday.

6opuc commented 5 years ago

Core dump size is the same after these changes:

diff --git a/src/pal/src/map/virtual.cpp b/src/pal/src/map/virtual.cpp
index a5610ef..e812cda 100644
--- a/src/pal/src/map/virtual.cpp
+++ b/src/pal/src/map/virtual.cpp
@@ -1016,6 +1016,20 @@ static LPVOID ReserveVirtualMemory(
         return nullptr;
     }

+   // Do not include reserved virtual memory in core dumps
+   // TODO: only on linux, because MADV_DONTDUMP is Linux-specific
+   int madviseResult = madvise(pRetVal,
+                    MemSize,
+                    MADV_DONTDUMP);
+   if (madviseResult != 0)
+   {
+       //TODO: should we fail and exit here?
+       ERROR("madvise failed!\n");
+       pthrCurrent->SetLastError(ERROR_INVALID_ADDRESS);
+       munmap(pRetVal, MemSize);
+       return nullptr;
+   }
+
 #if MMAP_ANON_IGNORES_PROTECTION
     if (mprotect(pRetVal, MemSize, PROT_NONE) != 0)
     {
@@ -1169,6 +1183,15 @@ VIRTUALCommitMemory(
                 goto error;
             }

+           // Include committed memory in core dump
+           // TODO: Linux-specific?
+           if (madvise((void *) StartBoundary, MemSize, MADV_DODUMP) != 0)
+           {
+               //TODO: is it safe to continue here?
+               ERROR("madvise() failed! Error(%d)=%s\n", errno, strerror(errno));
+               goto error;
+           }
+
             VIRTUALSetAllocState(MEM_COMMIT, runStart, runLength, pInformation);

             if (nProtect == (PROT_WRITE | PROT_READ))
@@ -1385,6 +1408,18 @@ VirtualAlloc(
             /* Error messages are already displayed, just leave. */
             goto done;
         }
+
+       // Do not include committed memory in core dump
+       // TODO: Linux-specific?
+       // TODO: do we need to align lpAddress and dwSize?
+       UINT_PTR StartBoundary = (UINT_PTR) ALIGN_DOWN(lpAddress, GetVirtualPageSize());
+       SIZE_T MemSize = ALIGN_UP((UINT_PTR)lpAddress + dwSize, GetVirtualPageSize()) - StartBoundary;
+       if (madvise((void *) StartBoundary, MemSize, MADV_DONTDUMP) != 0)
+       {
+           //TODO: is it safe to continue here?
+           ERROR("madvise() failed! Error(%d)=%s\n", errno, strerror(errno));
+           goto done;
+       }
     }

     if ( flAllocationType & MEM_RESERVE )
@@ -1525,6 +1560,16 @@ VirtualFree(
                 goto VirtualFreeExit;
             }
 #endif  // MMAP_ANON_IGNORES_PROTECTION
+
+           // Do not include committed memory in core dump
+           // TODO: Linux-specific?
+           if (madvise((void *) StartBoundary, MemSize, MADV_DONTDUMP) != 0)
+           {
+               //TODO: is it safe to continue here?
+               ERROR("madvise() failed! Error(%d)=%s\n", errno, strerror(errno));
+               goto VirtualFreeExit;
+           }
+

             SIZE_T index = 0;
             SIZE_T nNumOfPagesToChange = 0;
@@ -1706,6 +1751,17 @@ VirtualProtect(
         {
             *lpflOldProtect = PAGE_EXECUTE_READWRITE;
         }
+
+
+       //TODO: Linux-specific?
+       //TODO: Is protection check correct?
+       int advise = flNewProtect == PAGE_NOACCESS ? MADV_DONTDUMP : MADV_DODUMP;
+       if (madvise((LPVOID)StartBoundary, MemSize, advise) != 0)
+       {
+           //TODO: Is it safe to continue here?
+           SetLastError( ERROR_INVALID_ADDRESS );
+           goto ExitVirtualProtect;
+       }
+
         bRetVal = TRUE;
     }
     else

janvorli commented 5 years ago

@6opuc thank you for trying to make these changes. I am going to look into it. I am sorry I haven't done it on Monday as promised, I was sick and then got distracted by an urgent issue.

janvorli commented 5 years ago

I've just tried to apply your changes (not exactly the diff, as git somehow complained about it, but making them by hand at the same places). And I got a significant reduction of the core size. Before applying your changes, the core was 2906587496 bytes large; with your changes it was only 656034600 bytes (about 4.4 times smaller). That is still far from the resident set size (~138MB), but it seems promising. Are you sure that in your case the dump size was not reduced?

6opuc commented 5 years ago

I've just rechecked that and core dump file size is the same. Here are files I've copied over(maybe something is missing?):

[root@kd Linux.x64.Debug]# tree .
.
├── bin
│   ├── hfa_nested_f32_native_cpp.so
│   ├── hfa_nested_f64_native_cpp.so
│   ├── hfa_simple_f32_native_cpp.so
│   ├── hfa_simple_f64_native_cpp.so
│   ├── jitstructtests_lib.so
│   ├── libBestFitMappingNative.so
│   ├── libBoolNative.so
│   ├── libClassicCOMNative.so
│   ├── libForeignThreadExceptionsNative.so
│   ├── libFuncPtrAsDelegateParamNative.so
│   ├── libFunctionPointerNative.so
│   ├── libIUnknownNative.so
│   ├── libLPSTRTestNative.so
│   ├── libLPTSTRTestNative.so
│   ├── libMarshalArrayByValNative.so
│   ├── libMarshalBoolArrayNative.so
│   ├── libMarshalEnumNative.so
│   ├── libMarshalStructAsParam.so
│   ├── libNativeCallableDll.so
│   ├── libRefCharArrayNative.so
│   ├── libRefIntNative.so
│   ├── libSimpleStructNative.so
│   ├── libSizeConstNative.so
│   ├── libStructABILib.so
│   ├── libUIntPtrNative.so
│   ├── libUTF8TestNative.so
│   ├── libVector3TestNative.so
│   ├── mirror.so
│   ├── native_i0c.so
│   ├── native_i0s.so
│   ├── native_i1c.so
│   ├── native_i1s.so
│   ├── native_i3c.so
│   ├── native_i3s.so
│   ├── native_i5c.so
│   ├── native_i5s.so
│   ├── native_i6c.so
│   ├── native_i6s.so
│   ├── native_i7c.so
│   ├── native_i7s.so
│   ├── native_i8c.so
│   ├── native_i8s.so
│   └── test2.so
├── coreconsole
├── corerun
├── createdump
├── crossgen
├── gcinfo
│   └── gcinfoencoder.cpp
├── IL
│   └── System.Private.CoreLib.dll
├── ilasm
├── ildasm
├── inc
│   ├── cfi.h
│   ├── cordebuginfo.h
│   ├── coredistools.h
│   ├── corerror.h
│   ├── cor.h
│   ├── corhdr.h
│   ├── corinfo.h
│   ├── corjit.h
│   ├── corjithost.h
│   ├── corprof.h
│   ├── gcinfoencoder.h
│   ├── gcinfotypes.h
│   ├── opcode.def
│   └── openum.h
├── lib
│   ├── libcoreclrpal.a
│   ├── libcorguids.a
│   ├── libeventpipe.a
│   ├── libeventprovider.a
│   └── libpalrt.a
├── libclrgc.so
├── libclrjit.so
├── libcoreclr.so
├── libcoreclrtraceptprovider.so
├── libdbgshim.so
├── libmscordaccore.so
├── libmscordbi.so
├── libprotononjit.so
├── libsosplugin.so
├── libsos.so
├── libsuperpmi-shim-collector.so
├── libsuperpmi-shim-counter.so
├── libsuperpmi-shim-simple.so
├── Loader
│   └── NativeLibs
│       └── FromNativePaths_lib.so
├── mcs
├── PDB
│   ├── SOS.NETCore.pdb
│   └── System.Private.CoreLib.pdb
├── sosdocsunix.txt
├── SOS.NETCore.dll
├── SOS.NETCore.pdb
├── superpmi
├── System.Globalization.Native.a
├── System.Globalization.Native.so
├── System.Private.CoreLib.dll
└── System.Private.CoreLib.ni.{5ffab924-2d16-404d-9a84-e24716144db1}.map

janvorli commented 5 years ago

Looking at your diff and comparing it to mine, it seems your change is missing the VirtualReset case. I've mentioned the following location above: https://github.com/dotnet/coreclr/blob/68fad02f41707a5333992cd7701e75aefb4e51c8/src/pal/src/map/virtual.cpp#L1382

However, I've ended up making it here instead: https://github.com/dotnet/coreclr/blob/68fad02f41707a5333992cd7701e75aefb4e51c8/src/pal/src/map/virtual.cpp#L854

by adding int madviseResult = madvise((void*)StartBoundary, MemSize, MADV_DONTDUMP);

6opuc commented 5 years ago

I've just moved madvise(..., MADV_DONTDUMP) from VirtualAlloc(VirtualReset case) to VIRTUALResetMemory(as in your last comment) and rechecked core dump size: still the same 3GB :( I've added asserts to check if my changes in coreclr were used in my test app and asserts work as expected. Anyway, if you've got such core size reduction after those changes, i hope that the problem is only in my build environment(I'm using ubuntu:14.04 docker image for coreclr build with llvm-3.9, and host machine is oracle linux). I'll try to build from recommended docker images: https://github.com/dotnet/coreclr/blob/68fad02f41707a5333992cd7701e75aefb4e51c8/Documentation/building/linux-instructions.md

6opuc commented 5 years ago

Still no luck with core dump size. I've added logging for all madvise(...) calls with WARN(MADV_DODUMP|MADV_DONTDUMP), enabled logging with export PAL_DBG_CHANNELS=-all.all:+VIRTUAL.ENTRY:+VIRTUAL.WARN:+VIRTUAL.ERROR and ran test app again. There is huge amount of VirtualAlloc(...) calls with flAllocationType=MEM_COMMIT: log.txt

Here are my changes for https://github.com/dotnet/coreclr/tree/73484d4664d75aafaaccbbd86f8204bd4f106ae8 : virtual.cpp.diff.txt

janvorli commented 5 years ago

@6opuc to make sure we are making the same changes in the same source code version, the ideal way would be if you've forked coreclr, cloned coreclr repo, made a branch with the changes and pushed it. Then I could fetch it from your fork. It would be much easier than sharing diffs. Or, I can share my branch with my quick and dirty changes the same way with you if you prefer.

janvorli commented 5 years ago

Actually, I was making the changes in the master, not in the 2.2 branch, so maybe there were some changes that cause the difference.

6opuc commented 5 years ago

I've pushed my changes into https://github.com/6opuc/coreclr/tree/release/2.2-MADV_DONTDUMP. New logs(almost the same as previous): log.txt.

Also, I've noticed in the logs that the first call to ReserveVirtualMemory always fails with: ERROR [VIRTUAL] at ReserveVirtualMemory.1017: We did not get the region we asked for from mmap!

janvorli commented 5 years ago

Thank you for sharing your branch. I have compiled coreclr directly from your branch without any additional changes, made it go through 99 iterations and then attached gdb and captured the core. My core size was 803787600 bytes (about 767 MiB). I've rebuilt it again from your branch, but using the previous commit (effectively removing your change), and the core size was 3215852168 bytes (about 3067 MiB). So I am seeing a substantial reduction with your change.

6opuc commented 5 years ago

I used createdump -u to create core dumps. I captured core dump just after first iteration. Is it expected behavior when all reserved virtual memory is included in "full" core dump? As I remember, in windows full dumps are not so huge. If it is expected, then how should I create core dump with all metadata(https://github.com/dotnet/diagnostics/issues/56) and full managed heap without reserved vm?

janvorli commented 5 years ago

Ah, I didn't know you had used createdump. That explains it. We need to teach the tool not to save memory that's only reserved, and ideally to also honor the madvise hints.

After we get your changes in, you can use the gcore tool to create smaller core dumps on Linux until we fix createdump. The size of the core you get using gcore matches what I was getting using gdb and also what the OS would dump when it generates a core.

janvorli commented 5 years ago

Alternatively, you can get your own createdump built out of coreclr repo with a little fix that I've just tried and that reduces the dump size 10 times and it seems to be debuggable under lldb.

You can patch the code here: https://github.com/dotnet/coreclr/blob/master/src/debug/createdump/crashinfo.cpp#L192-L196 as follows:

         for (const MemoryRegion& region : m_otherMappings)
         {
-            InsertMemoryBackedRegion(region);
+            if ((region.Permissions() & (PF_R | PF_W | PF_X)) != 0)
+            {
+                InsertMemoryBackedRegion(region);
+            }
         }
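
The idea behind the patch above is to skip mappings that have no access permissions at all, since those are reserved-only regions. As a standalone sketch of that filter: `Region` and `SelectDumpableRegions` are hypothetical stand-ins for createdump's own types, and the permission constants are assumed to mirror `PF_X`, `PF_W` and `PF_R` from `<elf.h>`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed to match the ELF segment flags PF_X, PF_W and PF_R from <elf.h>.
constexpr uint32_t kExec = 0x1;
constexpr uint32_t kWrite = 0x2;
constexpr uint32_t kRead = 0x4;

// Hypothetical stand-in for createdump's memory region type.
struct Region
{
    uint64_t start;
    uint64_t end;
    uint32_t permissions;
};

// Keeps only regions with at least one of read, write or execute permission.
// Regions with no permissions (reserved, PROT_NONE) are left out of the dump.
std::vector<Region> SelectDumpableRegions(const std::vector<Region>& mappings)
{
    std::vector<Region> dumpable;
    for (const Region& region : mappings)
    {
        assert(region.start <= region.end);
        if ((region.permissions & (kRead | kWrite | kExec)) != 0)
        {
            dumpable.push_back(region);
        }
    }
    return dumpable;
}
```

Filtering on permissions is a heuristic: it assumes that any memory worth dumping is accessible at the time the dump is taken.
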

6opuc commented 5 years ago

Ok, it's all clear for me now, thanks!

Core dump, captured by gcore, is ~200MB with ~150MB resident size for my app. If we can "teach" createdump to make such small dumps, then everybody will be happy. But:

  1. The core dump created by gcore is useless for me because of dotnet/diagnostics#56.
  2. The core dump created by gcore was also small without MADV_DONTDUMP (as I remember, the size was the same as for the "minidump with heap" created by the createdump utility; I'll check that tomorrow).
  3. When I try to create a dump with gcore, I get warnings:

    root@kd:/projects/coreclr# gcore -o /projects/core 128011

    Program received signal SIGTSTP, Stopped (user).
    [New LWP 128105]
    [New LWP 128020]
    [New LWP 128019]
    [New LWP 128018]
    [New LWP 128017]
    [New LWP 128016]
    [New LWP 128015]
    [New LWP 128014]
    [New LWP 128013]
    [New LWP 128012]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    0x00007f08ae67e3ad in read () at ../sysdeps/unix/syscall-template.S:81
    81      ../sysdeps/unix/syscall-template.S: No such file or directory.
    warning: Memory read failed for corefile section, 12288 bytes at 0x7ffd81956000.
    Saved corefile /projects/core.128011

  4. When I try to open the core dump created by gcore, I get errors (maybe because of a different version of sos; I'll check that tomorrow too):

    [root@kd coreclr]# docker run --rm -it -v /projects/MemoryTest/MemoryTest/output:/projects/MemoryTest/MemoryTest/output -e COREDUMP_PATH=/projects/MemoryTest/MemoryTest/output/coredump 6opuc/lldb-netcore:2.2.3
    (lldb) target create "/usr/bin/dotnet" --core "/projects/MemoryTest/MemoryTest/output/coredump"
    Core file '/projects/MemoryTest/MemoryTest/output/coredump' (x86_64) was loaded.
    (lldb) plugin load /coreclr/libsosplugin.so
    (lldb) sos PrintException -lines
    The libcoreclr.so module is not loaded yet in the target process

  5. I'm not a [real expert in linux/windows/dotnet_internals](http://lurkmore.to/%D0%AF_%D0%BD%D0%B5_%D0%BD%D0%B0%D1%81%D1%82%D0%BE%D1%8F%D1%89%D0%B8%D0%B9_%D1%81%D0%B2%D0%B0%D1%80%D1%89%D0%B8%D0%BA), but as an average user of the dotnet framework on the Windows platform I'm used to trusting the standard OS tools: Windows Error Reporting and Task Manager's "create process dump". It is very unusual for me to use custom tools to create process dumps (I mean the createdump utility).

I think, that if it is not possible to create "usable" core dumps with standard tools on Linux, then these changes(MADV_DONTDUMP) are useless without changing createdump utility. I'll look into createdump sources tomorrow and will try to fix it.

Should I create a pull request for all these changes?

6opuc commented 5 years ago

> You can patch the code here:

Thanks! I'll try that tomorrow

janvorli commented 5 years ago

I actually never use createdump and always debug core files generated by the OS. So it is strange that you have issues with that. I'll try what you've done locally to see.

6opuc commented 5 years ago

@janvorli, thanks a lot! After that change in crashinfo.cpp, coredump size is almost as small as process RSS. Is it possible to make these changes(virtual.cpp and crashinfo.cpp) in release 2.2 branch and rebuild dotnet docker images(runtime and sdk)?

6opuc commented 5 years ago

And as for gcore core dumps(just FYI):

  1. gcore prints warnings while writing core dump, but
  2. core dump is debuggable in lldb
  3. core dump size is small even without MADV_DONTDUMP fixes in virtual.cpp

janvorli commented 5 years ago

> Is it possible to make these changes(virtual.cpp and crashinfo.cpp) in release 2.2 branch and rebuild dotnet docker images(runtime and sdk)?

If you are asking whether Microsoft can publish updated images, then the answer is that it will need to wait for the next update release. This change also needs to be verified more extensively to make sure that the dump is not missing something. I did only a couple of tests (disassembly of managed code, disassembly of native code, dumping objects in the managed heap), which I believe should be sufficient, but I am not sure.

> core dump size is small even without MADV_DONTDUMP fixes in virtual.cpp

This is strange, on my machine, it was not that way - without your change, I was getting 4x larger dumps with the gcore or gdb. Is your /proc/self/coredump_filter still set to 33?

I was also not getting any warnings while writing the dump except for the benign ../sysdeps/unix/syscall-template.S: No such file or directory.:

 sudo gcore 23929
[New LWP 23930]
[New LWP 23931]
[New LWP 23932]
[New LWP 23933]
[New LWP 23934]
[New LWP 23935]
[New LWP 23936]
[New LWP 23938]
[New LWP 26068]
[New LWP 26127]
[New LWP 26128]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fdcbc0d451d in read () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
Saved corefile core.23929

6opuc commented 5 years ago

Yes, /proc/self/coredump_filter is still set to 33. As for the warning ../sysdeps/unix/syscall-template.S: No such file or directory from gcore, I think that it is because of my environment (https://sourceware.org/ml/gdb/2016-01/msg00021.html)

janvorli commented 5 years ago

That warning is completely benign, the gdb (that the gcore executes to do its work) just says that you don't have source code for the current location in the code where the process was stopped for the dump.

janvorli commented 5 years ago

> Yes, /proc/self/coredump_filter is still set to 33.

Hmm, strange. I was getting about 3GB large dumps without your change and about 800MB large ones with it. Can you please try to take the dump after 100 iterations of your test? That's what I was doing.

6opuc commented 5 years ago

Thanks for clarification about that warning.

I've run my test app + gcore inside a docker container and on my host machine, and the results were different:

  1. Without MADV_DONTDUMP:
     1.1. host machine: 2.6GB
     1.2. docker container (ubuntu:14.04): 156MB
  2. With MADV_DONTDUMP:
     2.1. host machine: 670MB
     2.2. docker container (ubuntu:14.04): 200MB

So it turns out that we have similar results in core sizes.

And gcore produces warnings only inside the docker container (maybe that is the reason for the difference in core sizes).

6opuc commented 5 years ago

Are there any plans to include those changes with MADV_DONTDUMP in the next releases/updates?

janvorli commented 5 years ago

@6opuc - I think it would be great. Do you want to create a PR for the change or would you prefer me doing that?

6opuc commented 5 years ago

@janvorli - Sorry for delay. Here it is: dotnet/coreclr#27089 It seems, that changes for createdump utility(createdump/crashinfo.cpp) are already in master. So this PR is only for src/pal/src/map/virtual.cpp

janvorli commented 5 years ago

@6opuc great, thank you! I am sorry for a delayed response, I was on vacation for the last 7 days.

6opuc commented 5 years ago

@janvorli Thanks!

seriouz commented 3 years ago

Any news on this? [image]

janvorli commented 3 years ago

@seriouz what is the problem you are having? The previous issue with overly large core dumps should have been fixed in Oct 2019, and the fixes have been in .NET since 5.0. In general, there should be nothing wrong with large virtual address space usage.

michaelkarlcoleman commented 2 years ago

@janvorli One continuing problem with the huge virtual address space is that some systems come with a "modest" ulimit -H -v, and the user in question might not be able to increase this. I encountered this today, discovering that 50G virtual was insufficient to run a (small) program. (I vaguely recall that some JVMs had this behavior as well, but were changed to avoid it, since it causes problems in practice.)

It's kind of like files with holes. Yes, it's proper POSIX. But at the same time, a piece of software that gratuitously created such files would not be popular.
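
The interaction described above can be reproduced without the runtime. The sketch below is an illustration under stated assumptions, not a diagnostic tool: `ReservationFailsUnderLimit` is a hypothetical name, and the sketch assumes that `ulimit -v` corresponds to `RLIMIT_AS` on Linux. It forks a child, lowers the child's address-space limit, and checks whether a `PROT_NONE` reservation larger than the limit is refused, even though such a reservation would use no physical memory.

```cpp
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true if a PROT_NONE reservation of
// `reserveBytes` is refused in a child process whose RLIMIT_AS soft limit is
// set to `limitBytes`. Returns false if the reservation succeeds or if the
// child could not be set up.
bool ReservationFailsUnderLimit(size_t limitBytes, size_t reserveBytes)
{
    assert(reserveBytes != 0);

    pid_t pid = fork();
    if (pid < 0)
    {
        return false;
    }

    if (pid == 0)
    {
        // Child: lower the soft limit only, never above the hard limit.
        struct rlimit lim;
        if (getrlimit(RLIMIT_AS, &lim) != 0)
        {
            _exit(2);
        }
        rlim_t wanted = static_cast<rlim_t>(limitBytes);
        if (lim.rlim_max != RLIM_INFINITY && wanted > lim.rlim_max)
        {
            wanted = lim.rlim_max;
        }
        lim.rlim_cur = wanted;
        if (setrlimit(RLIMIT_AS, &lim) != 0)
        {
            _exit(2);
        }

        void* p = mmap(nullptr, reserveBytes, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        _exit(p == MAP_FAILED ? 0 : 1);
    }

    // Parent: exit code 0 from the child means the reservation was refused.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
    {
        return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The limit is applied in a forked child so that the calling process keeps its own limits. A runtime that reserves large ranges up front would hit the same refusal at startup under a restrictive `ulimit -v`.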

janvorli commented 2 years ago

@michaelkarlcoleman thank you for the details. Do you happen to know what the motivation for limiting the virtual address space is? I have a hard time coming up with a reason for that. It is a per-process thing and the cost of that is just the size of the page tables.