Open AlphaBs opened 1 year ago
Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.
Author: | AlphaBs |
---|---|
Assignees: | - |
Labels: | `area-GC-coreclr`, `untriaged` |
Milestone: | - |
mmap(NULL, 274877915136, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
Looks like it tries to reserve 256 GB of memory?
Does trying with 8.0 preview 3 give the same result?
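For reference, the size in the failing mmap call above can be decoded with plain shell arithmetic. This is only a sketch; split_gib is a helper name made up for this example, not part of any tool.

```shell
# Split a byte count into whole GiB plus the leftover bytes.
# split_gib is a hypothetical helper used only for illustration.
split_gib() {
    bytes=$1
    gib=$(( bytes / 1073741824 ))          # 1 GiB = 2^30 bytes
    rest=$(( bytes - gib * 1073741824 ))
    echo "${gib} GiB + ${rest} bytes"
}

split_gib 274877915136   # size from the strace line: 256 GiB + 8192 bytes
```

So the request is a 256 GiB reservation plus a few extra pages.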
It's possible Termux doesn't support a large reservation size; something similar was hit with RISC-V recently. Setting the hard limit to something smaller makes the GC reserve a smaller size.
@janvorli to check if there is a way to figure out the max reservation size for an OS.
Does trying with 8.0 preview 3 give the same result?
yes. same result with same error on .NET 8.0.100-preview.3.23178.7
mmap(NULL, 274877911040, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
The max reservation size is also influenced by the virtual memory limit. @AlphaBs can you please check it using the ulimit -a bash command? See the "virtual memory" line.
ubuntu@localhost:~$ ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 25045
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
@mangod9 the /proc/meminfo reports VmallocTotal, which is the total size of vmalloc virtual address space. I can see from the dumped info in this issue that it is ~250 GB, while on my Linux box it is 32 TB. Maybe we could somehow base our maximum reservation limit on that, although it is a kernel side allocation limit.
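To compare machines, the VmallocTotal value mentioned above can be read like this. It is a minimal sketch, assuming a Linux /proc/meminfo that reports the field in kB. vmalloc_total_gib is a name invented for this example, and the optional file argument exists only so the function can be tried on a saved copy of meminfo. As noted above, this is a kernel-side figure, so it is at best a rough hint about user-space reservations.

```shell
# Print VmallocTotal in whole GiB (the file reports it in kB).
# vmalloc_total_gib is a hypothetical helper; $1 is an optional path
# to a meminfo-style file, defaulting to the live /proc/meminfo.
vmalloc_total_gib() {
    awk '/^VmallocTotal:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Usage on the current machine:
# vmalloc_total_gib
```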
Hmm, yeah, I guess we need to limit it based on the VM alloc limits. @AlphaBs, I assume you have a workaround to manually set the heap hard limit for now?
Yes. I just added export DOTNET_GCHeapHardLimit=1C0000000 to .bashrc. It works well.
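Since the variable takes a hexadecimal byte count, a small helper can produce the value from a size in GiB. This is just a sketch; gib_to_hex is a made-up name, and the 7 GiB figure is simply what 1C0000000 works out to.

```shell
# Convert a size in GiB to the un-prefixed hex byte count used above.
# gib_to_hex is a hypothetical helper for illustration.
gib_to_hex() {
    printf '%X\n' $(( $1 * 1073741824 ))
}

gib_to_hex 7   # prints 1C0000000

# Possible use in .bashrc:
# export DOTNET_GCHeapHardLimit=$(gib_to_hex 7)
```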
As I tweaked that variable, I found that the limit was roughly at b26000000 (47882174464 bytes, about 47.9 GB).
I also found that this limit is not fixed, but varies from run to run (perhaps depending on memory usage of machine during execution).
Here is the result of running the command repeatedly with DOTNET_GCHeapHardLimit=b26000000.
ubuntu@localhost:~$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E
ubuntu@localhost:~$ free
total used free shared buff/cache available
Mem: 7475488 5316836 345408 44060 1813244 2088116
Swap: 4194300 2724232 1470068
ubuntu@localhost:~$ dotnet --version
7.0.203
ubuntu@localhost:~$ free
total used free shared buff/cache available
Mem: 7475488 5313228 348764 44060 1813496 2091708
Swap: 4194300 2724232 1470068
ubuntu@localhost:~$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E
ubuntu@localhost:~$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E
ubuntu@localhost:~$ free
total used free shared buff/cache available
Mem: 7475488 5325684 262436 46936 1887368 1957008
Swap: 4194300 2826736 1367564
ubuntu@localhost:~$ dotnet --version
7.0.203
ubuntu@localhost:~$
Memory usage during execution (screenshot):
As you can see, the error doesn't always occur with b26000000 limit. But I don't know what this magic number b26000000 means.
Any ideas??
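A run-to-run failure like the one above is easier to judge with a count than by eye. The following is a rough sketch of a retry loop; count_failures is a hypothetical helper, and the run count of 20 is arbitrary.

```shell
# Run a command N times and report how many runs exited non-zero.
# count_failures is a hypothetical helper for illustration.
count_failures() {
    runs=$1; shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failed=$(( failed + 1 ))
        i=$(( i + 1 ))
    done
    echo "$failed/$runs runs failed"
}

# Possible use with the limit from this comment:
# count_failures 20 env DOTNET_GCHeapHardLimit=b26000000 dotnet --version
```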
Is it normal? A very simple console program consumes 220 GB of virtual memory.
testcsharp.cs
Console.WriteLine("Hello, World!");
int a = 0;
while (true) a++;
Is it normal? A very simple console program consumes 220 GB of virtual memory.
It just reserves the virtual address space. Virtual address space is per process, which means that each process in the system can reserve up to those 220 GB of that space (on your device). So unless you set the ulimit for virtual memory, this is essentially "free". The application can then map physical memory into the reserved range as it needs. The amount of physical memory it has used can be seen in the "RES" column. You can see in your screenshot above that it has used about 23 MB of memory.
We reserve the virtual memory so that the GC can have a contiguous range of address space that other memory allocations in the process would not touch.
The reason why you have started seeing this issue in .NET 7 relative to .NET 6 is that we have substantially enlarged the amount of reserved virtual address space because of a new significant enhancement of the GC implementation.
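The difference between reserved and resident memory described above can also be checked without top. A minimal sketch, assuming Linux: /proc/<pid>/status lists VmSize (the virtual size, what top shows as VIRT) and VmRSS (the resident size, RES). vm_usage is a name made up for this example.

```shell
# Show reserved (VmSize) and resident (VmRSS) memory of a process, in kB.
# vm_usage is a hypothetical helper; it defaults to the current shell's PID.
vm_usage() {
    grep -E '^Vm(Size|RSS):' "/proc/${1:-$$}/status"
}

# Usage, where 1234 is a placeholder for the PID of the running app:
# vm_usage 1234
```

For the test program in this thread one would expect VmSize to be huge and VmRSS to stay in the tens of MB.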
But I don't know what this magic number b26000000 means
This is a hexadecimal number representing the number of bytes the GC heap can reserve. In decimal, it means 47882174464 bytes.
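The conversion can be reproduced in the shell. This is only an illustration; hex_to_bytes is a made-up helper that expects the value without a 0x prefix.

```shell
# Convert an un-prefixed hex string to a decimal byte count.
# hex_to_bytes is a hypothetical helper for illustration.
hex_to_bytes() {
    echo $(( 0x$1 ))
}

hex_to_bytes b26000000   # prints 47882174464
```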
I also found that this limit is not fixed, but varies from run to run
There might be some variation depending on other allocations made by the application and the third-party native libraries it uses. I would recommend using e.g. 2/3 of this value to make it reliable. That would mean setting the env variable to e.g. 700000000 (which is 30064771072 in decimal).
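As a sketch of that rule of thumb, the exact two-thirds value can be computed in the shell; two_thirds_hex is a hypothetical helper. The 700000000 suggested above is a round number slightly below the exact result.

```shell
# Take an un-prefixed hex byte count and print two thirds of it, in hex.
# two_thirds_hex is a hypothetical helper for illustration.
two_thirds_hex() {
    printf '%X\n' $(( 0x$1 * 2 / 3 ))
}

two_thirds_hex b26000000   # prints 76EAAAAAA
```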
Yes. I also see this problem running dotnet under proot on Android. And because of that, C# Dev Kit extension on VSCode Android does not work because we can't supply DOTNET_GCHeapHardLimit.
Android devices ship with a 39-bit VA size across the board.
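To put the 39-bit figure next to the failing reservation, here is the arithmetic as a small sketch. va_space_gib is a made-up helper. How much of that space is already taken by other mappings is not something this thread states, so this only shows the upper bound.

```shell
# Size of a virtual address space with the given number of bits, in GiB.
# va_space_gib is a hypothetical helper for illustration.
va_space_gib() {
    echo $(( (1 << $1) / 1073741824 ))
}

va_space_gib 39                            # prints 512
echo $(( 274877915136 / 1073741824 ))      # failing reservation: prints 256
```

The default reservation would therefore need one contiguous free range covering half of the entire address space.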
I am also seeing this error with valgrind --tool=massif when profiling NativeAOT app heap memory. Setting DOTNET_GCHeapHardLimit=1C0000000 fixes the issue (and I can visualize the output). Adapting to environment constraints at run-time will certainly improve the user experience in these chroot-like scenarios.
PS: using un-prefixed hex for an environment variable is confusing and not a good choice, IMHO. DOTNET_GCHeapHardLimit should accept both 0x<HEX> and <DECIMAL> values, and throw a helpful error message for an invalid value.
@am11 the fact that the values are always in hex is a historical thing, so it is hard to change that without breaking someone. However, prefixing the value with 0x already works too. The thing is that we call strtoul to convert the string to a value. I didn't know that until a few days ago when someone told me it works, and re-reading the strtoul documentation uncovered that when you pass in base 16, the 0x can be in the string and it is just skipped.
@janvorli, I tried this simple repro:
FROM --platform=linux/aarch64 alpine:latest
RUN apk add build-base curl clang llvm-dev valgrind bash zlib-dev icu-libs
RUN curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --quality daily --channel "9.0" --install-dir "$HOME/.dotnet9"
RUN ~/.dotnet9/dotnet new console --aot -n consoleapp1
WORKDIR consoleapp1
RUN cat > "$HOME/.nuget/NuGet/NuGet.Config" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<packageSources>
<add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
<add key="dotnet9" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet9/nuget/v3/index.json" />
</packageSources>
</configuration>
EOF
RUN ~/.dotnet9/dotnet publish -o dist -c Release
# cache commands
# Fails
RUN echo 'valgrind --tool=massif dist/consoleapp1; echo $?' > run.sh
# Fails
RUN echo 'DOTNET_GCHeapHardLimit=0x1C0000000 valgrind --tool=massif dist/consoleapp1; echo $?' >> run.sh
# Works
RUN echo 'DOTNET_GCHeapHardLimit=1C0000000 valgrind --tool=massif dist/consoleapp1; echo $?' >> run.sh
ENTRYPOINT ["/bin/sh", "run.sh"]
The first two valgrind commands always fail with 255; only the last form (without 0x) succeeds:
# build and tag once
$ docker build -t consoleapp1-valgrind .
# run
$ docker run --rm consoleapp1-valgrind
==7== Massif, a heap profiler
==7== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==7== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==7== Command: dist/consoleapp1
==7==
==7==
255
==8== Massif, a heap profiler
==8== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==8== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==8== Command: dist/consoleapp1
==8==
==8==
255
==9== Massif, a heap profiler
==9== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==9== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==9== Command: dist/consoleapp1
==9==
Hello, World!
==9==
0
(A build of dotnet 9 from a few weeks ago was showing GC heap initialization failed with error 0x8007000E; today's build has no error message, just the exit code 255 🤔)
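Given that only the un-prefixed form worked in the repro above, one defensive option is to strip any 0x prefix before setting the variable. This is a sketch of a workaround, not a statement about how the runtime parses the value; normalize_hex is a made-up helper.

```shell
# Strip a leading 0x or 0X so the value is plain hex digits.
# normalize_hex is a hypothetical helper for illustration.
normalize_hex() {
    v=${1#0x}
    v=${v#0X}
    echo "$v"
}

normalize_hex 0x1C0000000   # prints 1C0000000

# Possible use with the repro command:
# DOTNET_GCHeapHardLimit=$(normalize_hex 0x1C0000000) valgrind --tool=massif dist/consoleapp1
```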
Description
The dotnet --version and dotnet build commands crash with a 0x8007000E error.
Reproduction Steps
Install the dotnet 7.0 SDK with the dotnet-install.sh script and run the dotnet --version command in a terminal.
Expected behavior
Print the dotnet version.
Actual behavior
I also ran the command with strace: stracelog.txt
Regression?
Works perfectly on the .NET 6.0.4 SDK.
Known Workarounds
I solved the problem with this
However, why does this occur on .NET 7.0? On .NET 6.0 it works without any problem.
Configuration
ubuntu jammy, ARM64, run on Termux (Android 13)
Other information