dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

crash with 0x8007000E error on .NET 7.0 #85556

Open AlphaBs opened 1 year ago

AlphaBs commented 1 year ago

Description

dotnet --version, dotnet build commands crash with 0x8007000E error.

ubuntu@localhost:~$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E

Reproduction Steps

Install the .NET 7.0 SDK with the dotnet-install.sh script and run dotnet --version in a terminal.

Expected behavior

print dotnet version

Actual behavior

ubuntu@localhost:~$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E

I also ran the command with strace: stracelog.txt (https://github.com/dotnet/runtime/files/11358910/stracelog.txt)

Regression?

works perfectly on .NET 6.0.4 SDK

Known Workarounds

I solved the problem with this workaround (https://github.com/dotnet/runtime/issues/79612#issuecomment-1352378682):

ubuntu@localhost:~/.dotnet$ DOTNET_GCHeapHardLimit=1C0000000 dotnet --version
7.0.203

However, why does this occur on .NET 7.0? .NET 6.0 works without any problem.

Configuration

ubuntu jammy, ARM64, run on Termux (Android 13)

Other information

ubuntu@localhost:~/.dotnet$ ulimit -v
unlimited
ubuntu@localhost:~/.dotnet$ cat /proc/meminfo
MemTotal:        7475488 kB
MemFree:          214228 kB
MemAvailable:    1438284 kB
Buffers:             756 kB
Cached:          1464412 kB
SwapCached:        22804 kB
Active:          1207476 kB
Inactive:        1771804 kB
Active(anon):     639508 kB
Inactive(anon):  1270248 kB
Active(file):     567968 kB
Inactive(file):   501556 kB
Unevictable:       19004 kB
Mlocked:           16752 kB
RbinTotal:        327680 kB
RbinAlloced:        7168 kB
RbinPool:              0 kB
RbinFree:             80 kB
RbinCached:       320432 kB
ZeroedFree:            0 kB
SwapTotal:       4194300 kB
SwapFree:        1015972 kB
Dirty:               620 kB
Writeback:             0 kB
AnonPages:       1845132 kB
Mapped:           902216 kB
Shmem:             60352 kB
KReclaimable:     315004 kB
Slab:             589340 kB
SReclaimable:     146800 kB
SUnreclaim:       442540 kB
KernelStack:       97984 kB
ShadowCallStack:   24532 kB
PageTables:       209516 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7768204 kB
Committed_AS:   238931928 kB
VmallocTotal:   263061440 kB
VmallocUsed:      257124 kB
VmallocChunk:          0 kB
Percpu:            12544 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugepagePool:          0 kB
CmaTotal:         487424 kB
CmaFree:            4976 kB
dma_heap_pool:    104732 kB
system:           569356 kB
kgsl_pool:         63656 kB
KgslSharedmem:   1044500 kB
zram0:            850656 kB
ubuntu@localhost:~/.dotnet$ dotnet --info
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E

Host:
  Version:      7.0.5
  Architecture: arm64
  Commit:       8042d61b17

.NET SDKs installed:
  7.0.203 [/home/ubuntu/.dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.5 [/home/ubuntu/.dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.5 [/home/ubuntu/.dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  Not set

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download
ubuntu@localhost:~/.dotnet$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Author: AlphaBs
Assignees: -
Labels: `area-GC-coreclr`, `untriaged`
Milestone: -

EgorBo commented 1 year ago

mmap(NULL, 274877915136, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

Looks like it tries to reserve 256 GB of memory?
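
For scale, the byte count in that mmap call can be converted by hand. A minimal Python sketch of the arithmetic (the variable names are illustrative and not taken from the runtime):

```python
# Size passed to mmap in the strace line above, in bytes.
requested = 274877915136

GIB = 1 << 30  # bytes per GiB

# Split the request into whole GiB plus whatever is left over.
whole_gib, remainder = divmod(requested, GIB)
print(f"{whole_gib} GiB + {remainder} bytes")  # 256 GiB + 8192 bytes
```

So the request is a 256 GiB reservation plus 8 KiB.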

danmoseley commented 1 year ago

Does trying with 8.0 preview 3 give the same result?

mangod9 commented 1 year ago

It's possible Termux doesn't support a large reservation size; something similar was hit with RISC-V recently. Setting the hard limit to something smaller makes the GC reserve a smaller size.

mangod9 commented 1 year ago

@janvorli to check if there is a way to figure out the max reservation size for an OS.

AlphaBs commented 1 year ago

> Does trying with 8.0 preview 3 give the same result?

Yes, same result with the same error on .NET 8.0.100-preview.3.23178.7:

mmap(NULL, 274877911040, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

janvorli commented 1 year ago

The max reservation size is also influenced by the virtual memory limit. @AlphaBs can you please check it using ulimit -a bash command? See the "virtual memory" line.

AlphaBs commented 1 year ago

> The max reservation size is also influenced by the virtual memory limit. @AlphaBs can you please check it using ulimit -a bash command? See the "virtual memory" line.

ubuntu@localhost:~$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 20
file size                   (blocks, -f) unlimited
pending signals                     (-i) 16382
max locked memory           (kbytes, -l) 64
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 25045
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

janvorli commented 1 year ago

@mangod9 the /proc/meminfo reports VmallocTotal, which is the total size of vmalloc virtual address space. I can see from the dumped info in this issue that it is ~ 250GB while on my Linux box, it is 32TB. Maybe we could somehow base our maximum reservation limit on that, although it is a kernel side allocation limit.

mangod9 commented 1 year ago

hmm, yeah guess we need to limit it based on the VM alloc limits. @AlphaBs, assume you have a workaround to manually set the heap hard limit for now?

AlphaBs commented 1 year ago

> hmm, yeah guess we need to limit it based on the VM alloc limits. @AlphaBs, assume you have a workaround to manually set the heap hard limit for now?

yes. I just added export DOTNET_GCHeapHardLimit=1C0000000 to .bashrc. It works well.

While tweaking that variable, I found that the failure threshold was roughly b26000000 (47882174464 bytes, about 47.9 GB).

I also found that this limit is not fixed, but varies from run to run (perhaps depending on memory usage of machine during execution).

Here is the result of running the command repeatedly with DOTNET_GCHeapHardLimit=b26000000.

ubuntu@localhost:~$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E
ubuntu@localhost:~$ free
               total        used        free      shared  buff/cache   available
Mem:         7475488     5316836      345408       44060     1813244     2088116
Swap:        4194300     2724232     1470068
ubuntu@localhost:~$ dotnet --version
7.0.203
ubuntu@localhost:~$ free
               total        used        free      shared  buff/cache   available
Mem:         7475488     5313228      348764       44060     1813496     2091708
Swap:        4194300     2724232     1470068
ubuntu@localhost:~$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E
ubuntu@localhost:~$ dotnet --version
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E
ubuntu@localhost:~$ free
               total        used        free      shared  buff/cache   available
Mem:         7475488     5325684      262436       46936     1887368     1957008
Swap:        4194300     2826736     1367564
ubuntu@localhost:~$ dotnet --version
7.0.203
ubuntu@localhost:~$

Memory usage during execution:

Screenshot_20230503_153440_Termux.jpg

As you can see, the error doesn't always occur with b26000000 limit. But I don't know what this magic number b26000000 means.

Any ideas??

AlphaBs commented 1 year ago

is it normal? very simple console program consume 220GB virtual memory.

Screenshot_20230503_154050_Termux.jpg

testcsharp.cs

Console.WriteLine("Hello, World!");
int a = 0;
while (true) a++;

janvorli commented 1 year ago

> is it normal? very simple console program consume 220GB virtual memory.

It just reserves the virtual address space. Virtual address space is per process, which means that each process in the system can reserve up to those 220GB of that space (on your device). So unless you set the ulimit for the virtual memory, this is essentially "free". The application can then map physical memory into the reserved memory as it needs. The amount of physical memory it has used can be seen in the "RES" column. You can see in your screenshot above that it has used about 23MB of memory.

We reserve the virtual memory so that the GC can have a contiguous range of address space that other memory allocations in the process would not touch.

The reason why you have started seeing this issue in .NET 7 relative to .NET 6 is that we have substantially enlarged the amount of reserved virtual address space because of a new significant enhancement of the GC implementation.

> But I don't know what this magic number b26000000 means

This is a hexadecimal number representing the number of bytes the GC heap can reserve. In decimal, it is 47882174464 bytes.

> I also found that this limit is not fixed, but varies from run to run

There might be some variations depending on other allocations made by the application and the third-party native libraries it uses. I would recommend using about 2/3 of this value to make it reliable, for example setting the environment variable to 700000000 (30064771072 in decimal).
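
The headroom calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic in this comment; the variable names are made up for illustration, and the two-thirds factor is a rule of thumb rather than a value the runtime computes:

```python
# Hex limit at which startup began failing on the reporter's device.
observed_limit = int("b26000000", 16)

# Keep roughly two thirds of it as headroom (integer division rounds down).
two_thirds = observed_limit * 2 // 3

# The rounder value suggested in the comment, for comparison.
suggested = int("700000000", 16)

print(observed_limit)  # 47882174464
print(two_thirds)      # 31921449642
print(suggested)       # 30064771072

# The environment variable takes hex, so convert back without a prefix.
print(format(two_thirds, "x"))
```

The suggested 700000000 sits slightly below the exact two-thirds figure, so it leaves a little more headroom.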

stevefan1999-personal commented 1 year ago

Yes. I also see this problem running dotnet under proot on Android. And because of that, C# Dev Kit extension on VSCode Android does not work because we can't supply DOTNET_GCHeapHardLimit.

woachk commented 11 months ago

Android devices ship with a 39-bit VA size across the board.
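
To put that figure next to the failing request: a short Python sketch, assuming the 39-bit address space applies to the reporter's device (the thread does not confirm this) and reusing the mmap size from the strace output earlier in the issue:

```python
VA_BITS = 39   # address-space width quoted above (assumed to apply here)
GIB = 1 << 30  # bytes per GiB

# Total virtual address space addressable with 39 bits.
va_size = 1 << VA_BITS
print(va_size // GIB)  # 512

# Size of the reservation that failed with ENOMEM, in bytes.
reservation = 274877915136

# Fraction of the whole address space that a single mmap call asked for.
print(reservation / va_size)
```

Under that assumption, the GC's initial reservation alone asks for about half of the whole address space in one contiguous block.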

am11 commented 8 months ago

I am also seeing this error with valgrind --tool=massif when profiling NativeAOT app heap memory. Setting DOTNET_GCHeapHardLimit=1C0000000 fixes the issue (and can visualize the output). Adapting to environment constraints at run-time will certainly improve the user-experience in these chroot -like scenarios.

ps - using un-prefixed hex is not a good choice and confusing for environment variable IMHO. DOTNET_GCHeapHardLimit should accept both 0x<HEX> and <DECIMAL> values, and throw helpful error message for invalid value.

janvorli commented 8 months ago

@am11 the fact that the values are always in hex is a historical thing, so it is hard to change that without breaking someone. However, prefixing the value with 0x already works too. The thing is that we call strtoul to convert the string to a value. I didn't know that until a few days ago, when someone told me it works; re-reading the strtoul documentation showed that when you pass in base 16, the 0x can be in the string and it is just skipped.
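
The prefix rule itself is easy to see in Python, whose int() with base 16 also accepts an optional 0x, like strtoul. This sketch only illustrates that rule; parse_hex_limit is a hypothetical helper and says nothing about how the runtime actually reads the variable:

```python
def parse_hex_limit(text: str) -> int:
    """Hypothetical helper: parse a hex limit, with or without a 0x prefix."""
    # With base 16, int() skips an optional leading "0x"/"0X".
    return int(text, 16)


plain = parse_hex_limit("1C0000000")
prefixed = parse_hex_limit("0x1C0000000")

print(plain, prefixed)  # 7516192768 7516192768
```

Both spellings give 7516192768 bytes, which is 7 GiB.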

am11 commented 8 months ago

@janvorli, I tried this simple repro:

FROM --platform=linux/aarch64 alpine:latest

RUN apk add build-base curl clang llvm-dev valgrind bash zlib-dev icu-libs
RUN curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --quality daily --channel "9.0" --install-dir "$HOME/.dotnet9"

RUN ~/.dotnet9/dotnet new console --aot -n consoleapp1
WORKDIR consoleapp1
RUN cat > "$HOME/.nuget/NuGet/NuGet.Config" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
    <add key="dotnet9" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet9/nuget/v3/index.json" />
  </packageSources>
</configuration>
EOF

RUN ~/.dotnet9/dotnet publish -o dist -c Release

# cache commands

# Fails
RUN echo 'valgrind --tool=massif dist/consoleapp1; echo $?' > run.sh

# Fails
RUN echo 'DOTNET_GCHeapHardLimit=0x1C0000000 valgrind --tool=massif dist/consoleapp1; echo $?' >> run.sh

# Works
RUN echo 'DOTNET_GCHeapHardLimit=1C0000000 valgrind --tool=massif dist/consoleapp1; echo $?' >> run.sh

ENTRYPOINT ["/bin/sh", "run.sh"]

first two valgrind commands always fail with 255, only the last form (without 0x) succeeds:

# build and tag once
$ docker build -t consoleapp1-valgrind .
# run
$ docker run --rm consoleapp1-valgrind

==7== Massif, a heap profiler
==7== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==7== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==7== Command: dist/consoleapp1
==7== 
==7== 
255
==8== Massif, a heap profiler
==8== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==8== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==8== Command: dist/consoleapp1
==8== 
==8== 
255
==9== Massif, a heap profiler
==9== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==9== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==9== Command: dist/consoleapp1
==9== 
Hello, World!
==9== 
0

(A build of .NET 9 from a few weeks ago showed "GC heap initialization failed with error 0x8007000E"; today's build prints no error message, just exit code 255 🤔)