dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

.net8 AOT compiled library targeting arm64 crashes when invoked using DllImport from .net8 code #95257

Open anoop331 opened 1 year ago

anoop331 commented 1 year ago

When a .NET 8 AOT-compiled linux-arm64 library is invoked through DllImport from .NET 8 CLR code in a linux-arm64 environment, it crashes.

The code that reproduces this issue is in the following repo:

https://github.com/anoop331/net8aot

The AOT compilation can be done using the Dockerfile in the repo.

The following table lists the combinations that work and those that don't, all on a linux-arm64 target (Yocto running on QEMU arm64).

| Invoking Env (linux-arm64) | Lib Compiled Using | Status | Error on terminal |
| -- | -- | -- | -- |
| net7 CLR | net7 | Works | |
| net7 AOT | net7 | Works | |
| net7 CLR | net8 | Error | |
| net8 CLR | net7 | Works | |
| net8 CLR | net8 | Error | aborted |
| net8 AOT | net8 | Error | aborted |
| net7 AOT | net8 | Works | |
| C++ | net8 | Works | |
| C++ | net7 | Works | |

ghost commented 1 year ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: anoop331
Assignees: -
Labels: `untriaged`, `area-Codegen-AOT-mono`, `area-NativeAOT-coreclr`
Milestone: -
fanyang-mono commented 1 year ago

This seems to be related to NativeAOT rather than Mono AOT.

jkotas commented 1 year ago

> crashes

Is there anything printed to the console? What is the exit code?

MichalStrehovsky commented 1 year ago

I can repro this on Raspberry Pi. This FailFasts when the second copy of the runtime tries to initialize. The GC initialization returns a failure. Stepping through it, the GC is mmapping an insane memory range and the mmap fails:

#0  0x0000007f6020117c in GCToOSInterface::VirtualReserve (size=274877906944, alignment=<optimized out>,
    flags=<optimized out>, node=<optimized out>) at /__w/1/s/src/coreclr/gc/unix/gcenv.unix.cpp:570
#1  0x0000007f601c717c in WKS::virtual_alloc (size=274877906944, numa_node=65535, use_large_pages_p=<optimized out>)
    at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:5766
#2  WKS::gc_heap::initialize_gc (soh_segment_size=soh_segment_size@entry=268435456,
    loh_segment_size=loh_segment_size@entry=549755809776, poh_segment_size=poh_segment_size@entry=549755809776)
    at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:14252
#3  0x0000007f601f2404 in WKS::GCHeap::Initialize (this=<optimized out>)
    at /__w/1/s/src/coreclr/nativeaot/Runtime/../../gc/gc.cpp:48484
#4  0x0000007f601b8688 in RedhawkGCInterface::InitializeSubsystems ()
    at /__w/1/s/src/coreclr/nativeaot/Runtime/gcrhenv.cpp:112
#5  0x0000007f601bc420 in InitDLL (hPalInstance=0x7f601b0000) at /__w/1/s/src/coreclr/nativeaot/Runtime/startup.cpp:151
#6  RhInitialize (isDll=<optimized out>) at /__w/1/s/src/coreclr/nativeaot/Runtime/startup.cpp:376
#7  0x0000007f601b6ab8 in InitializeRuntime () at /__w/1/s/src/coreclr/nativeaot/Bootstrap/main.cpp:167
#8  0x0000007f601bd93c in Thread::EnsureRuntimeInitialized (this=0x7ff7ff88f0)
    at /__w/1/s/src/coreclr/nativeaot/Runtime/thread.cpp:1219
#9  Thread::ReversePInvokeAttachOrTrapThread (this=0x7ff7ff88f0, pFrame=0x7ffffff130)
    at /__w/1/s/src/coreclr/nativeaot/Runtime/thread.cpp:1181
#10 0x0000007f60245c6c in aotlib_AotLib_NativeEntryPoints__Add (x=<optimized out>, y=<optimized out>)
    at /home/michals/net8aot/AotLib/Class1.cs:11
#11 0x000000555560ec04 in test_LibTest_Program__add () at /home/michals/net8aot/LibTest/Program.cs:13
#12 0x000000555560eb04 in test_LibTest_Program__Main (args=...) at /home/michals/net8aot/LibTest/Program.cs:11
#13 0x000000555562f098 in test__Module___StartupCodeMain ()
    at /_/src/coreclr/nativeaot/Common/src/System/Collections/Generic/LowLevelDictionary.cs:289
#14 0x0000007ff7e27780 in __libc_start_call_main (main=main@entry=0x5555558bc8 <main(int, char**)>, argc=argc@entry=1,
    argv=argv@entry=0x7ffffff3c8) at ../sysdeps/nptl/libc_start_call_main.h:58

It appears we're asking for 274 GB (256 GiB) of address space.

The contents of the registers before the call to mmap are:

x0             0x0                 0
x1             0x4000001000        274877911040
x2             0x0                 0
x3             0x22                34
x4             0xffffffff          4294967295
x5             0x0                 0
x6             0x555575b9e0        366505998816
x7             0xea97d327a30d6edb  -1542532180158353701
x8             0x1000              4096
x9             0x0                 0
x10            0x0                 0
x11            0x10000000          268435456
x12            0x6e6f69737365732f  7957695011165139759
x13            0x65706f63732e342d  7309464668147168301
x14            0x1                 1
x15            0x7ff7e7a210        549619999248
x16            0x7f602df8a0        547074472096
x17            0x7ff7e839c0        549620038080
x18            0x12e000            1236992
x19            0x4000000000        274877906944
x20            0x2000              8192
x21            0x7f602e9000        547074510848
x22            0x1000              4096
x23            0x7f602e9000        547074510848
x24            0x0                 0
x25            0x0                 0
x26            0x7ff7ffe028        549621588008
x27            0x55556eeb58        366505552728
x28            0x0                 0

So I think it matches what gdb showed in debug information as the size parameter to GCToOSInterface::VirtualReserve and this is the real number (not a case of bad debug info).
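To make that reading concrete, here is a minimal sketch that decodes x0..x5 from the register dump as the six arguments of `mmap(addr, length, prot, flags, fd, offset)`. The struct and function names are hypothetical, and the flag constants are the Linux values hard-coded for illustration. Reading the extra 0x1000 bytes in x1 as alignment padding added before the call is an assumption, though it is consistent with x20 holding 0x2000 and x22 holding 0x1000.

```cpp
#include <cassert>
#include <cstdint>

// Linux values of the relevant <sys/mman.h> constants, hard-coded so the
// sketch does not depend on the host platform's headers.
constexpr std::uint64_t kProtNone     = 0x0;
constexpr std::uint64_t kMapPrivate   = 0x02;
constexpr std::uint64_t kMapAnonymous = 0x20;

// x0..x5 from the register dump, in mmap argument order (hypothetical type).
struct MmapArgs {
    std::uint64_t addr, length, prot, flags, fd, offset;
};

constexpr MmapArgs kDumped{0x0, 0x4000001000, 0x0, 0x22, 0xffffffff, 0x0};
constexpr std::uint64_t kRequested = 274877906944ull;  // size= in frame #0
constexpr std::uint64_t kGiB = 1ull << 30;

// True when the arguments describe an anonymous, private, no-access mapping
// with fd == -1, i.e. a pure address-space reservation.
constexpr bool is_address_space_reservation(const MmapArgs& a) {
    return a.addr == 0 && a.prot == kProtNone &&
           a.flags == (kMapPrivate | kMapAnonymous) &&
           static_cast<std::uint32_t>(a.fd) == 0xffffffffu && a.offset == 0;
}

// The requested size is exactly 2^38 bytes, which is 256 GiB.
static_assert(kRequested == 0x4000000000ull, "size matches x19");
static_assert(kRequested / kGiB == 256, "256 GiB");
```

Read this way, the call reserves address space only (no access permissions, no backing file), so the failure is about address space rather than RAM or swap.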

@dotnet/gc is it expected that the GC would try to reserve such a huge amount of memory? The Raspberry Pi I'm running this on has 8 GB of RAM and 32 GB of total storage (swap is a small fraction of that).

mangod9 commented 1 year ago

Correct. With regions, the GC tries to reserve 256 GB of address space, or up to 1/2 of the available virtual memory:

https://github.com/dotnet/runtime/blob/3805c174d0a72dadfdbef98011b10b32df9e93f3/src/coreclr/gc/gc.cpp#L48097

Do we know how much virtual address space is available on the Pi? Maybe for this scenario it has to be less than 1/2?
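As a rough way to reason about that question, the sketch below checks whether a second 256 GiB reservation can fit once a first one exists. It assumes a 39-bit user address space on the arm64 machines. The thread does not state the kernel configuration, but every address in the trace is below 2^39, which would be consistent with it. The function name and the `other_mappings` parameter are hypothetical, and the check ignores fragmentation, so it is optimistic.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kGiB = 1ull << 30;
constexpr std::uint64_t kRegionReserve = 256 * kGiB;  // per-runtime reservation from the trace

// Returns true if a second reservation of kRegionReserve bytes could still fit
// in a user address space of `va_bits` bits after one reservation plus
// `other_mappings` bytes (binaries, stacks, libraries) are already mapped.
// Contiguity is ignored, so a real process may fail even when this returns true.
constexpr bool second_reservation_fits(unsigned va_bits, std::uint64_t other_mappings) {
    const std::uint64_t total = 1ull << va_bits;
    const std::uint64_t used = kRegionReserve + other_mappings;
    return used <= total && total - used >= kRegionReserve;
}
```

Under these assumptions, `second_reservation_fits(39, 1 * kGiB)` is false, while the same call with 47 bits (the usual x64 Linux user address space) is true. That would match the observation that only the second copy of the runtime fails to initialize.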

anoop331 commented 1 year ago

> > crashes
>
> Is there anything printed to the console? What is the exit code?

It says "aborted". Here is the updated table.

| Invoking Env (linux-arm64) | Lib Compiled Using | Status | Error on terminal |
| -- | -- | -- | -- |
| net7 CLR | net7 | Works | |
| net7 AOT | net7 | Works | |
| net7 CLR | net8 | Error | |
| net8 CLR | net7 | Works | |
| net8 CLR | net8 | Error | aborted |
| net8 AOT | net8 | Error | aborted |
| net7 AOT | net8 | Works | |
| C++ | net8 | Works | |
| C++ | net7 | Works | |

anoop331 commented 1 year ago

I have updated the table and the repo in the issue as well. I can add that when the native interface is invoked from C++ code, it works fine. So the issue appears when multiple "runtimes" are running, one in the lib and one in the invoking code. The issue occurs only on arm64; on x64 (Linux and Windows) it works fine.

MichalStrehovsky commented 1 year ago

@anoop331 you can work around this by configuring the garbage collector not to reserve insane amounts of address space. For example, setting the DOTNET_GCHeapHardLimitPercent=8 environment variable before starting the process fixes things for me. (See the GC configuration options for setting heap limits at https://learn.microsoft.com/en-us/dotnet/core/runtime-config/garbage-collector. You can also hardcode this at compile time by setting `<ItemGroup><RuntimeHostConfigurationOption Include="System.GC.HeapHardLimitPercent" Value="8" /></ItemGroup>` in the csproj.)
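For a native launcher, the environment-variable form of the workaround could be sketched as below. The helper names are hypothetical. The sketch assumes the variable is read as a hexadecimal number, as I recall the linked GC configuration page describing, so it emits bare hex digits (8 stays "8", 30 becomes "1e"); please verify that against the documentation. The thread only confirms setting the variable before the process starts, so the intended use is to call the setter and then exec the real application.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>  // setenv, getenv (POSIX)
#include <string>

// Formats a percentage as bare hexadecimal digits, e.g. 30 -> "1e".
// Assumption: the runtime parses this environment variable as hex.
std::string gc_percent_as_hex(unsigned percent) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%x", percent);
    return buf;
}

// Sets DOTNET_GCHeapHardLimitPercent for this process and its children.
// Returns false for out-of-range input or if setenv fails.
bool set_gc_heap_hard_limit_percent(unsigned percent) {
    if (percent == 0 || percent > 100) return false;
    const std::string value = gc_percent_as_hex(percent);
    return setenv("DOTNET_GCHeapHardLimitPercent", value.c_str(), /*overwrite=*/1) == 0;
}
```

A launcher would call `set_gc_heap_hard_limit_percent(8)` and then start the .NET application, for example with `execvp`, so that the variable is already in place when the runtime initializes.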

@anoop331 what machine are you running this on? Trying to see if both of us tried Raspberry Pi, or if this is a more general Arm64 Linux issue.

anoop331 commented 1 year ago

@MichalStrehovsky Thanks, I will try out the workaround. We are running a custom-built Yocto distribution on QEMU, so it seems to be a general arm64 issue. It works fine on x64 Linux distros.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: anoop331
Assignees: -
Labels: `area-GC-coreclr`, `untriaged`
Milestone: -
MichalStrehovsky commented 1 year ago

Thank you for confirming! I'm moving this to the GC area path and the GC team will need to take this from here.

mangod9 commented 1 year ago

Adding @janvorli as well, since we have made some recent changes related to VirtualMemoryLimits. It would be interesting to run with those changes to check whether the behavior differs here.

anoop331 commented 12 months ago

@MichalStrehovsky The fix you proposed worked, both when running in CLR mode and with AOT-compiled client code. I have updated my repo with the fix as well: https://github.com/anoop331/net8aot. Thanks a lot.

mangod9 commented 4 months ago

I am guessing that manually configuring DOTNET_GCRegionRange to a smaller value should be the preferred way to work around this issue.

@anoop331, can we assume that is a satisfactory workaround for you and close this issue?