C++ crashes when calling any function in nativeaot shared library

CeSun commented 9 hours ago

The system developer said it was caused by selinux permissions. Is there a way to bypass this system call?

https://github.com/dotnet/runtime/blob/35f2b1309a6380f1c8e3edfe1589307fd2a3e1d1/src/coreclr/gc/unix/numasupport.cpp#L57

syscall Disassembly:201
NUMASupportInitialize() 0x0000005c8b963458
GCToOSInterface::Initialize() 0x0000005c8b962480
::PalInit() 0x0000005c8b96072c
::RhInitialize(bool) 0x0000005c8b91b1f0
InitializeRuntime() 0x0000005c8b914ebc
Thread::EnsureRuntimeInitialized() 0x0000005c8b91d0e8
Thread::ReversePInvokeAttachOrTrapThread(ReversePInvokeFrame*) 0x0000005c8b91d094
libavalonia_Entry_napi_init__RegisterEntryModule napi_init.cs:12
::RegisterAvaloniaNativeModule() napi_init.cpp:16

dotnet-policy-service[bot] commented 9 hours ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

dotnet-policy-service[bot] commented 9 hours ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

MichalStrehovsky commented 9 hours ago

It doesn't look like there is a way to avoid calling into the NUMA API.

Cc @janvorli @am11 for ideas

Is this with the default SELinux policies or something more locked down?

CeSun commented 8 hours ago

If I modify the source code and return directly in the first line of the NUMASupportInitialize function, will it work? If it is theoretically possible, I will try to invest my energy in learning how to compile the dotnet sdk.

janvorli commented 8 hours ago

@CeSun are you running in a docker container? And what is the distro you are using?

CeSun commented 7 hours ago

@janvorli Hi, Thanks for your reply,

I am using HarmonyOS Next, a new mobile operating system developed by Huawei. This system is similar to Android. And on this system, you can call the native shared library of linux-musl.

Currently, this system is in the public beta stage.

I have two devices, one with enforcing selinux and the other with disabled selinux.

On the device with disabled selinux, the native shared library released by nativeaot works fine, but not on the other.

But in the future, SELinux will be enforcing on the systems that users run.

CeSun commented 7 hours ago

I have also posted a work order in the Huawei Developer Center to seek help from Huawei and am waiting for a response.

huoyaoyuan commented 7 hours ago

> If I modify the source code and return directly in the first line of the NUMASupportInitialize function, will it work?

It should work as-if there's no NUMA support, like the non TARGET_LINUX path.

CeSun commented 7 hours ago

> If I modify the source code and return directly in the first line of the NUMASupportInitialize function, will it work?
>
> It should work as-if there's no NUMA support, like the non TARGET_LINUX path.

I also noticed the macro "TARGET_LINUX", but I know nothing about NUMA. There are no assertions in the source code, so I guess it is logically allowed not to execute NUMA-related initialization code.

huoyaoyuan commented 6 hours ago

NUMA stands for Non-Uniform Memory Access: different physical memory controllers are attached to different CPU core(s). Accessing memory or cache attached to a different memory controller requires going through the slower interconnect bus, as with multiple CPU chips. It was usually not a concern on consumer hardware until Ryzen 9 introduced two chiplets. Not initializing NUMA information will just increase the chance of inefficient memory accesses on HEDT and server CPUs with many memory channels.

janvorli commented 3 hours ago

@CeSun do you know if the crash happened while calling the syscall or at some later point?

am11 commented 3 hours ago

I was testing with:

#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>

int main(void)
{
  if (syscall(__NR_get_mempolicy, NULL, NULL, 0, 0, 0) < 0)
    printf("syscall failed with errno %d: %s\n", errno, strerror(errno));
  else
    printf("didn't fail\n");

  return 0;
}

cc getmempolicy.c && ./a.out

CeSun commented 2 hours ago

> @CeSun do you know if the crash happened while calling the syscall or at some later point?

when calling the syscall

CeSun commented 2 hours ago

@am11 In addition, when I use a tool similar to adb to enter the shell environment, running an executable published with NativeAOT works without any problems. It crashes only when the native shared library published with NativeAOT is bundled in this system's application package (similar to an Android APK).

am11 commented 9 minutes ago

@CeSun, I couldn't figure out which "class" to use in an SELinux profile, so I went with the seccomp security model to reproduce it. For that, the host kernel first needs to support the get_mempolicy syscall (e.g. the Linux host kernel used by Docker for Mac doesn't support it, so I created a Fedora VM and installed Docker in it). I built the repro (https://github.com/dotnet/runtime/issues/110074#issuecomment-2493869365) in the VM and ran the container with and without the SYS_NICE capability:

$ docker run -v$(pwd):/app --rm --cap-add=SYS_NICE fedora /app/a.out
didn't fail

$ docker run -v$(pwd):/app --rm fedora /app/a.out
syscall failed with errno 1: Operation not permitted

With docker mac (whose host doesn't have get_mempolicy syscall), I was getting:

syscall failed with errno 38: Function not implemented

We were handling errno 38 but not 1, so this is somewhat of a corner case (host kernel supports get_mempolicy and container does not enable the capability). The daily build with changes will be out in a few hours or by tomorrow, you can give it a try.
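To illustrate the corner case described above, here is a minimal sketch in C of a probe that treats both errno 38 (ENOSYS) and errno 1 (EPERM) as "NUMA not available" instead of handling ENOSYS alone. The function names `numa_error_means_unavailable` and `numa_probe` are hypothetical and exist only for this example; this is not the runtime's actual change, just one way the fallback could look under that assumption.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper (not from dotnet/runtime): returns 1 when the errno
 * value means the process should simply run as if there were no NUMA
 * support. ENOSYS: the kernel has no get_mempolicy. EPERM: the kernel has
 * it, but a sandbox policy refuses the call. */
int numa_error_means_unavailable(int err)
{
    return err == ENOSYS || err == EPERM;
}

/* Hypothetical probe: returns 1 if get_mempolicy can be called,
 * 0 if NUMA should be treated as unavailable,
 * -1 on any other, unexpected failure. */
int numa_probe(void)
{
#ifdef __NR_get_mempolicy
    /* Same call as in the repro program earlier in this thread. */
    if (syscall(__NR_get_mempolicy, NULL, NULL, 0, 0, 0) == 0)
        return 1;
    return numa_error_means_unavailable(errno) ? 0 : -1;
#else
    /* No syscall number on this platform: behave as if NUMA is absent. */
    return 0;
#endif
}
```

A caller would skip NUMA initialization whenever `numa_probe()` returns anything other than 1. Note that this only helps when the sandbox reports the denial through errno; a policy that kills the process on the syscall would still need a different approach.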
