dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.15k stars 4.71k forks source link

Native aot'ed shared libraries should not be unloadable on unix #64629

Open webczat opened 2 years ago

webczat commented 2 years ago

Coreclr/CoreRT are not designed to be unloaded without process exit. In case of anative aot'ed shared library, it's possible it will be used in dynamic loading/plugin scenarios, so it might be loaded and then possibly unloaded by the plugin host, causing process to malfunction/crash. This has been fixed for windows (commit 7b6c0025482aaf8476a6f4ecd357e89a0516ce47) by pinning the library in memory so that it cannot be unloaded, however there is no such fix for unix. This is still marked as a TODO in file https://github.com/dotnet/runtime/blob/main/src/tests/nativeaot/SmokeTests/SharedLibrary/SharedLibrary.cpp#L79

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

webczat commented 2 years ago

cc @MichalStrehovsky If this is not a dupe I would be happy to fix it.

MichalStrehovsky commented 2 years ago

I've looked a bit more into this with the suggestion from https://github.com/dotnet/corert/pull/7011#discussion_r350326064.

If we want to use dlopen with RTLD_NODELETE, we need to get a path to the library first. We would probably want to do all of it around here: https://github.com/dotnet/runtime/blob/b2dc37ba181a7fa4427e717eab819ba3543d0ae4/src/coreclr/nativeaot/Bootstrap/main.cpp#L171-L174

webczat commented 2 years ago

the commit mentioned above does this in a function named PalGetModuleHandleFromPointer. I have looked at the unix equivalent of this function and it uses dladdr to fetch the handle, and the same function also gives the library pathname. Although doing it there looks a bit out of place to me, we have a precedent of using dladdr in the codebase for unix context, so I could either use dladdr in the place you mentioned or possibly just dlopen/dlclose when the above mentioned function is called. dlclose is needed so that refcount doesn't increase needlessly. The only thing I don't know is whether OSx allows what linux does, adding a nodelete flag to an already loaded library, which I tested works on linux.

webczat commented 2 years ago

I know it's not directly related to the issue but before trying to fix things in any way, I anted to first have a test case showing the fix was effective.

I have tried to uncomment the dlclose line in src/tests/nativeaot/SmokeTests/SharedLibrary/SharedLibrary.cpp but it seems like on linux/glibc the library does not get unloaded, even though it does not have nodelete flag. I have no idea why this is happening so I wouldn't rely on that as it might be specific to linux or glibc, or might even be a coincidence. I have also verified that I can still call the library functions after dlclose.

I know that in case of a plain program not linked to anything except standard libs and libdl, during loading/dlopening the native aoted shared library, what is also loaded is librt, libpthread and then dlopened libicu. rt and pthread are marked nodelete during loading, but I don't see any traces of SharedLibrary.so being marked nodelete.

When searching for the possible problem causes one can find https://stackoverflow.com/questions/38869657/dlclose-not-unloading-so-file-which-is-linking-to-boost but IMO this might not be the same problem as, if I am not mistaken, the text of the top answer suggests that the dlopened library should be marked nodelete automatically in this case, yet it seems not happening here.

I have tried to reproduce things by creating a program dlopening a library which dlopens (and leaves opened) another library. Then the first library is closed and gets unloaded even when I also link it with libpthread and librt, which I confirm are loaded too.

Any clue what may cause this? I doubt it's just the corert suddenly not being unloadable.

MichalStrehovsky commented 2 years ago

I guess it's good news that it won't unload, but I don't have a good idea why either.

webczat commented 2 years ago

I don't thing it's good because we don't know why and that condition might suddenly become false, or be false on musl or on mac or even on a different compiler. But I have confirmed that the library cannot be unloaded via repeatedly dlclosing it.

DeafMan1983 commented 1 year ago

I am sorry @MichalStrehovsky You mean library can't unload. Is it possible dlclose or other method under unix-like functions. Or possible Unload, Reload....

webczat commented 1 year ago

the intention is they are not unloadable at all, also with dlclose.

MichalStrehovsky commented 1 year ago

Yup, unloading them at this point is equivalent to pulling the rug from under it. There will still be treads running various things and their code/static data just got unloaded. Nothing good comes out of that. Even if it doesn't crash, we leaked a bunch of memory and handles.