eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

[FFI/Jextract] Failure to load a complex archive file (.a) on AIX in the jextract generation #19930

Closed ChengJin01 closed 1 month ago

ChengJin01 commented 3 months ago

The issue was detected while generating the jextract tool for FFI, after the problem with loading native libraries (.a) had been resolved. It is entirely different from the original problem addressed in https://github.com/eclipse-openj9/openj9/issues/19344.

As explained in https://github.com/openjdk/jextract, generating the jextract tool requires the LLVM libraries to be in place. Some of them (e.g. libclang.a) must be loaded by the JDK so that their native functions can be used while building the tool. The loading fails in the call to dlopen (see https://github.com/eclipse-openj9/openj9/issues/19344#issuecomment-2253523393 for details), most likely in the code at https://github.com/eclipse-openj9/openj9-omr/blob/9083c8237ac215927ac55b5db256780132983136/port/aix/omrsl.c#L216.

Technically, the existing code that calls dlopen only works for a simple shared object (a single object in an extremely simple format) whose name ends in .a. It fails to support a complex archive file that bundles many shared objects (like libc.a, or libclang.a in LLVM). To address the problem, we first need to figure out how dlopen loads these archives, especially libc.a and libclang.a.
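To picture the difference between the two cases: on AIX a file ending in .a may be either a single XCOFF shared object or an archive holding several members. The two can be told apart from the first bytes of the file. The sketch below uses a hypothetical helper, `classify_aix_library()`; it is not the code in omrsl.c. The magic values it checks are taken from our reading of AIX's archive and XCOFF headers and should be double-checked against the system headers:

- `"<bigaf>\n"` for big-format archives
- `"<aiaff>\n"` for small-format archives
- 0x01DF for 32-bit XCOFF
- 0x01F7 for 64-bit XCOFF

```c
#include <stddef.h>
#include <string.h>

/* Kinds of file that a ".a" path can refer to on AIX (hypothetical enum). */
typedef enum {
    AIX_LIB_UNKNOWN = 0,
    AIX_LIB_XCOFF32,       /* single 32-bit XCOFF object: dlopen it directly */
    AIX_LIB_XCOFF64,       /* single 64-bit XCOFF object: dlopen it directly */
    AIX_LIB_ARCHIVE_SMALL, /* small-format archive, magic "<aiaff>\n" */
    AIX_LIB_ARCHIVE_BIG    /* big-format archive, magic "<bigaf>\n" */
} aix_lib_kind;

/* Classify a library from the first `len` bytes of its contents. */
static aix_lib_kind classify_aix_library(const unsigned char *hdr, size_t len) {
    if (len >= 8 && memcmp(hdr, "<bigaf>\n", 8) == 0)
        return AIX_LIB_ARCHIVE_BIG;
    if (len >= 8 && memcmp(hdr, "<aiaff>\n", 8) == 0)
        return AIX_LIB_ARCHIVE_SMALL;
    if (len >= 2) {
        /* The XCOFF magic number is stored big-endian in the first two bytes. */
        unsigned int magic = ((unsigned int)hdr[0] << 8) | hdr[1];
        if (magic == 0x01DF)
            return AIX_LIB_XCOFF32;
        if (magic == 0x01F7)
            return AIX_LIB_XCOFF64;
    }
    return AIX_LIB_UNKNOWN;
}

/* An archive must be opened as "archive(member)" with RTLD_MEMBER;
 * a single XCOFF object can be passed to dlopen as-is. */
static int needs_member_syntax(aix_lib_kind kind) {
    return kind == AIX_LIB_ARCHIVE_BIG || kind == AIX_LIB_ARCHIVE_SMALL;
}
```

A loader would read the first few bytes of the file, call `classify_aix_library()`, and pick the dlopen form from `needs_member_syntax()`. How the member name is chosen is a separate question, discussed in the rest of this thread.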

FYI: @TobiAjila, @pshipton, @JasonFengJ9, @zl-wang, @keithc-ca

ChengJin01 commented 3 months ago

@zl-wang, is there any way to reach out to the AIX team to understand how dlopen works in such a case?

zl-wang commented 3 months ago

@ChengJin01 it has always been like this on AIX: when an archive has multiple members, dlopen needs to name the specific member you want to load. I am going to look up the dlopen man page and attach it here later.

zl-wang commented 3 months ago

dlopen Subroutine

Last Updated: 2023-03-24

Purpose

Dynamically loads a module into the calling process.

Syntax

#include <dlfcn.h>
void *dlopen (FilePath, Flags);
const char *FilePath;
int Flags;

Description

The dlopen subroutine loads the module specified by FilePath into the executing process's address space. Dependents of the module are automatically loaded as well. If the module is already loaded, it is not loaded again, but a new, unique value will be returned by the dlopen subroutine.

The dlopen subroutine is a portable way of dynamically loading shared libraries. It performs C++ static initialization of the modules that it loads, like the loadAndInit subroutine does.

The value returned by the dlopen might be used in subsequent calls to dlsym and dlclose. If an error occurs during the operation, dlopen returns NULL.

If the main application was linked with the -brtl option, then the runtime linker is invoked by dlopen. If the module being loaded was linked with runtime linking enabled, both intra-module and inter-module references are overridden by any symbols available in the main application. If runtime linking was enabled, but the module was not built enabled, then all inter-module references will be overridden, but some intra-module references will not be overridden.

If the module being opened with dlopen or any of its dependents is being loaded for the first time, initialization routines for these newly-loaded routines are called (after runtime linking, if applicable) before dlopen returns. Initialization routines are the functions specified with the -binitfini: linker option when the module was built. (See the ld command for more information about this option.)

After calling the initialization functions for all newly-loaded modules, C++ static initialization is performed. If you call the dlopen subroutine from within an initialization function or a C++ static initialization function, modules loaded by the nested dlopen subroutine might be initialized before completely initializing the originally loaded modules.

If a dlopen subroutine is called from within a binitfini function, the initialization of the current module is abandoned for other modules.

Note: If the module being loaded has read-other permission, the module is loaded into the global shared library segment. Modules loaded into the global shared library segment are not unloaded even if they are no longer being used. Use the slibclean command to remove unused modules from the global shared library segment. To load the module in the process private region, unload the module completely using the slibclean command, and then unset its read-other permission.

The LIBPATH or LD_LIBRARY_PATH environment variables can be used to specify a list of directories in which the dlopen subroutine searches for the named module. The running application also contains a set of library search paths that were specified when the application was linked. The dlopen subroutine searches the modules based on the mechanism that the load subroutine defines, because the dlopen subroutine internally calls the load subroutine with the L_LIBPATH_EXEC flag.

Item | Description
-- | --
FilePath | Specifies the name of a file containing the loadable module. This parameter can contain an absolute path, a relative path, or no path component. If FilePath contains a slash character, FilePath is used directly, and no directories are searched. If the FilePath parameter is /unix, dlopen returns a value that can be used to look up symbols in the current kernel image, including those symbols found in any kernel extension that was available at the time the process began execution. If the value of FilePath is NULL, a value for the main application is returned. This allows dynamically loaded objects to look up symbols in the main executable, or for an application to examine symbols available within itself.
Flags | Specifies variations of the behavior of dlopen. Either RTLD_NOW or RTLD_LAZY must always be specified. Other flags may be OR'ed with RTLD_NOW or RTLD_LAZY.

Item | Description
-- | --
RTLD_NOW | Load all dependents of the module being loaded and resolve all symbols.
RTLD_LAZY | Specifies the same behavior as RTLD_NOW. In a future release of the operating system, the behavior of RTLD_LAZY may change so that loading of dependent modules is deferred or resolution of some symbols is deferred.
RTLD_GLOBAL | Allows symbols in the module being loaded to be visible when resolving symbols used by other dlopen calls. These symbols will also be visible when the main application is opened with dlopen(NULL, mode).
RTLD_LOCAL | Prevents symbols in the module being loaded from being used when resolving symbols used by other dlopen calls. Symbols in the module being loaded can only be accessed by calling the dlsym subroutine. If neither RTLD_GLOBAL nor RTLD_LOCAL is specified, the default is RTLD_LOCAL. If both flags are specified, RTLD_LOCAL is ignored.
RTLD_MEMBER | The dlopen subroutine can be used to load a module that is a member of an archive. The L_LOADMEMBER flag is used when the load subroutine is called. The module name FilePath names the archive and archive member according to the rules outlined in the load subroutine.
RTLD_NOAUTODEFER | Prevents deferred imports in the module being loaded from being automatically resolved by subsequent loads. The L_NOAUTODEFER flag is used when the load subroutine is called. Ordinarily, modules built for use by the dlopen and dlsym subroutines will not contain deferred imports. However, deferred imports can still be used. A module opened with dlopen may provide definitions for deferred imports in the main application, for modules loaded with the load subroutine (if the L_NOAUTODEFER flag was not used), and for other modules loaded with the dlopen subroutine (if the RTLD_NOAUTODEFER flag was not used).

Return Values

Upon successful completion, dlopen returns a value that can be used in calls to the dlsym and dlclose subroutines. The value is not valid for use with the loadbind and unload subroutines.

If the dlopen call fails, NULL (a value of 0) is returned and the global variable errno is set. If errno contains the value ENOEXEC, further information is available via the dlerror function.

zl-wang commented 3 months ago

Buried deep in the Flags section above is RTLD_MEMBER: "The dlopen subroutine can be used to load a module that is a member of an archive. The L_LOADMEMBER flag is used when the load subroutine is called. The module name FilePath names the archive and archive member according to the rules outlined in the load subroutine."
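Following the RTLD_MEMBER rules above, here is a minimal sketch of opening one member of an archive. The helper names `make_member_path()` and `open_archive_member()` are hypothetical.

- **Member names:** shr_64.o is the 64-bit libc member used elsewhere in this thread. For libclang.a, the member name would have to be checked first, for example by listing the archive with `ar -X64 -t`.
- **Flags:** RTLD_NOW is passed because the man page says RTLD_NOW or RTLD_LAZY must always be specified.
- **Portability:** The dlopen call is compiled only when `_AIX` is defined, since RTLD_MEMBER is AIX-specific.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Build the "archive(member)" name that AIX dlopen expects with RTLD_MEMBER,
 * e.g. "/usr/lib/libc.a(shr_64.o)". Returns 0 on success, -1 if `buf` is too small. */
static int make_member_path(char *buf, size_t size,
                            const char *archive, const char *member) {
    int n = snprintf(buf, size, "%s(%s)", archive, member);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

#if defined(_AIX)
/* Open a single shared-object member of an archive. RTLD_NOW (or RTLD_LAZY)
 * is mandatory; RTLD_MEMBER enables the archive(member) naming syntax. */
static void *open_archive_member(const char *archive, const char *member) {
    char path[4096];
    void *handle;
    if (make_member_path(path, sizeof path, archive, member) != 0)
        return NULL;
    handle = dlopen(path, RTLD_NOW | RTLD_MEMBER);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, err ? err : "(unknown)");
    }
    return handle;
}
#endif
```

On AIX, a call such as `open_archive_member("/usr/lib/libc.a", "shr_64.o")` would then be expected to succeed in a 64-bit process. Separating the path construction from the dlopen call keeps the naming rule easy to test on its own.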

ChengJin01 commented 3 months ago

The problem with libclang.a is that these symbols (required by jextract) don't belong to any member of the archive:

[304]   0x11041e898    .data      EXP     DS SECdef        [noIMid] clang_getRemappings
[305]   0x11041e8b0    .data      EXP     DS SECdef        [noIMid] clang_getRemappingsFr
[306]   0x11041e8c8    .data      EXP     DS SECdef        [noIMid] clang_remap_getNumFil
[307]   0x11041e8e0    .data      EXP     DS SECdef        [noIMid] clang_remap_getFilena
[308]   0x11041e8f8    .data      EXP     DS SECdef        [noIMid] clang_remap_dispose
[309]   0x11041e910    .data      EXP     DS SECdef        [noIMid] clang_getBuildSession
[310]   0x11041e928    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOver
[311]   0x11041e940    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOverMapping
[312]   0x11041e958    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOverSensitivity
[313]   0x11041e970    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOverBuffer
[314]   0x11041e988    .data      EXP     DS SECdef        [noIMid] clang_free
[315]   0x11041e9a0    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOver
[316]   0x11041e9b8    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescri
[317]   0x11041e9d0    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescrimeworkModuleName
[318]   0x11041e9e8    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescrirellaHeader
[319]   0x11041ea00    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescrioBuffer
[320]   0x11041ea18    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescrie
[321]   0x110434a20    .data      EXP     DS SECdef        [noIMid] clang_Cursor_isNull
[322]   0x110434a38    .data      EXP     DS SECdef        [noIMid] clang_getNullRange
[323]   0x110434a50    .data      EXP     DS SECdef        [noIMid] clang_getNullLocation
[324]   0x110434a68    .data      EXP     DS SECdef        [noIMid] clang_getFileLocation
[325]   0x110434a80    .data      EXP     DS SECdef        [noIMid] clang_getCursorUSR
[326]   0x110434a98    .data      EXP     DS SECdef        [noIMid] clang_getCString
[327]   0x110434ab0    .data      EXP     DS SECdef        [noIMid] clang_disposeString
[328]   0x110434ac8    .data      EXP     DS SECdef        [noIMid] clang_getTypeDeclarat
[329]   0x110434af8    .data      EXP     DS SECdef        [noIMid] clang_getRangeStart
[330]   0x110434b10    .data      EXP     DS SECdef        [noIMid] clang_getRangeEnd
[331]   0x110434b28    .data      EXP     DS SECdef        [noIMid] clang_getRange
[332]   0x110434b70    .data      EXP     DS SECdef        [noIMid] clang_defaultDiagnosttions
[333]   0x110434b88    .data      EXP     DS SECdef        [noIMid] clang_formatDiagnosti
[334]   0x1104c32f0    .data      EXP     DS SECdef        [noIMid] clang_install_abortinl_error_handler
[335]   0x1104c38d8    .data      EXP     DS SECdef        [noIMid] clang_createTranslati
[336]   0x1104e2f10    .data      EXP     DS SECdef        [noIMid] clang_Cursor_getTrans
[337]   0x1104e2f28    .data      EXP     DS SECdef        [noIMid] clang_Range_isNull
[338]   0x1104e3048    .data      EXP     DS SECdef        [noIMid] clang_disposeTranslat
[339]   0x1104e3078    .data      EXP     DS SECdef        [noIMid] clang_isInvalid
[340]   0x1104e3090    .data      EXP     DS SECdef        [noIMid] clang_isDeclaration
[341]   0x1104e30a8    .data      EXP     DS SECdef        [noIMid] clang_isReference
[342]   0x1104e30c0    .data      EXP     DS SECdef        [noIMid] clang_isStatement
[343]   0x1104e30d8    .data      EXP     DS SECdef        [noIMid] clang_isExpression
[344]   0x1104e30f0    .data      EXP     DS SECdef        [noIMid] clang_isTranslationUn
[345]   0x1104e3108    .data      EXP     DS SECdef        [noIMid] clang_isAttribute
[346]   0x1104e3120    .data      EXP     DS SECdef        [noIMid] clang_createIndex
......

Can dlopen handle them correctly?

zl-wang commented 3 months ago

That means you might need another shared library to satisfy the request. I am wondering how the executable was linked in the first place: why could it be linked successfully if there were missing symbols?

ChengJin01 commented 3 months ago

> It has always been like this on AIX: when an archive has multiple members, dlopen needs to name the specific member you want to load. I am going to look up the dlopen man page and attach it here later.

I tried the following code but it ended up with a null handle.

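```c
#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    /* Tried with RTLD_MEMBER alone and with RTLD_MEMBER | RTLD_LAZY;
     * both attempts returned a NULL handle. */
    void *handle = dlopen("/usr/lib/libc.a(shr_64.o)", RTLD_MEMBER | RTLD_LAZY);
    printf("handle = %p\n", handle);
    if (handle == NULL) {
        /* Report the loader's reason instead of calling dlclose(NULL). */
        const char *err = dlerror();
        printf("dlerror: %s\n", err ? err : "(none)");
        return 1;
    }
    dlclose(handle);
    return 0;
}
```
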
ChengJin01 commented 3 months ago

> That means you might need another shared library to satisfy the request. I am wondering how the executable was linked in the first place: why could it be linked successfully if there were missing symbols?

These libraries (including libclang.a) were unpacked directly from https://github.com/llvm/llvm-project/releases/download/llvmorg-18.1.8/clang+llvm-18.1.8-powerpc64-ibm-aix-7.2.tar.xz, the LLVM release package required by jextract, which bundles them together.

pshipton commented 2 months ago

Not specific to jdk23, not a new problem, not a blocker for jdk23, move it forward.

babsingh commented 1 month ago

@JasonFengJ9 For 0.48, this issue will need to be resolved by the end of this week. What's the current state of this issue? Based on this issue's impact, do we need it to be fixed in 0.48 or can it be pushed to 0.49?

zl-wang commented 1 month ago

I have successfully built and run jextract on AIX (and Linux) for a customer (Finanz Informatik). On AIX there is a clang bug, though (it allocates 2TB of memory). For the official build, you might need to change the gradle build script to copy/extract libclang.so from libclang.a.

The bug still exists in the latest version of clang. The OpenXL team is investigating; it is tracked here: https://github.ibm.com/compiler/wyvern/issues/20642

JasonFengJ9 commented 1 month ago

This is not a new problem. As per https://github.com/eclipse-openj9/openj9/issues/19930#issuecomment-2353882872, the customer has a running jextract.

Moving to 0.49.

JasonFengJ9 commented 1 month ago

> I have successfully built and run jextract on AIX (and Linux) for a customer (Finanz Informatik).

@zl-wang is there any OpenJ9 change involved? Do we still need this issue for further investigation?

zl-wang commented 1 month ago

No, I didn't need to change anything in OpenJ9. If this issue was opened for the purpose of building jextract on AIX, I think it belongs in the jextract repository (where the gradle script would be changed). Otherwise, it can be closed.

JasonFengJ9 commented 1 month ago

Thanks @zl-wang. This issue was opened for OpenJ9 support of jextract on AIX. Since it can be addressed with build script changes, I am closing it here.

github-actions[bot] commented 1 month ago

Issue Number: 19930 Status: Closed Actual Components: comp:vm, project:panama, os:aix Actual Assignees: No one :( PR Assignees: No one :(

JasonFengJ9 commented 1 month ago

> If this issue was opened for the purpose of building jextract on AIX, I think it belongs in the jextract repository (where the gradle script would be changed).

I chatted with @zl-wang: the shared library (.so) was extracted from the archive file and manually renamed to libclang.so. I will propose a script change at https://github.com/openjdk/jextract.
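Once the shared object has been extracted and renamed, it is an ordinary shared library, so dlopen can open it without RTLD_MEMBER. The sketch below uses a hypothetical helper, `library_exports()`, to check that such a file loads and exports one of the symbols from the dump earlier in this thread (`clang_createIndex`). It is a quick sanity check under those assumptions, not part of the jextract build.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Returns 1 if `library` can be loaded and resolves `symbol`, 0 otherwise.
 * A bare name such as "libclang.so" is searched via LIBPATH/LD_LIBRARY_PATH;
 * a name containing a slash is used as-is. */
static int library_exports(const char *library, const char *symbol) {
    void *handle = dlopen(library, RTLD_NOW | RTLD_LOCAL);
    void *addr;
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s): %s\n", library, err ? err : "(unknown)");
        return 0;
    }
    addr = dlsym(handle, symbol); /* NULL if the symbol is not exported */
    dlclose(handle);
    return addr != NULL;
}
```

For example, `library_exports("libclang.so", "clang_createIndex")` would be expected to return 1 on a machine where the renamed file is on LIBPATH. On some systems the program must be linked with `-ldl` for dlopen and dlsym.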