eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 720 forks source link

NCDFE in Liberty startup #20444

Open hangshao0 opened 5 days ago

hangshao0 commented 5 days ago

Java -version output

1CIJAVAVERSION JRE 17 Windows 10 amd64-64 (build 17.0.12+7)
1CIVMVERSION   20240716_793
1CIJ9VMTAG     openj9-0.46.1
1CIJ9VMVERSION 4760d5d320
1CIJITVERSION  j9jit_20240910_1925_jenkins
1CIOMRVERSION  840a9adba_CMPRSS
1CIJCLVERSION  e70fd1fd64a based on jdk-17.0.12+7

Summary of problem

java.lang.NoClassDefFoundError thrown when loading class com.ibm.ws.classloading.internal.providers.Providers

Diagnostic files

java.lang.NoClassDefFoundError: com.ibm.ws.classloading.internal.providers.Providers
        at com.ibm.ws.classloading.internal.AppClassLoader.<init>(AppClassLoader.java:172)
        at com.ibm.ws.classloading.internal.ClassLoaderFactory.createClassLoader(ClassLoaderFactory.java:138)
        at com.ibm.ws.classloading.internal.ClassLoaderFactory.create(ClassLoaderFactory.java:116)
        at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createAppClassLoader(ClassLoadingServiceImpl.java:349)
        at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createTopLevelClassLoader(ClassLoadingServiceImpl.java:321)
        at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createTopLevelClassLoader(ClassLoadingServiceImpl.java:103)
        at com.ibm.ws.classloading.ClassLoaderConfigHelper.createTopLevelClassLoader(ClassLoaderConfigHelper.java:314)
        at com.ibm.ws.app.manager.module.internal.DeployedAppInfoBase.createTopLevelClassLoader(DeployedAppInfoBase.java:635)
        at com.ibm.ws.app.manager.ear.internal.EARDeployedAppInfo.basicCreateAppClassLoader(EARDeployedAppInfo.java:231)
        at com.ibm.ws.app.manager.ear.internal.EARDeployedAppInfo.getAppClassLoader(EARDeployedAppInfo.java:180)
        at com.ibm.ws.app.manager.ear.internal.EARDeployedAppInfo.createEJBModuleClassLoader(EARDeployedAppInfo.java:1364)
        at com.ibm.ws.app.manager.ear.internal.EARDeployedAppInfo.createModuleClassLoader(EARDeployedAppInfo.java:1345)
        at com.ibm.ws.app.manager.module.internal.ExtendedModuleInfoImpl.getClassLoader(ExtendedModuleInfoImpl.java:124)
        at com.ibm.ws.ejbcontainer.osgi.internal.ModuleInitDataAdapter.createModuleInitData(ModuleInitDataAdapter.java:178)
        at com.ibm.ws.ejbcontainer.osgi.internal.ModuleInitDataAdapter.adapt(ModuleInitDataAdapter.java:124)
        at com.ibm.ws.ejbcontainer.osgi.internal.ModuleInitDataAdapter.adapt(ModuleInitDataAdapter.java:43)
        at com.ibm.ws.adaptable.module.internal.AdapterFactoryServiceImpl.adapt(AdapterFactoryServiceImpl.java:202)
        at com.ibm.ws.adaptable.module.internal.AdaptableContainerImpl.adapt(AdaptableContainerImpl.java:176)
        at com.ibm.ws.adaptable.module.internal.InterpretedContainerImpl.adapt(InterpretedContainerImpl.java:205)
        at com.ibm.ws.ejbcontainer.osgi.internal.EJBModuleRuntimeContainerImpl.createModuleMetaData(EJBModuleRuntimeContainerImpl.java:105)
        at com.ibm.ws.app.manager.module.internal.ModuleHandlerBase.createModuleMetaData(ModuleHandlerBase.java:65)
        at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase$ModuleContainerInfoBase.createModuleMetaData(SimpleDeployedAppInfoBase.java:222)
        at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase.preDeployApp(SimpleDeployedAppInfoBase.java:608)
        at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase.installApp(SimpleDeployedAppInfoBase.java:584)
        at com.ibm.ws.app.manager.module.internal.DeployedAppInfoBase.deployApp(DeployedAppInfoBase.java:625)
        at com.ibm.ws.app.manager.ear.internal.EARApplicationHandlerImpl.install(EARApplicationHandlerImpl.java:79)
        at com.ibm.ws.app.manager.internal.statemachine.StartAction.execute(StartAction.java:199)
        at com.ibm.ws.app.manager.internal.statemachine.ApplicationStateMachineImpl.enterState(ApplicationStateMachineImpl.java:1369)
        at com.ibm.ws.app.manager.internal.statemachine.ApplicationStateMachineImpl.run(ApplicationStateMachineImpl.java:912)
        at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:298)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:857)

Diagnostic files here captured by option: -Xdump:system+java+snap:events=throw+systhrow,filter=java/lang/NoClassDefFoundError,msg_filter=com.ibm.ws.classloading.internal.providers.Providers,request=exclusive+prepwalk+serial+preempt

github-actions[bot] commented 5 days ago

Issue Number: 20444 Status: Open Recommended Components: comp:vm, comp:jclextensions, comp:gc Recommended Assignees: jasonfengj9, keithc-ca, hangshao0

hangshao0 commented 4 days ago

From the tracepoint:

11:56:46.063008409  0x0000000000d23600 j9vm.315            Entry      >0Xd23600 loader 0x1ea71d24b40 class java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock arbitratedLoadClass enter className
11:56:46.063030006  0x0000000000d23600 j9vm.119            Event       send loadClass(java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock), stringObject: 0xff802740 loader: 0x816c7ec8
11:56:46.063063703 *0x0000000000c64900 j9vm.315            Entry      >0Xc64900 loader 0x1ea71d24b40 class com/ibm/ws/classloading/internal/providers/Providers arbitratedLoadClass enter className
11:56:46.063073692 *0x0000000000d23600 j9vm.120            Event       sent loadClass(java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock) --> got 0x80054208
11:56:46.063075972 *0x0000000000c64900 j9vm.119            Event       send loadClass(com/ibm/ws/classloading/internal/providers/Providers), stringObject: 0xff81aa58 loader: 0x816c7ec8
11:56:46.063091050  0x0000000000c64900 j9vm.120            Event       sent loadClass(com/ibm/ws/classloading/internal/providers/Providers) --> got 0x80054208
11:56:46.063093870 *0x0000000000d23600 j9vm.319            Exit       <0Xd23600 loader 0x1ea71d24b40 class java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock arbitratedLoadClass exit foundClass 0x1aa200
11:56:46.063122777 *0x0000000000c64900 j9vm.319            Exit       <0Xc64900 loader 0x1ea71d24b40 class com/ibm/ws/classloading/internal/providers/Providers arbitratedLoadClass exit foundClass 0x0                     <----- foundClass is 0x0 which causes the NoClassDefFoundError
11:56:46.063142825  0x0000000000c64900 j9vm.294            Entry      >setCurrentException index=21 constructorIndex=0 detailMessage=0xff81ab78

There are 2 threads loading classes: Thread 0xd23600 is loading ReentrantReadWriteLock$WriteLock Thread 0xc64900 is loading Providers

However, both threads got 0x80054208, which is ReentrantReadWriteLock$WriteLock. This is correct for thread 0xd23600, but incorrect for Thread 0xc64900.

11:56:46.063073692 *0x0000000000d23600 j9vm.120            Event       sent loadClass(java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock) --> got 0x80054208
11:56:46.063091050  0x0000000000c64900 j9vm.120            Event       sent loadClass(com/ibm/ws/classloading/internal/providers/Providers) --> got 0x80054208

The tracepoint is: https://github.com/eclipse-openj9/openj9/blob/a08b5362efa0bbf88a0298813240ff4723f405bf/runtime/vm/classsupport.c#L778

and sendLoadClassResult is from: https://github.com/eclipse-openj9/openj9/blob/a08b5362efa0bbf88a0298813240ff4723f405bf/runtime/vm/classsupport.c#L742

Could you look at why thread 0xc64900 get the same returnValue as thread 0xd23600 ? @thallium

tjwatson commented 4 days ago

We have also observed this strange NoSuchMethodError where it seems to say it is looking for the getPrivateLibraries method on ReentrantReadWriteLock$WriteLock when that method is on the Providers class. That sounds like it could be another flavor of this issue where the thread got the wrong class?

java.lang.NoSuchMethodError: java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock.getPrivateLibraries(Lcom/ibm/wsspi/classloading/ClassLoaderConfiguration;)Ljava/util/List; (loaded from jrt:/java.base by <Bootstrap Loader>) called from class com.ibm.ws.classloading.internal.AppClassLoader (loaded from file:/C:/Users/jazz_build/Build/jazz-build-engines/wasrtc/EBCPROD/build/dev/image/output/wlp/lib/com.ibm.ws.classloading_1.1.95.jar by org.eclipse.osgi.internal.loader.EquinoxClassLoader@b19957cc[com.ibm.ws.classloading:1.1.95.202410071314(id=48)]).
    at com.ibm.ws.classloading.internal.AppClassLoader.<init>(AppClassLoader.java:172)
    at com.ibm.ws.classloading.internal.ClassLoaderFactory.createClassLoader(ClassLoaderFactory.java:138)
    at com.ibm.ws.classloading.internal.ClassLoaderFactory.create(ClassLoaderFactory.java:116)
    at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createAppClassLoader(ClassLoadingServiceImpl.java:349)
    at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createTopLevelClassLoader(ClassLoadingServiceImpl.java:321)
    at com.ibm.ws.classloading.internal.ClassLoadingServiceImpl.createTopLevelClassLoader(ClassLoadingServiceImpl.java:103)
    at com.ibm.ws.classloading.ClassLoaderConfigHelper.createTopLevelClassLoader(ClassLoaderConfigHelper.java:314)
    at com.ibm.ws.app.manager.module.internal.DeployedAppInfoBase.createTopLevelClassLoader(DeployedAppInfoBase.java:635)
    at com.ibm.ws.app.manager.war.internal.WARDeployedAppInfo.createModuleClassLoader(WARDeployedAppInfo.java:105)
    at com.ibm.ws.app.manager.module.internal.ExtendedModuleInfoImpl.getClassLoader(ExtendedModuleInfoImpl.java:124)
    at com.ibm.ws.container.service.annocache.internal.ModuleAnnotationsImpl.<init>(ModuleAnnotationsImpl.java:198)
    at com.ibm.ws.container.service.annocache.internal.ModuleAnnotationsImpl.<init>(ModuleAnnotationsImpl.java:161)
    at com.ibm.ws.container.service.annocache.internal.WebAnnotationsImpl.<init>(WebAnnotationsImpl.java:91)
    at com.ibm.ws.container.service.annocache.internal.WebAnnotationsAdapterImpl.adapt(WebAnnotationsAdapterImpl.java:63)
    at com.ibm.ws.container.service.annocache.internal.WebAnnotationsAdapterImpl.adapt(WebAnnotationsAdapterImpl.java:35)
    at com.ibm.ws.adaptable.module.internal.AdapterFactoryServiceImpl.adapt(AdapterFactoryServiceImpl.java:202)
    at com.ibm.ws.adaptable.module.internal.AdaptableContainerImpl.adapt(AdaptableContainerImpl.java:176)
    at com.ibm.ws.container.service.annocache.AnnotationsBetaHelper.getWebAnnotations(AnnotationsBetaHelper.java:270)
    at com.ibm.ws.webcontainer.osgi.container.config.WebAppConfigurator.setWebAnnotations(WebAppConfigurator.java:88)
    at com.ibm.ws.webcontainer.osgi.container.config.WebAppConfigurator.configure(WebAppConfigurator.java:424)
    at com.ibm.ws.webcontainer.osgi.container.config.WebAppConfigurationAdapter.adapt(WebAppConfigurationAdapter.java:121)
    at com.ibm.ws.webcontainer.osgi.container.config.WebAppConfigurationAdapter.adapt(WebAppConfigurationAdapter.java:45)
    at com.ibm.ws.adaptable.module.internal.AdapterFactoryServiceImpl.adapt(AdapterFactoryServiceImpl.java:202)
    at com.ibm.ws.adaptable.module.internal.AdaptableContainerImpl.adapt(AdaptableContainerImpl.java:176)
    at com.ibm.ws.webcontainer.osgi.WebContainer.createModuleMetaData(WebContainer.java:872)
    at com.ibm.ws.app.manager.module.internal.ModuleHandlerBase.createModuleMetaData(ModuleHandlerBase.java:65)
    at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase$ModuleContainerInfoBase.createModuleMetaData(SimpleDeployedAppInfoBase.java:222)
    at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase.preDeployApp(SimpleDeployedAppInfoBase.java:608)
    at com.ibm.ws.app.manager.module.internal.SimpleDeployedAppInfoBase.installApp(SimpleDeployedAppInfoBase.java:584)
    at com.ibm.ws.app.manager.module.internal.DeployedAppInfoBase.deployApp(DeployedAppInfoBase.java:625)
    at com.ibm.ws.app.manager.war.internal.WARApplicationHandlerImpl.install(WARApplicationHandlerImpl.java:67)
    at com.ibm.ws.app.manager.internal.statemachine.StartAction.execute(StartAction.java:199)
    at com.ibm.ws.app.manager.internal.statemachine.ApplicationStateMachineImpl.enterState(ApplicationStateMachineImpl.java:1369)
    at com.ibm.ws.app.manager.internal.statemachine.ApplicationStateMachineImpl.run(ApplicationStateMachineImpl.java:912)
    at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:298)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:857)
hangshao0 commented 4 days ago

@TobiAjila @gacholio Does this ring any bell to you ?

2 threads loading 2 different classes simultaneously. Thread1 loaded the correct class, but thread2 got the class loaded by thread1.

tjwatson commented 4 days ago

For Liberty we have three levels of class loaders

  1. JVM loaders (boot, app class path etc)
  2. OSGi (Equinox) Bundle class loaders
  3. Liberty application class loaders

All three levels are storing classes into the Open J9 shared class cache. So far I only recall seeing these strange errors for the classes loaded by the Equinox bundle class loaders. This makes me suspect there is something specific to how Equinox uses the SharedClassURLHelper that is causing something to go wrong. There we store the classes with this call:

https://github.com/eclipse-equinox/equinox/blob/9ee49aa9caa8670b7d6af6fad7d65f7ea387c5fb/bundles/org.eclipse.osgi/container/src/org/eclipse/osgi/internal/cds/CDSHookImpls.java#L145

The call to com.ibm.oti.shared.SharedClassURLHelper.storeSharedClass(String, URL, Class) is done outside of any locks that would be held by the class loader for the class name being stored. But there proper locks while defining the class such that two threads do not try to call ClassLoader.defineClass multiple times for the same class. And only the thread that "won" in defining the class should be the thread that attempts to store the class after it has been defined.

Similarly the method com.ibm.oti.shared.SharedClassURLHelper.findSharedClass(String, URL, String) is used to find a shared class with this call:

https://github.com/eclipse-equinox/equinox/blob/9ee49aa9caa8670b7d6af6fad7d65f7ea387c5fb/bundles/org.eclipse.osgi/container/src/org/eclipse/osgi/internal/cds/CDSBundleFile.java#L155

Again, this call has no locks held by the class loader while calling the SharedClassURLHelper.findSharedClass. Is it possible there is a threading issue here with either looking up or storing shared classes?

hangshao0 commented 4 days ago

re comment

There is only one copy of com/ibm/ws/classloading/internal/providers/Providers in the shared cache. It was added a previous JVM, not by the current JVM, which means this JVM did not call storeSharedClass() on this class. I have dumped the com/ibm/ws/classloading/internal/providers/Providers romClass and compared it against its java code, it looks correct to me.

Also there is only one copy of java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock. It is loaded by bootstrap class loader and it looks correct to me.

I believe this JVM called findSharedClass() on com/ibm/ws/classloading/internal/providers/Providers, however it is a hashtable look up where the class name is the key. I don't see how findSharedClass(String className1) could return a class whose name is not className1.

hangshao0 commented 4 days ago

I believe this JVM called findSharedClass() on com/ibm/ws/classloading/internal/providers/Providers

Actually findSharedClass() is not called on com/ibm/ws/classloading/internal/providers/Providers by this JVM. There should be JCL: SharedClassURLHelperImpl findSharedClassImpl entry and exit trace points in https://github.com/eclipse-openj9/openj9/issues/20444#issuecomment-2448263444 if it is called (like below), but I don't see them for com/ibm/ws/classloading/internal/providers/Providers. So neither storeSharedClass() nor findSharedClass() is involved here.


11:56:46.062322947  0x0000000000c64900 j9vm.119            Event       send loadClass(com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1), stringObject: 0xff818ec8 loader: 0x816c7ec8
11:56:46.062344575  0x0000000000c64900 j9jcl.108           Entry      >JCL: SharedClassURLHelperImpl findSharedClassImpl: Entering for helperID 49
11:56:46.062358974  0x0000000000c64900 j9jcl.539           Entry      >JCL: com.ibm.oti.shared getCachedURL: Entering for helperID=49
11:56:46.062368133  0x0000000000c64900 j9jcl.154           Exit       <JCL: com.ibm.oti.shared getCachedURL: Exiting with found cached URL
11:56:46.062392360  0x0000000000c64900 j9shr.1293          Event       RMI locateROMClass: Locate succeeded. Returning 0x1ea6ff2ca04. FoundAtIndex=0. StaleCPEI=0x0. Locate request for ROMClass com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1 from helper ID 49 with cpeIndex -1.
11:56:46.062402089  0x0000000000c64900 j9shr.1297          Event       CM findROMClass: class com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1 found at address 0x1ea6d70b0f8 with classpath index 0. The classpath id is 49.
11:56:46.062418908  0x0000000000c64900 j9jcl.110           Exit       <JCL: SharedClassURLHelperImpl findSharedClassImpl: Exiting with TRUE
11:56:46.062545145  0x0000000000c64900 j9vm.1              Entry      >Create RAM class from ROM class 0x1ea6d70b0f8 in class loader 0x1ea71d24b40
11:56:46.062553204  0x0000000000c64900 j9vm.80             Event       ROM class 0x1ea6d70b0f8 is named com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1
11:56:46.062564743  0x0000000000c64900 j9vm.88             Event       Loaded implemented interface java/lang/Runnable (RAM class=0x65100)
11:56:46.062659994  0x0000000000c64900 j9vm.2              Exit       <Created RAM class 0xd11100 from ROM class 0x1ea6d70b0f8
11:56:46.062692960  0x0000000000c64900 j9vm.120            Event       sent loadClass(com/ibm/ws/classloading/internal/ContainerClassLoader$SmartClassPathImpl$1) --> got 0x8149e558

The normal sequence should be the class loader calls findSharedClass() to see if the class is in shared cache, if not, then load the class from disk. However, in this case the wrong class is returned even before findSharedClass() is called. I don't see the JVM creating the wrong class indicated by trace j9vm.1, j9vm2 for com/ibm/ws/classloading/internal/providers/Providers either. I guess the wrong class could be from the cached data structure mapping between the class name and the actual class/class bytes in the EquinoxClassLoader.

tjwatson commented 4 days ago

I guess the wrong class could be from the cached data structure mapping between the class name and the actual class/class bytes in the EquinoxClassLoader.

Because Equinox class loaders are a graph that may contain cycles it does not hold locks for an initial call to java.lang.ClassLoader.findLoadedClass(String). Here is the sequence the Equinox loader uses:

  1. Without holding any lock check ClassLoader.findLoadedClass(String) if that returns non-null then the class is returned; otherwise continue.
  2. Without holding any lock check SharedClassURLHelper.findSharedClass if that returns non-null then continue to step 3; otherwise the class has not been stored yet, so we do normal byte code search.
  3. Obtain a class-loader/class-name pair for the class about to be defined with the shared class byte[].
  4. Call ClassLoader.findLoadedClass(String) again to make sure another thread didn't define the class; if that returns null then call ClassLoader.defineClass with the shared class byte[].
  5. release the class-loader/class-name pair for the class that was defined.
  6. Return the shared Class object

Could there be some inconsistency with allowing another thread to get a non-null result from the call to ClassLoader.findLoadedClass(String) in step 1 while we are trying to define the shared class in step 4? For performance reasons, I would like to avoid having to get the class-loader/class-name pair lock to call that first findLoadedClass.

hangshao0 commented 3 days ago

I guess the wrong class could be from the cached data structure mapping between the class name and the actual class/class bytes in the EquinoxClassLoader.

I realized the wrong class returned is java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock, which is a bootstrap class. It is not a class that has just been loaded by thread 0xd23600 . It is a class that has already been loaded by bootstrap class loader before. It is very likely both threads get the class from bootstraploader.findLoadedClass(String classname) through delegation all the way upto the bootstrap class loader. It is just thread 0xd23600 is actually looking for java/util/concurrent/locks/ReentrantReadWriteLock$WriteLock and but thread 0xc64900 is looking for a different class. The class name might becomes wrong somewhere in between.

Could there be some inconsistency with allowing another thread to get a non-null result from the call to ClassLoader.findLoadedClass(String) in step 1 while we are trying to define the shared class in step 4? For performance reasons, I would like to avoid having to get the class-loader/class-name pair lock to call that first findLoadedClass.

It is possible that a thread in step 1 can get a not null class while another thread is defining the class in step 4 (the thread should already be near the end of step 4 after the class is already added to the hashtable keyed on the class name, the class name in step 1 and 4 should be the same).

tjwatson commented 3 days ago

The class name might becomes wrong somewhere in between.

I'm out of ideas on the Equinox side how that could happen. I assume that must be going wrong down in the JVM?

hangshao0 commented 3 days ago

I'm out of ideas on the Equinox side how that could happen. I assume that must be going wrong down in the JVM?

This is a really strange issue. By looking at the code, I don't see how the JVM could be wrong about the class name either. Not sure if things go wrong inside JVM or before calling into JVM. But we can add assertions in the JVM code that the class being searched for and the class returned has the same name.

hangshao0 commented 3 days ago

@thallium is looking at adding the assertions.