eclipse-ee4j / eclipselink

Eclipselink project
https://eclipse.dev/eclipselink/

DeadLock new Pattern, Critical Deadlock Discovery Regarding EclipseLink Semaphore #2100

Open 99sono opened 7 months ago

99sono commented 7 months ago

Hi,

On March 15, 2024, we made a significant discovery related to a deadlock pattern. Specifically, we found that the eclipselink.concurrency.manager.object.building.semaphore property in the persistence.xml file is not entirely safe.

The purpose of this property is to mitigate the probability of deadlocks by imposing a strict limit on the number of threads allowed to access the concurrency manager for object building.

However, our recent investigation revealed an unexpected issue: Threads requesting access to the semaphore for object building might already hold cache keys. This scenario contradicts our initial expectation. Ideally, a thread engaged in object building and requesting the semaphore should not possess any read/write locks.

The data from the massive dump indicates that a specific thread, shown in Stack Trace Pattern 02 below, was denied access to the semaphore because ten other threads were already engaged in object building. Unfortunately, this thread, while waiting for the semaphore, holds write-locked cache keys that other threads require.
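To make the circular wait easier to picture, here is a minimal, self-contained sketch in plain Java. It is not EclipseLink code: the class `SemaphoreStallSketch`, the method `reproduceStall`, and the use of a `ReentrantReadWriteLock` as a stand-in for a cache key are all assumptions made for illustration. Timed attempts replace the indefinite waits so that the demo terminates, and latches keep each side from giving up its resource early, which mimics the situation described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; none of these names exist in EclipseLink.
class SemaphoreStallSketch {

    /** Returns true when every builder failed to read-lock the key and the writer got no permit. */
    static boolean reproduceStall(int permits, long timeoutMillis) {
        Semaphore objectBuilding = new Semaphore(permits);              // stand-in for the object-building semaphore
        ReentrantReadWriteLock cacheKey = new ReentrantReadWriteLock(); // stand-in for one cache key
        CountDownLatch writerHoldsKey = new CountDownLatch(1);
        CountDownLatch buildersHoldPermits = new CountDownLatch(permits);
        CountDownLatch buildersGaveUp = new CountDownLatch(permits);
        CountDownLatch writerGaveUp = new CountDownLatch(1);
        AtomicInteger blockedBuilders = new AtomicInteger();
        AtomicBoolean writerDenied = new AtomicBoolean();

        // Plays the role of thread 207: owns the write lock, then asks for a permit.
        Runnable writer = () -> {
            cacheKey.writeLock().lock();
            try {
                writerHoldsKey.countDown();
                buildersHoldPermits.await();             // all permits are taken by now
                if (objectBuilding.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    objectBuilding.release();
                } else {
                    writerDenied.set(true);              // denied while still owning the write lock
                }
                writerGaveUp.countDown();
                buildersGaveUp.await();                  // keep the key locked until builders have tried
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cacheKey.writeLock().unlock();
            }
        };

        // Plays the role of the ten object-building threads: hold a permit, want the read lock.
        Runnable builder = () -> {
            try {
                writerHoldsKey.await();
                objectBuilding.acquire();
                try {
                    buildersHoldPermits.countDown();
                    if (cacheKey.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                        cacheKey.readLock().unlock();
                    } else {
                        blockedBuilders.incrementAndGet(); // key is write-locked by the writer
                    }
                    buildersGaveUp.countDown();
                    writerGaveUp.await();                // keep the permit until the writer has tried
                } finally {
                    objectBuilding.release();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        List<Thread> threads = new ArrayList<>();
        threads.add(new Thread(writer));
        for (int i = 0; i < permits; i++) {
            threads.add(new Thread(builder));
        }
        threads.forEach(Thread::start);
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writerDenied.get() && blockedBuilders.get() == permits;
    }

    public static void main(String[] args) {
        System.out.println("stall reproduced: " + reproduceStall(10, 200L));
    }
}
```

In the sketch, neither side can make progress until the other releases what it holds. With untimed waits, as in the production dump, that is a deadlock.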

We intend to report this new deadlock pattern to Oracle via a service request promptly.

Stack Trace Pattern 01: ten threads doing object building, stuck because the cache keys they want to acquire for reading are already acquired for writing:

"[ACTIVE] ExecuteThread: '525' for queue: 'weblogic.kernel.Default (self-tuning)'"
    java.lang.Thread.State: RUNNABLE
        at java.management@11.0.16/sun.management.ThreadImpl.getThreadInfo1(Native Method)
        at java.management@11.0.16/sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:197)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.enrichGenerateThreadDump(ConcurrencyUtil.java:939)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.createInformationThreadDump(ConcurrencyUtil.java:969)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationStep02(ConcurrencyUtil.java:570)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationStep01(ConcurrencyUtil.java:554)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationIfAppropriate(ConcurrencyUtil.java:477)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.determineIfReleaseDeferredLockAppearsToBeDeadLocked(ConcurrencyUtil.java:170)
        at org.eclipse.persistence.internal.helper.ConcurrencyManager.acquireReadLock(ConcurrencyManager.java:333)
        at org.eclipse.persistence.internal.identitymaps.CacheKey.acquireReadLock(CacheKey.java:284)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1059)
        at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildWorkingCopyCloneNormally(ObjectBuilder.java:952

Stack Trace Pattern 02: the thread holding write-locked cache keys is refused the semaphore because ten threads are already doing object building:

"[ACTIVE] ExecuteThread: '207' for queue: 'weblogic.kernel.Default (self-tuning)'"
    java.lang.Thread.State: TIMED_WAITING
        at java.base@11.0.16/jdk.internal.misc.Unsafe.park(Native Method)
        at java.base@11.0.16/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
        at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1079)
        at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1369)
        at java.base@11.0.16/java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:415)
        at org.eclipse.persistence.internal.helper.ConcurrencySemaphore.acquireSemaphoreIfAppropriate(ConcurrencySemaphore.java:108)
        at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:726)
        at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:705)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:861)
        at org.eclipse.persistence.queries.ReadObjectQuery.registerResultInUnitOfWork(ReadObjectQuery.java:901)
        at org.eclipse.persistence.queries.ReadObjectQuery.executeObjectLevelReadQuery(ReadObjectQuery.java:568)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1232)
        at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:911)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1191)
        at org.eclipse.persistence.queries.ReadObjectQuery.execute(ReadObjectQuery.java:447)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1279)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:3004)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1898)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1880)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1830)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.executeQuery(EntityManagerImpl.java:1012)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.findInternal(EntityManagerImpl.java:954)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:830)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:696)
        at jdk.internal.reflect.GeneratedMethodAccessor471.invoke(Unknown Source)
        at java.base@11.0.16/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base@11.0.16/java.lang.reflect.Method.invoke(Method.java:566)
        at weblogic.persistence.BasePersistenceContextProxyImpl.invoke(BasePersistenceContextProxyImpl.java:97)
        at weblogic.persistence.TransactionalEntityManagerProxyImpl.invoke(TransactionalEntityManagerProxyImpl.java:164)
        at weblogic.persistence.BasePersistenceContextProxyImpl.invoke(BasePersistenceContextProxyImpl.java:86)
        at com.sun.proxy.$Proxy603.find(Unknown Source)
        at jdk.internal.reflect.GeneratedMethodAccessor465.invoke(Unknown Source)
        at java.base@11.0.16/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base@11.0.16/java.lang.reflect.Method.invoke(Method.java:566)

Thank you for your attention.

Best regards.

99sono commented 7 months ago

To safeguard confidential information, we will not share specific data such as the cache keys owned by thread 207 or the cache keys that the ten object-building threads are trying to read-lock. Instead, we will provide Oracle with the full massive dump report generated by the EclipseLink library.

The description outlined here should serve as a solid foundation for addressing the defect and devising an appropriate solution.

Thank you for your attention.

Best regards.

99sono commented 7 months ago

One piece of information is missing above: the deadlock was experienced using the EclipseLink 2.7.6 that ships with WebLogic 14, with additional Oracle patches. In short, the EclipseLink version is very similar to the 2.7.9 release tag.

Thanks.

99sono commented 6 months ago

Additional Insight: We acknowledge that the deadlock pattern in question is exceedingly rare. To date, it has only manifested in one production instance. Typically, other production instances operate with the object-building semaphore limit enabled and have not exhibited this pattern.

Nonetheless, the evidence at hand conclusively demonstrates that semaphores can indeed contribute to deadlock scenarios. This occurs when a thread, denied entry by the semaphore, retains ownership of cache key resources. Despite the infrequency of such occurrences, the existence of concrete evidence cannot be ignored. It reveals that acquiring the object-building semaphore is akin to wielding a double-edged sword: while it offers benefits, it also harbors the potential for deadlocks. Ironically, the semaphore's original intent was to diminish the likelihood of deadlocks.