Open 99sono opened 4 months ago
This issue is now documented under a service request. Thanks a lot.
Side note: The 2.7.15 tag appears to address a defect related to the ConcurrencyManager. You can find the relevant changes in the following commits:
Additionally, here's the direct link to the specific commit: Commit details
I am having a hard time following the changes to the method. The changes look bigger than I anticipated. I see that the method continues to call:
putThreadAsWaitingToAcquireLockForWriting
Whereas I would have expected a
lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading
As I had explained in point 7.
Also the new piece of code doing:
toWaitOnLock.lock()
And later
toWaitOnLock.unlock();
Looks like a radical change to me in the method implementation.
I cannot judge if that code is actually correct or not.
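For readers comparing the two styles: a lock()/unlock() pair usually means the code moved from an intrinsic monitor (synchronized plus wait) to an explicit lock from java.util.concurrent.locks. The sketch below is not the 2.7.15 implementation. GuardedKey, awaitRelease and release are hypothetical names used only to show what a bounded wait looks like in that style, assuming a ReentrantLock with a Condition.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a cache key; NOT an EclipseLink class.
class GuardedKey {
    private final Lock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private boolean acquired = true; // starts out held by "someone else"

    // Waits at most timeoutMillis for the key to be released.
    // Returns true if the key is free when the method returns.
    boolean awaitRelease(long timeoutMillis) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            // Loop guards against spurious wakeups; awaitNanos returns the time left.
            while (acquired && remainingNanos > 0) {
                remainingNanos = released.awaitNanos(remainingNanos);
            }
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock(); // always released, even on timeout or interrupt
        }
    }

    // Marks the key as free and wakes up every waiting thread.
    void release() {
        lock.lock();
        try {
            acquired = false;
            released.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

The property worth checking in any such rewrite is that unlock() sits in a finally block and that the wait is bounded, so a waiter cannot hang forever if the lock owner is itself deadlocked.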
All I can say is that I was expecting the method implementation to look more like this:
public Map acquireLocksForClone(Object objectForClone, ClassDescriptor descriptor, CacheKey cacheKey, AbstractSession cloningSession) {
    // TRK-19750 - basic variable initialization to be able to do the
    // determineIfReleaseDeferredLockAppearsToBeDeadLocked
    final Date whileStartDate = new Date();
    final Thread currentThread = Thread.currentThread();
    DeferredLockManager lockManager = ConcurrencyManager.getDeferredLockManager(currentThread);
    HackingEclipseReadLockManager readLockManager = ConcurrencyManager.getReadLockManager(currentThread);
    boolean successful = false;
    IdentityHashMap lockedObjects = new IdentityHashMap();
    IdentityHashMap refreshedObjects = new IdentityHashMap();
    CacheKey lastCacheKeyWeNeededToWaitToAcquire = null;
    try {
        // if the descriptor has indirection for all mappings then wait as there will be no deadlock risks
        CacheKey toWaitOn = acquireLockAndRelatedLocks(objectForClone, lockedObjects, refreshedObjects, cacheKey, descriptor, cloningSession);
        int tries = 0;
        while (toWaitOn != null) { // loop until we've tried too many times.
            for (Iterator lockedList = lockedObjects.values().iterator(); lockedList.hasNext();) {
                ((CacheKey)lockedList.next()).releaseReadLock();
                lockedList.remove();
            }
            // TRK-19750 - populate the static hash map
            // of the concurrency manager that we use for creating the massive log dump
            // to indicate that the current thread is now stuck trying to acquire some arbitrary
            // cache key for writing
            lastCacheKeyWeNeededToWaitToAcquire = toWaitOn;
            lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading(
                    "org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)");
            // TRK-19750 - Since we know this is one of those methods that can appear in the deadlocks,
            // with threads frozen here forever inside of the wait that used to have no timeout,
            // we will now always check for how long the current thread is stuck in this while loop going nowhere
            // using the exact same approach we have been adding to the concurrency manager
            HackingEclipseHelperUtil.SINGLETON.determineIfReleaseDeferredLockAppearsToBeDeadLocked(toWaitOn,
                    whileStartDate, lockManager, readLockManager,
                    WRITE_LOCK_MANAGER_IS_WILLING_TO_ALLOW_INTERRUPTED_EXCEPTION_TO_BE_FIRED_UP_IF_CONFIGURATION_WOULD_ALLOW_ID_TRUE);
            synchronized (toWaitOn) {
                try {
                    if (toWaitOn.isAcquired()) { // last minute check to ensure it is still locked.
                        // TRK-19750
                        // this is the famous WriteLockManager.acquireLocksForClone(WriteLockManager.java:92)
                        // being one of the spots where threads trying to build objects can get stuck forever
                        // commenting out wait without a timeout
                        // if the thread that has the lock with write permissions is in a deadlock
                        // then we are not coming out
                        // toWaitOn.wait();// wait for lock on object to be released
                        // TRK-19750
                        // add wait with timeout like everywhere else; waiting without timeout is always wrong
                        toWaitOn.wait(10000L);
                    }
                } catch (InterruptedException ex) {
                    // Ignore exception thread should continue.
                }
            }
            Object waitObject = toWaitOn.getObject();
            // Object may be null for loss of identity.
            if (waitObject != null) {
                cloningSession.checkAndRefreshInvalidObject(waitObject, toWaitOn, cloningSession.getDescriptor(waitObject));
                refreshedObjects.put(waitObject, waitObject);
            }
            toWaitOn = acquireLockAndRelatedLocks(objectForClone, lockedObjects, refreshedObjects, cacheKey, descriptor, cloningSession);
            if ((toWaitOn != null) && ((++tries) > MAXTRIES)) {
                // If we've tried too many times abort.
                throw ConcurrencyException.maxTriesLockOnCloneExceded(objectForClone);
            }
        }
        successful = true; // successfully acquired all locks
    } catch (InterruptedException exception) {
        // TRK-19750 - if determineIfReleaseDeferredLockAppearsToBeDeadLocked is blowing up a thread stuck for too
        // long
        // run the logic of freeing up locks acquired by the thread
        // NOTE: we would be tempted to do this commented code below
        // cacheKey.releaseAllLocksAquiredByThread(lockManager, readLockManager);
        // throw ConcurrencyException.waitFailureOnClientSession(exception);
        // Instead what we do is just mimic the vanilla behavior we have for the interrupted exception inside of the
        // wait
        // we must assume this is correct behavior
        throw ConcurrencyException.maxTriesLockOnCloneExceded(objectForClone);
    } finally {
        // TRK-19750 - remove from the static hash map
        // of the concurrency manager that we use for creating the massive log dump
        // any information we may have added that this thread was struggling to acquire any particular
        // cache key. The current thread is out of the wait to acquire logic now so we can consider the thread
        // as no longer being stuck
        if (lastCacheKeyWeNeededToWaitToAcquire != null) {
            lastCacheKeyWeNeededToWaitToAcquire.removeCurrentThreadNoLongerWaitingToAcquireLockForReading();
        }
        if (!successful) { // did not acquire locks but we are exiting
            for (Iterator lockedList = lockedObjects.values().iterator(); lockedList.hasNext();) {
                ((CacheKey)lockedList.next()).releaseReadLock();
                lockedList.remove();
            }
        }
    }
    return lockedObjects;
}
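The core idea in the method above (never wait without a timeout, and keep track of how long the thread has been stuck) can be isolated in a much smaller example. The following is only a sketch under stated assumptions: BoundedWaitSketch, waitForRelease and release are hypothetical names, and an AtomicBoolean stands in for the cache key's "is acquired" state. It is not EclipseLink code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of a bounded wait; NOT part of EclipseLink.
class BoundedWaitSketch {

    // Waits in slices of sliceMillis until 'acquired' becomes false,
    // giving up once maxTotalMillis have elapsed in total.
    // Returns true if the resource was released, false on timeout or interrupt.
    static boolean waitForRelease(AtomicBoolean acquired, long sliceMillis, long maxTotalMillis) {
        final long startNanos = System.nanoTime();
        synchronized (acquired) {
            while (acquired.get()) {
                long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
                if (elapsedMillis >= maxTotalMillis) {
                    return false; // stuck for too long: the caller can log a dump or abort
                }
                long slice = Math.max(1L, Math.min(sliceMillis, maxTotalMillis - elapsedMillis));
                try {
                    acquired.wait(slice); // never wait(0), which would block indefinitely
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                    return false;
                }
            }
            return true;
        }
    }

    // Marks the resource as free and wakes up all waiters.
    static void release(AtomicBoolean acquired) {
        synchronized (acquired) {
            acquired.set(false);
            acquired.notifyAll();
        }
    }
}
```

A caller would typically react to a false return the way the method above reacts to the deadlock check: release whatever it already holds and either retry or fail with a clear exception.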
I would assume the changes in https://github.com/eclipse-ee4j/eclipselink/blob/2.7.15/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/helper/WriteLockManager.java
are correct, but I have a hard time seeing that this is the case.
Sorry, but is this issue still valid? The methods
public void putThreadAsWaitingToAcquireLockForWriting(Thread thread, String methodName)
https://github.com/eclipse-ee4j/eclipselink/blob/2.7/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/helper/ConcurrencyManager.java#L915
and
public void removeThreadNoLongerWaitingToAcquireLockForWriting(Thread thread)
https://github.com/eclipse-ee4j/eclipselink/blob/2.7/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/helper/ConcurrencyManager.java#L924
look OK to me.
Bug Report: Issue with putThreadAsWaitingToAcquireLockForWriting Method in EclipseLink 2.7.x
Problem Description: The putThreadAsWaitingToAcquireLockForWriting method in EclipseLink 2.7.x is incorrect.
Affected Method: org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String)
Notice how this method puts the trace metadata and then immediately removes it.
This is how the method looks in the in-house modified 2.6.4 version.
Missing Removal Method:
Efficiency Concerns: Using the stack trace (e.g. in org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone) to construct the trace string is inefficient. The string stackTraceElement.getClassName() + "." + stackTraceElement.getMethodName() + "(...)" should either be cached in a static variable or computed only once.
Here is a preview of what the method currently looks like:
How we spotted this bug:
We are currently analyzing a new massive dump. This is a completely new pattern of deadlock that we are investigating, different from the MergeManager deadlock for which we have a separate open issue. The reason for the deadlock is still being investigated.
In this massive dump we spotted a thread that had this stack trace:
As you can see in the stack trace above, the very thread that is generating the massive dump was supposed to have created some tracing metadata. The metadata was nowhere to be found.
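The symptom described here (metadata that should have been registered but is nowhere to be found) is exactly what a put followed by an immediate remove would produce. The sketch below is a hypothetical reduction of that pattern, not the EclipseLink source: WaitTraceSketch, its map and its method names are assumptions made for illustration. It also shows the trace string held in a static constant rather than rebuilt from a stack trace on every call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for "thread X is waiting in method Y"; NOT an EclipseLink class.
class WaitTraceSketch {

    // Built once, instead of walking Thread.getStackTrace() on every call.
    static final String ACQUIRE_LOCKS_FOR_CLONE =
            "org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(...)";

    static final Map<Thread, String> WAITING = new ConcurrentHashMap<>();

    // Buggy shape: the entry is gone again before anyone could ever read it.
    static void putBuggy(Thread thread) {
        WAITING.put(thread, ACQUIRE_LOCKS_FOR_CLONE);
        WAITING.remove(thread);
    }

    // Intended shape: put on entry to the wait ...
    static void put(Thread thread) {
        WAITING.put(thread, ACQUIRE_LOCKS_FOR_CLONE);
    }

    // ... and remove in a separate call, typically from a finally block.
    static void remove(Thread thread) {
        WAITING.remove(thread);
    }
}
```

With the intended shape, a dump taken while the thread is blocked would still see its entry, because the removal only happens once the thread leaves the waiting section.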
7. The Metadata Information Might Also Be Incorrect - Write Lock Manager Is Trying to Acquire a READ Lock Key, Not a WRITE Lock Key:
Another bug here pertains to the nature of the metadata. As I explained earlier, the implementation of the org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String) method is flawed. However, there's a second point to consider.
In the old manipulated 2.6.4 code, the metadata we were associating with the WriteLock manager was related to its attempt to acquire a CacheKey for writing. However, in the original code, we used the method putCurrentThreadAsWaitingToAcquireLockForReading.
I believe the old manipulated code is technically correct in distinguishing between READ lock and WRITE lock metadata information. Why? Let's examine the method:
As we can see from the method above, we are using acquireReadLockNoWait. This choice aligns with more accurate metadata.
Note 1: We will provide Oracle with the in-house manipulated 2.6.x source code. This will allow them to compare the implementation of metadata acquisition and calls to the following method:
against the current 2.7.x version.
Note 2: I am also attaching a snippet of the manipulated 2.6.x classes for WriteLockManager and ConcurrencyManager.java. This snippet will facilitate the analysis of the mentioned methods.
Please note that not all ongoing fixes in the 2.7.x version have been back-merged into our old, patched 2.6.x code. Back-merging recent developments from 2.7.x to 2.6.x is currently a low-priority task. Nevertheless, the attached code is relevant in the context of this defect.
WriteLockManager_2_6_4.txt ConcurrencyManager_2_6_4.txt