Open 99sono opened 5 months ago
We received feedback today from the project team that our patching has improved things, and they no longer need to perform application server restarts due to these NPExceptions. Normally projects do not experience this problem, but this specific project for whatever reason was being heavily affected by this bug. It might still be too early to determine if this is the end of the story, but we have consistently patched this issue for every version from 2.4.2 to 2.6.7 in the past. The fixes we implemented there will certainly not cause harm, as they all enhance the robustness of the code. However, they do not explain the strange phenomenon of cache keys becoming null in the primary key. Unfortunately, we lack the expert knowledge and experience in EclipseLink to provide such an explanation.
I wanted to provide an update regarding our ongoing efforts to address the issue. Unfortunately, our fix is not yet complete.
Recently, the project encountered the following stack trace while utilizing our latest patch:
Caused by: java.lang.NullPointerException
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.createStringWithSummaryOfReadLocksAcquiredByThread(ConcurrencyUtil.java:891) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.createInformationAboutAllResourcesAcquiredAndDeferredByThread(ConcurrencyUtil.java:1263) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.createInformationAboutAllResourcesAcquiredAndDeferredByAllThreads(ConcurrencyUtil.java:1152) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationStep02(ConcurrencyUtil.java:627) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationStep01(ConcurrencyUtil.java:590) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationIfAppropriate(ConcurrencyUtil.java:513) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.determineIfReleaseDeferredLockAppearsToBeDeadLocked(ConcurrencyUtil.java:178) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(WriteLockManager.java:171) ~[eclipselink_2_7_6_hacked.jar]
Despite our efforts, it remains possible to encounter NPExceptions
while attempting to create the massive dump of information.
I will keep you informed of any progress as we continue to patch and improve the situation.
I am quite certain that I understand why our ConcurrencyUtil logic encountered issues within the org.eclipse.persistence.internal.helper.ConcurrencyUtil.createStringWithSummaryOfReadLocksAcquiredByThread(ReadLockManager, String) method. The reason is straightforward, in my opinion. It stems from the fact that the data structure we are iterating over is not a deep copy. Specifically, the read lock manager, org.eclipse.persistence.internal.helper.ReadLockManager.getMapThreadToReadLockAcquisitionMetadata(), returns an unmodifiable map. However, the underlying map and its list values can be modified by concurrent threads behind the scenes. To address this, the org.eclipse.persistence.internal.helper.ReadLockManager.getMapThreadToReadLockAcquisitionMetadata() method should return a deep clone of the map and its contained entries. This adjustment now makes much more sense. I will show the improved org.eclipse.persistence.internal.helper.ReadLockManager.getMapThreadToReadLockAcquisitionMetadata() in a second.
The following is what unbkreable code looks like:
/** Getter for {@link #mapThreadToReadLockAcquisitionMetadata} returns a deep clone */
public synchronized Map<Long, List<ReadLockAcquisitionMetadata>> getMapThreadToReadLockAcquisitionMetadata() {
// We cannot simply return an unmodifiable map here. There are two reasons for this:
// 1. The code consuming this unmodifiable map might be surprised by changes to the map itself caused by this
// read lock manager.
// If threads continue to acquire and release read locks, it could impact this object.
// 2. Additionally, the list values contained in the map could also be affected by threads that are reading and
// releasing read locks.
// Our approach should be to provide the ConcurrencyUtil with a safe deep clone of the data structure for its
// massive dumps.
// (a) Let's start by creating an independent result map that the caller can safely iterate over.
Map<Long, List<ReadLockAcquisitionMetadata>> resultMap = new HashMap<>();
// (b) depp clone the data strcuture
for (Entry<Long, List<ReadLockAcquisitionMetadata>> currentEntry : mapThreadToReadLockAcquisitionMetadata
.entrySet()) {
ArrayList<ReadLockAcquisitionMetadata> deepCopyOfCurrentListOfMetadata = new ArrayList<>(currentEntry.getValue());
resultMap.put(currentEntry.getKey(), unmodifiableList(deepCopyOfCurrentListOfMetadata) );
}
// (c) even if our result map is deep clone of our internal sate
// we still want to return it as unmodifable so that callers do not have the illusion
// they are able to hack the state of the read lock manager from the outside.
// If any code tris to manipulate the returned clone they should get a blow up to be dispelled of any illiusions
return Collections.unmodifiableMap(resultMap);
}
The modification to the org.eclipse.persistence.internal.helper.ReadLockManager.getMapThreadToReadLockAcquisitionMetadata()
method is crucial. It ensures that the org.eclipse.persistence.internal.helper.ConcurrencyUtil.createStringWithSummaryOfReadLocksAcquiredByThread(ReadLockManager, String)
function is not affected by the fact that, while generating the massive dump, it might encounter threads manipulating the hash map it is iterating over behind the scenes.
I will still strength the code of both the read lock manager and the ConcurrencyUtil with further level of paranoia/safety-check code to ensure I never see again a NPexception in this code area.
I will provide the patch later today.
In this post, I am presenting the second patch for this bug, which builds upon the first patch previously provided.
The associated commit message for these changes is as follows:
TRK-32315: Addressing Concurrency Issues in ReadLockManager
Continuing our efforts to resolve issue #2114, we have identified the root cause of the bug. The crux of the matter lies in the `ReadLockManager` providing the `ConcurrencyUtil` access to its state without performing deep copies.
Specifically, the `ReadLockManager` exposes its state through unmodifiable hash maps and lists. However, beneath these unmodifiable wrappers, the true data structures of the class remain accessible.
During the process of generating massive dumps, the `ConcurrencyUtil` iterates over metadata. Simultaneously, the read lock manager's data structures could undergo additions and removals of cache keys. These lists behind the scenes are not inherently thread-safe since they are always accessed within synchronized methods. However, an exception occurs when the concurrency util itself iterates over them, leading to mysterious NULL entries in the lists.
Fundamentally, the code itself does not logically allow NULL values to be entered into the read lock manager. However, the concurrency between the read lock manager's internal logic and the massive dump generation could inadvertently corrupt the data structure, resulting in this phenomenon.
0001-TRK-32315-Addressing-Concurrency-Issues-in-ReadLockM.patch
and the literal files are as follows issue_2114_commit_f6dca865.zip
About NPE primaryKey
public CacheKey(Object primaryKey) {
this.key = primaryKey;
// make sure we trace the primary key for which the cache key was created in an immutable field
if(primaryKey == null) {
this.primaryKeyAtTimeOfCreation = PRIMARY_KEY_AT_TIME_OF_CONSTRUCTION_NULL_A_NULL_PRIMARY_KEY_WAS_PASSED_IN_TO_OBJECT_CONSTRUCTOR;
} else {
this.primaryKeyAtTimeOfCreation = primaryKey;
}
}
I'm not sure about this approach, because we silently accept incorrect value null
. At least I'll add some logging there (WARNING level with some small message and FINEST with stack trace).
But my question is: Have You got some idea why it happens? Or which conditions should trigger it?
Hi,
Regarding the issue of CacheKey
being null, I I have no explanation.
It appears to be a rather rare condition, as only one project in version 2.7.6 has encountered this problem. We also faced a similar situation in version 2.6.5 (e.g. I could be wrong on the older version) . It is not a common pattern across projects.
Now, on the topic of the NPE working with data from org.eclipse.persistence.internal.helper.ReadLockManager.mapThreadToReadLockAcquisitionMetadata
. This list can potentially lead to either a NullPointerException
(NPException) in the ConcurrencyUtil
when logging a massive dump, as seen in the stack trace chunk above, or issues within the ReadLockManager
itself while adding or removing cache keys.
Regarding this specific matter:
So, the situation regarding the read lock metadata is quite clear.
The org.eclipse.persistence.internal.helper.ReadLockManager.getMapThreadToReadLockAcquisitionMetadata()
should have been protecting its internal state much safer. Those ConccurrencyUtil was essentially handling lists it should never had direct access to.
As for updates, I am awaiting feedback from the project. They have been operating for quite some time before encountering this issue. It may take several weeks to receive confirmation that improvements are indeed underway.
Thank you.
Let me summarize the types of null pointer exceptions that are patch should be addressing.
This pattern A:
Caused by: java.lang.NullPointerException
at org.eclipse.persistence.internal.identitymaps.CacheKey.equals(CacheKey.java:331) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.identitymaps.CacheKey.equals(CacheKey.java:320) ~[eclipselink_2_7_6_hacked.jar]
at java.util.Vector.indexOf(Vector.java:431) ~[?:?]
at java.util.Vector.indexOf(Vector.java:405) ~[?:?]
at java.util.Vector.removeElement(Vector.java:668) ~[?:?]
at java.util.Vector.remove(Vector.java:843) ~[?:?]
at org.eclipse.persistence.internal.helper.ReadLockManager.removeReadLock(ReadLockManager.java:97) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyManager.removeReadLockFromReadLockManager(ConcurrencyManager.java:958) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseReadLock(ConcurrencyManager.java:691) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.identitymaps.CacheKey.releaseReadLock(CacheKey.java:482) ~[eclipselink_2_7_6_hacked.jar]
This pattern B:
Caused by: java.lang.NullPointerException
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.createStringWithSummaryOfReadLocksAcquiredByThread(ConcurrencyUtil.java:891) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.createInformationAboutAllResourcesAcquiredAndDeferredByThread(ConcurrencyUtil.java:1263) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.createInformationAboutAllResourcesAcquiredAndDeferredByAllThreads(ConcurrencyUtil.java:1152) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationStep02(ConcurrencyUtil.java:627) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationStep01(ConcurrencyUtil.java:590) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationIfAppropriate(ConcurrencyUtil.java:513) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyUtil.determineIfReleaseDeferredLockAppearsToBeDeadLocked(ConcurrencyUtil.java:178) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(WriteLockManager.java:171) ~[eclipselink_2_7_6_hacked.jar]
This pattern C:
Caused by: java.lang.NullPointerException
at org.eclipse.persistence.internal.helper.ReadLockManager.removeReadLock(ReadLockManager.java:84) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyManager.removeReadLockFromReadLockManager(ConcurrencyManager.java:958) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseReadLock(ConcurrencyManager.java:691) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.identitymaps.CacheKey.releaseReadLock(CacheKey.java:482) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1103) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1014) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(UnitOfWorkIdentityMapAccessor.java:211) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getFromIdentityMap(UnitOfWorkIdentityMapAccessor.java:139) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:4093) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:4016) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.mappings.CollectionMapping.buildElementUnitOfWorkClone(CollectionMapping.java:313) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.mappings.CollectionMapping.buildElementClone(CollectionMapping.java:326) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.internal.queries.ContainerPolicy.addNextValueFromIteratorInto(ContainerPolicy.java:217) ~[eclipselink_2_7_6_hacked.jar]
at org.eclipse.persistence.mappings.CollectionMapping.buildCloneForPartObject(CollectionMapping.java:228) ~[eclipselink_2_7_6_hacked.jar]
I do not have any NPExceptin for the cache key hashcode method. But also that mehod should be made bullet-proof.
Pattern B and C are most likely heavily married to one another, it is that illusion of a null entry that I explained above. Pattern A of how cache key can go Null is a mystery. For the Pattern A I believe it is important to add that extra metadata to know what the CacheKey was like at its time of creation.
Thanks.
The project team tells me they are happy with the patched eclipselink, for the time being at least. They have not encountered issues of this nature since 2024.04.29. If the issue is not yet definitely resolved, you will hear from me again on this topic.
Hello on the beginning You mentioned oucode.entity.SomeBusinessEntity.getSomeWhateverOneToManyRelationship()
.
My question is. How is value of the @Id
field(s) filled? Directly by assignment e.g. from constructor or @GeneratedValue
is used https://jakarta.ee/specifications/persistence/2.2/apidocs/javax/persistence/generatedvalue ?
I'm thinking about case when entity instance is cached sooner, than @Id
field is generated.
We’ve encountered a rare but critical bug in a project using Oracle-patched EclipseLink 2.7.6, which was modified to align with newer versions like 2.7.9. This bug is associated with the CacheKey class in EclipseLink, which lacks robust equals and hashCode methods. The issue arises when a CacheKey.key unexpectedly becomes null, leading to unforeseen complications.
Historically, we’ve addressed this bug in our internally maintained versions of EclipseLink, specifically versions 2.4.2, 2.6.4, 2.6.5, and 2.6.7. These versions, released with various WebLogic versions, have been patched to include concurrency manager improvements.
Currently, there’s a bugfix for these older versions of EclipseLink that isn’t present in any of the 2.7.X tags. This fix should likely be integrated into the official branches to prevent future occurrences.
Although this bug is infrequent, its potential impact necessitates proactive measures. Here’s a brief overview of the bug we’ve recently re-encountered.
The stack-trace is sliced out, on purpose, to only cover eclipselink code and obscure any private information.
The application became unresponsive with all manner of exceptions in eclipselink, but at the point of origin we could see exceptions with a stack trace like this:
The stack trace indicates that
ReadLockManager.removeReadLock(ReadLockManager.java:97)
encounters an unexpected scenario where it is possible for "null" to be added toReadLockManager
. We do not know how it is possible, but facts are facts. Crucial here, theCacheKey
'sequals
method is not safeguarded againstNullPointerException
. Specifically, the issue lies within this code segment:https://github.com/eclipse-ee4j/eclipselink/blob/2.7.14/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/identitymaps/CacheKey.java
We have the patched method to look like this:
Notice that the method above will not blow up if the input parameter key is the null object.
The
hashCode()
method is another area that lacked safety. We have patched the implementation ofhashCode()
to address this issue. Previously, it was implemented as follows:The code above is prone to failure if
this.key
is null. While the exact cause is unknown, we are certain that this scenario can occur.To enhance safety, we have revised the implementation to be more robust against null values. The new implementation is as follows:
We will provide an attachment containing the fully patched
CacheKey
object.Additionally, we have made further enhancements to the metadata of the
CacheKey
object. Specifically, we have reintroduced a segment of code at the beginning of the class that was previously added to earlier versions of EclipseLink. This same code has now been incorporated into version 2.7.6 as follows:The updated code introduces two new fields:
primaryKeyAtTimeOfCreation
andcacheKeyInstanceCreationNumber
, along with associated metadata. This enhancement allows us to meticulously monitor the lifecycle of a cache key. We have added two immutable fields that capture the original key of the cache key at the moment of its creation.Furthermore, we have implemented a unique instance creation number for each object. This distinctive number enables us to definitively determine in the
equals
andhashCode
methods whether two objects are the same instance.Below, you can observe how the new immutable field
primaryKeyAtTimeOfCreation
is assigned during the object's construction phase:In summary, we have fortified the class by implementing the following improvements:
Revised
equals
andhashCode
Methods: We revisited the implementation of theequals
andhashCode
methods to ensure they behave as expected under normal circumstances and remain resilient when encountering null objects or keys.Enhanced
CacheKey
Metadata: We have augmented theCacheKey
metadata to better trace its challenging lifecycle and to identify the origins of corrupted cache keys in extensive dumps after the key has become null. To achieve this, we introduced new fields to theCacheKey
that are established at the time of construction and are immutable. Specifically, these fields are:protected final Object primaryKeyAtTimeOfCreation;
protected final long cacheKeyInstanceCreationNumber;
Additionally, for the comprehensive dump analysis that reports on the status of the cache keys, we have added the method:
public Object getKeyEnsuredToBeNotNull()
which returnsthis.key
if it is not null, orthis.getPrimaryKeyAtTimeOfCreation()
ifthis.key
is null.This methodical approach ensures that the
CacheKey
class is robust and reliable throughout its lifecycle.Our patching efforts extended beyond the
CacheKey
. In the past, we've encountered issues with massive dumps causing failures when executing the method:This was due to the following line of code, which proved problematic:
To resolve this, we've implemented a safer alternative:
With these changes, the trace message now includes two additional pieces of metadata:
cacheKeyInstanceCreationNumber
andprimaryKeyAtTimeOfCreation
, enhancing the diagnostic capabilities as demonstrated in the code snippet above. This ensures a more robust and fail-safe approach in handling cache keys within our system.In summary, we strongly recommend adopting our enhancements. We plan to test this patched version of EclipseLink 2.7.6 on the current project to determine if their ongoing problems get resolved.
In the uploaded TRK-32315_patchedCode.zip you will find our modified files ConcurrencyUtil.java, CacheKey.java and TraceLocalizationResource.java.
TRK-32315_patchedCode.zip
We will keep you posted on this issue, in particular when we make a breakthrough.