eclipse-mat / mat

The Eclipse Memory Analyzer is a fast and feature-rich Java heap dump analyzer that helps you find memory leaks and reduce memory consumption.
https://eclipse.dev/mat/
Eclipse Public License 2.0

ArrayIndexOutOfBoundsException in ArrayIntCompressed on beforePass2 parsing #38

Open eclipsewebmaster opened 5 months ago

eclipsewebmaster commented 5 months ago

| Field | Value |
| --- | --- |
| Bugzilla Link | 581932 |
| Status | ASSIGNED |
| Importance | P3 normal |
| Reported | May 12, 2023 04:08 EDT |
| Modified | Jun 29, 2023 05:04 EDT |
| Version | 1.14 |
| Reporter | Gustav Hedengran |

Description

Created attachment 289069: Stack trace

Exception occurs after parsing ~25% of a 45 GB heap dump. Unfortunately I can't share the heap dump.

Error happens on both 1.14 and 1.13. Hardware is Apple M2 Max.

Heap dump doesn't seem corrupt as it's correctly parsed on an Intel Mac and Fedora Linux.

Seems related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=579931.

:notepad_spiral: file_581932.txt

eclipsewebmaster commented 5 months ago

By Andrew Johnson on May 12, 2023 08:50

Is that stack trace with the latest development code?

java.lang.ArrayIndexOutOfBoundsException: Index -3489970 out of bounds for length 3750002
    at org.eclipse.mat.collect.ArrayIntCompressed.set(ArrayIntCompressed.java:147)
    at org.eclipse.mat.parser.index.IndexWriter$IntIndexCollector.set(IndexWriter.java:712)
    at org.eclipse.mat.hprof.HprofParserHandlerImpl.beforePass2(HprofParserHandlerImpl.java:347)
    at org.eclipse.mat.hprof.HprofIndexBuilder.fill(HprofIndexBuilder.java:91)

That would correspond to the line:

    object2classId.set(clazz.getObjectId(), clazz.getClazz().getObjectId());

It is probably not a timing issue as it occurred twice (with 1.14 and 1.13). Also pass 1 and beforePass2 are single threaded.

The error would suggest that the object ID for the class was not found in:

    // calculate instance size for all classes
    ClassImpl clazz = e.next();
    int index = identifiers0.reverse(clazz.getObjectAddress());
    clazz.setObjectId(index);

I would ask whether it occurs with MAT 1.11.0 as some of the indexing logic has changed since then, but I see that there is no Mac/Cocoa/AArch64 version 1.11 of MAT, and I note the problem does not occur with x86_64 Linux and Mac.
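To illustrate the failure mode described above, here is a minimal Java sketch. It is not MAT's actual code: it assumes that the reverse lookup behaves like a binary search over a sorted address array and returns a negative number when the address is absent. `ReverseLookupSketch`, `reverse` and the sample addresses are hypothetical, chosen only for the example.

```java
import java.util.Arrays;

final class ReverseLookupSketch {
    /**
     * Hypothetical stand-in for identifiers0.reverse(): returns the position of
     * the address in the sorted array, or a negative value if it is absent.
     */
    static int reverse(long[] sortedAddresses, long address) {
        return Arrays.binarySearch(sortedAddresses, address);
    }

    public static void main(String[] args) {
        long[] identifiers = {0x1000L, 0x2000L, 0x3000L}; // sorted object addresses
        int[] object2classId = new int[identifiers.length];

        int found = reverse(identifiers, 0x2000L);   // address is in the index
        int missing = reverse(identifiers, 0x2800L); // address is not in the index
        System.out.println("found=" + found + ", missing=" + missing);

        try {
            // Using the unchecked result of a failed lookup as an array index
            // fails with a negative index, as in the reported stack trace.
            object2classId[missing] = found;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked miss: " + e.getMessage());
        }
    }
}
```

Under that assumption, a negative index in the exception message means the class address was never found in the address index, so the interesting question is why the address is missing rather than why the array write fails.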

eclipsewebmaster commented 5 months ago

By Andrew Johnson on May 23, 2023 15:36

This could take a while to debug. Do you still have the dump and are you willing to try various test builds of MAT?

There's a possibility that it is a JVM/JIT error, so have you tried updating your JVM?

Does it occur with an Eclipse that runs on Mac M2, but with MAT 1.11 installed from https://download.eclipse.org/mat/1.11.0/update-site/ ? That might show whether some of the index changes in bug 579931 or bug 573258 were responsible [though if it was a JIT bug then small code changes can be enough to hide the bug].

I could add some debugging code which reported anything unusual earlier - but then I would need you to run a snapshot build and report the results, probably several times as I modified the code.

I have spotted a minor bug where there is an iterator over classesByAddress and in the loop a value is changed via a put on an existing index. According to the usual definitions of collections that should be safe, but if the MAT collection is at the size limit before resizing then it is resized even though no resize is needed as the collection doesn't get any bigger. This could then mess up the iterator. That code should behave the same way on x86-64 though, so might not be the problem.
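The resize-on-update behaviour described above can be sketched with a toy open-addressing map. This is an illustration under stated assumptions, not MAT's collection code: `ResizeOnUpdateSketch`, the `checkLimitFirst` flag, the capacity of 8 and the 3/4 load limit are all hypothetical.

```java
final class ResizeOnUpdateSketch {
    private int[] keys;
    private int[] values;
    private boolean[] used;
    private int size;
    // true: check the size limit before looking for the key (the behaviour described above)
    private final boolean checkLimitFirst;

    ResizeOnUpdateSketch(int capacity, boolean checkLimitFirst) {
        this.keys = new int[capacity];
        this.values = new int[capacity];
        this.used = new boolean[capacity];
        this.checkLimitFirst = checkLimitFirst;
    }

    int capacity() {
        return keys.length;
    }

    void put(int key, int value) {
        if (checkLimitFirst && size >= limit()) {
            resize(); // runs even if the key already exists and the map will not grow
        }
        int slot = find(key);
        if (used[slot]) {
            values[slot] = value; // update of an existing entry
            return;
        }
        if (!checkLimitFirst && size >= limit()) {
            resize(); // only resize when a new entry is really added
            slot = find(key);
        }
        used[slot] = true;
        keys[slot] = key;
        values[slot] = value;
        size++;
    }

    private int limit() {
        return keys.length * 3 / 4;
    }

    // Linear probing: returns the slot holding the key, or the first free slot.
    private int find(int key) {
        int slot = Math.floorMod(key, keys.length);
        while (used[slot] && keys[slot] != key) {
            slot = (slot + 1) % keys.length;
        }
        return slot;
    }

    // Doubles the table and reinserts all entries; slot positions change.
    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[keys.length];
        used = new boolean[keys.length];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }

    // Fills the map exactly to its limit, then updates an existing key.
    static int capacityAfterUpdate(boolean checkLimitFirst) {
        ResizeOnUpdateSketch map = new ResizeOnUpdateSketch(8, checkLimitFirst);
        for (int key = 0; key < 6; key++) {
            map.put(key, key);
        }
        map.put(0, 99); // same key, new value
        return map.capacity();
    }

    public static void main(String[] args) {
        System.out.println("limit checked first: " + capacityAfterUpdate(true));
        System.out.println("lookup first:        " + capacityAfterUpdate(false));
    }
}
```

In this sketch the first variant grows from 8 to 16 slots on a pure update, while the second stays at 8. An iterator that remembers a slot position in the old table would be left pointing at stale or relocated entries after such a resize, which is the kind of disturbance the comment describes.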

eclipsewebmaster commented 5 months ago

By Gustav Hedengran on May 24, 2023 04:42

Thank you for looking into this.

I still have the heap dump and I'm happy to help. I've experimented with different JVMs and have gone through (at least) 17.0.6, 17.0.7 and 20.0.1.

> Does it occur with an Eclipse that runs on Mac M2, but with MAT 1.11 installed from https://download.eclipse.org/mat/1.11.0/update-site/ ? That might show whether some of the index changes in bug 579931 or bug 573258 were responsible [though if it was a JIT bug then small code changes can be enough to hide the bug].

I just tried 1.11 through Eclipse and the error still occurs, with the exact same error message.

I did try running x86-64 builds of MAT 1.11 and 1.14 on my Mac M2 through Rosetta 2 and in both cases MAT successfully parsed the heap dump.

eclipsewebmaster commented 5 months ago

By Andrew Johnson on May 26, 2023 03:24

Changes to fix the collections (might not fix this bug though):
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202110

eclipsewebmaster commented 5 months ago

By Andrew Johnson on May 26, 2023 03:45

Add extra error message - won't fix the problem but may give more information.
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202118
https://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=54651070d185e9c369e1fab303fb5b2e6e3b298d

eclipsewebmaster commented 5 months ago

By Andrew Johnson on May 26, 2023 05:17

Test case fix:
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202119
https://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=e592bfde54695bfe6db6a91baf21fb1c9d429156

eclipsewebmaster commented 5 months ago

By Andrew Johnson on May 26, 2023 06:27

A snapshot build is now available with the collections fix and a bit more logging of errors. I don't think it will fix the problem, but please retest, and report the error log here.
https://www.eclipse.org/mat/snapshotBuilds.php

Thanks

eclipsewebmaster commented 5 months ago

By Gustav Hedengran on Jun 05, 2023 05:12

I tried the snapshot build and as you suspected, the problem persists. The additional logging output 33 new error messages. Roughly a third of those looked like this:

!ENTRY org.eclipse.mat.ui 4 0 2023-05-26 10:11:13.830
!MESSAGE class jdk.internal.reflect.GeneratedMethodAccessor22341 @ 0x7f4c6749c450 not found in address index

The rest of the errors concerned a class of our own, which is generated and loaded at runtime containing generated bytecode.

eclipsewebmaster commented 5 months ago

By Andrew Johnson on Jun 12, 2023 03:35

I still can't see how the problem could happen, and would welcome someone else to inspect the code.

The reason for the problem is that some classes in classesByAddress have an address which is not found by a reverse lookup in identifiers0. However, when I look through the code I see that every time a class is added to classesByAddress the address is also added to identifiers0.

Aarch64/Arm64 has some differences to x86_64 - e.g. the writes are weakly ordered. That shouldn't make a difference for Java programs which correctly follow the Java memory model.

I think it would be worth trying a different JVM, in case there is a bug in the JVM / JIT. Are you able to try an IBM Semeru JDK, which is based on OpenJ9?

https://developer.ibm.com/languages/java/semeru-runtimes/downloads/

Also, the identifiers0 index is sorted before doing binary lookups. This uses the Arrays.parallelSort methods. There is this bug:

https://bugs.openjdk.org/browse/JDK-8076446 (array) Arrays.parallelSort is not stable

That doesn't directly apply to sorting int or long arrays, but makes me a bit suspicious of the method, so I have tried sorting the array with an ordinary sort() after the parallelSort(). It should do nothing assuming the parallel sort works, and not be too slow if the array is sorted, but could fix a parallel sort, if the only error was just items in the wrong order, rather than items being omitted or duplicated.

https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202438
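The defensive re-sort described above could look roughly like the following. This is a hedged sketch rather than the change in the linked review: `DefensiveSortSketch`, `sortForLookup`, `isSorted` and the sample addresses are hypothetical, and only `Arrays.parallelSort`, `Arrays.sort` and `Arrays.binarySearch` are real JDK methods.

```java
import java.util.Arrays;

final class DefensiveSortSketch {
    /** Sorts a copy of the addresses twice: parallel first, then sequential. */
    static long[] sortForLookup(long[] addresses) {
        long[] sorted = addresses.clone();
        Arrays.parallelSort(sorted);
        // Changes nothing if parallelSort worked; repairs the order if it left
        // items misplaced. It cannot restore omitted or duplicated items.
        Arrays.sort(sorted);
        return sorted;
    }

    /** Cheap check that the array is in non-decreasing order. */
    static boolean isSorted(long[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i - 1] > values[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] addresses = {0x3000L, 0x1000L, 0x2000L};
        long[] sorted = sortForLookup(addresses);
        System.out.println("sorted: " + isSorted(sorted));
        // Binary search is only meaningful on a correctly sorted array.
        System.out.println("index of 0x2000: " + Arrays.binarySearch(sorted, 0x2000L));
    }
}
```

If a lookup still fails after both sorts, a wrong order is ruled out as the cause, which leaves missing or corrupted entries as the remaining explanation.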

eclipsewebmaster commented 5 months ago

By Andrew Johnson on Jun 29, 2023 05:04

Gustav, have you had a chance to try the ideas in comment 9?

  1. Try an IBM Semeru Runtime for macOS aarch64
  2. Try the latest development build - which does an ordinary sort() after parallelSort() in case parallel sort is broken.

delgurth commented 1 month ago

I seem to be running into this exact issue with 2 large heap dumps. What I find interesting is that the error reports exactly the same array length, but a different index:

java.lang.ArrayIndexOutOfBoundsException: Index -3038672 out of bounds for length 3750002
    at org.eclipse.mat.collect.ArrayIntCompressed.set(ArrayIntCompressed.java:147)
java.lang.ArrayIndexOutOfBoundsException: Index -2154681 out of bounds for length 3750002
    at org.eclipse.mat.collect.ArrayIntCompressed.set(ArrayIntCompressed.java:147)

I'm running Memory Analyzer 1.15.0 Release on an M3 Max, with JVM: temurin-21.0.2+13.0.LTS. Running in x86_64 mode also fixed it for me (although I didn't use the exact same JVM: the x86 run used 21.0.4, two patch levels higher).

delgurth commented 1 month ago

> By Andrew Johnson on Jun 29, 2023 05:04
>
> Gustav, have you had a chance to try the ideas in comment 9?
>
> 1. Try an IBM Semeru Runtime for macOS aarch64
> 2. Try the latest development build - which does an ordinary sort() after parallelSort() in case parallel sort is broken.

I've tried 1, and that also works. The version I've used:

java -version
openjdk version "21.0.4" 2024-07-16 LTS
IBM Semeru Runtime Open Edition 21.0.4.0 (build 21.0.4+7-LTS)
Eclipse OpenJ9 VM 21.0.4.0 (build openj9-0.46.0, JRE 21 Mac OS X aarch64-64-Bit 20240716_229 (JIT enabled, AOT enabled)
OpenJ9   - 1a6f6128aa
OMR      - 840a9adba
JCL      - 7d844187b25 based on jdk-21.0.4+7)
Regarding 2: I guess that is already covered, since I tested with build 1.15.0.

While testing 1, I noticed that in both cases where I got the array exception, I first got another error (once for one heap dump and twice for the other):

eclipse.buildId=unknown
java.version=21.0.2
java.vendor=Eclipse Adoptium
BootLoader constants: OS=macosx, ARCH=aarch64, WS=cocoa, NL=en_US
Command-line arguments:  -os macosx -ws cocoa -arch aarch64

org.eclipse.mat.ui
Error
Wed Sep 04 17:07:06 CEST 2024
class jdk.internal.reflect.GeneratedSerializationConstructorAccessor19832 @ 0x7ddeabd78 not found in address index

I'm not sure we should look for the actual problem in Eclipse MAT, though, seeing that it doesn't occur with OpenJ9...