eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.25k stars 715 forks source link

DTFJ: Finalized objects incorrectly tagged with HEAP_ROOT_FINALIZABLE_OBJ #7091

Open paulcheeseman opened 4 years ago

paulcheeseman commented 4 years ago

Consider the following testcase:

import java.io.Closeable;
import com.ibm.jvm.Dump;

public class FinalizerTest implements Closeable{
    private boolean isClosed = false;

    public static void main (String args[]) {
        for (int i = 0; i < 1000; i++) {
            new FinalizerTest();
        }
        System.gc();
        Dump.SystemDump();
    }

    public void close() {
        this.isClosed = true;
    }

    public void finalize() {
        close();
    }
}

This program creates and discards 1000 objects with finalizers, then forces a GC to induce finalization, and finally triggers a dump so we can examine the state of these objects.

If you run the test and open the resulting dump in Eclipse MAT you will find 1000 FinalizerTest objects of root type Finalizable. If you run the "Finalizer Overview" query you can also see that MAT describes the objects as being "ready for finalizer thread".

However, if you look at the fields in these objects you can see that they are closed, and close() can only be called from finalize(), so the objects have in fact already been finalized. The information presented by MAT is contradictory and misleading.

It turns out that MAT is passing on information provided by DTFJ. When parsing system dumps DTFJ tags objects with a finalize() method as HEAP_ROOT_FINALIZABLE_OBJ. MAT then assumes that objects tagged in this way are on the finalizer queue waiting to be finalized. This might be correct, but it might not - objects appear to be tagged in this way by DTFJ regardless of whether or not their finalizer has been run.

When finalizable objects have been processed by the finalizer thread they should become unreachable, so they should not be tagged as roots (of any kind) by DTFJ. MAT would then present the objects correctly - i.e. they would be visible in the unreachable objects histogram, or as "roots" of type Unreachable if "keep unreachable objects" were enabled when parsing the dump.

pshipton commented 4 years ago

@keithc-ca @dmitripivkine

dmitripivkine commented 4 years ago

@amicic FYI

keithc-ca commented 4 years ago

@paulcheeseman Can you please share the the output of java -version that you used to run your test program?

paulcheeseman commented 4 years ago

@keithc-ca

I was testing with IBM Java 8.0.5.37 on Windows:

java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 8.0.5.37 - pwa6480sr5fp37-20190618_01(SR5 FP37))
IBM J9 VM (build 2.9, JRE 1.8.0 Windows 10 amd64-64-Bit Compressed References 20190617_419755 (JIT enabled, AOT enabled)
OpenJ9   - 354b31d
OMR      - 0437c69
IBM      - 4972efe)
JCL - 20190606_01 based on Oracle jdk8u211-b25

I just tried to reproduce the issue with a recent Adopt binary but I can't open the resulting dump at all in MAT due to the Error below when parsing - there seems to be some other problem.

"java/lang/ExceptionInInitializerError" Nested Exception: "java/lang/IllegalArgumentException" Detail:  "Unsupported platform" received 
    at java/lang/J9VMInternals.ensureError(J9VMInternals.java:146)
    at java/lang/J9VMInternals.recordInitializationFailure(J9VMInternals.java:135)
    at com/ibm/j9ddr/vm29/j9/stackwalker/StackWalker$StackWalker_29_V0.walkStackFrames(StackWalker.java:196)
    at com/ibm/j9ddr/vm29/j9/stackwalker/StackWalker$StackWalker_29_V0.walkStackFrames(StackWalker.java:165)
    at com/ibm/j9ddr/vm29/j9/stackwalker/StackWalker.walkStackFrames(StackWalker.java:96)
    at com/ibm/j9ddr/vm29/view/dtfj/java/DTFJJavaThread.walkStack(DTFJJavaThread.java:370)
    at com/ibm/j9ddr/vm29/view/dtfj/java/DTFJJavaThread.getStackFrames(DTFJJavaThread.java:220)
    at org/eclipse/mat/dtfj/DTFJIndexBuilder.printThreadStack(DTFJIndexBuilder.java:2470)
    at org/eclipse/mat/dtfj/DTFJIndexBuilder.processDTFJRoots(DTFJIndexBuilder.java:2846)
    at org/eclipse/mat/dtfj/DTFJIndexBuilder.fill(DTFJIndexBuilder.java:1935)
    at org/eclipse/mat/parser/internal/SnapshotFactoryImpl.parse(SnapshotFactoryImpl.java:222)
    at org/eclipse/mat/parser/internal/SnapshotFactoryImpl.openSnapshot(SnapshotFactoryImpl.java:126)
    at org/eclipse/mat/snapshot/SnapshotFactory.openSnapshot(SnapshotFactory.java:145)
    at org/eclipse/mat/ui/snapshot/ParseHeapDumpJob.run(ParseHeapDumpJob.java:83)
    at org/eclipse/core/internal/jobs/Worker.run(Worker.java:60)
kgibm commented 4 years ago

One minor note on this:

When parsing system dumps DTFJ tags objects with a finalize() method as HEAP_ROOT_FINALIZABLE_OBJ. MAT then assumes that objects tagged in this way are on the finalizer queue waiting to be finalized. This might be correct, but it might not - objects appear to be tagged in this way by DTFJ regardless of whether or not their finalizer has been run.

The more fundamental issue is that the objects whose finalizer has already been run are still returned when MAT calls dtfjInfo.getJavaRuntime().getHeapRoots();. The HEAP_ROOT_FINALIZABLE_OBJ tag is incidental - MAT uses that to mark that object as being on the finalizer queue, but even if the object wasn't tagged in this way, if it was still returned in getHeapRoots(), MAT would still consider it a GC root and fully alive, so perhaps the solution would be for DTFJ not to return such objects in getHeapRoots().

paulcheeseman commented 4 years ago

Agreed. I wasn't really sure of the mechanics inside DTFJ and MAT, but they shouldn't be described as roots at all once they've been finalized.

keithc-ca commented 4 years ago

IBM builds use different technology than OpenJ9 to capture the information that becomes the blob in core files: this means you can't expect to use an IBM VM to examine a core file produced from an OpenJ9 VM nor vice-versa.

paulcheeseman commented 4 years ago

IBM builds use different technology than OpenJ9 to capture the information that becomes the blob in core file: this means you can't expect to use an IBM VM to examine a core file produced from an OpenJ9 VM nor vice-versa.

Does this mean there are two different DTFJ plugins for MAT? One for IBM Java and one for OpenJ9?

keithc-ca commented 4 years ago

When I use jdmpview to examine a core file produced from the example above, there are no instances of FinalizerTest. Using that class as the target, !rootpathfindall finds no instances.

I think @kgibm is on the right track with the comment that

    dtfjInfo.getJavaRuntime().getHeapRoots()

returns objects it shouldn't.

keithc-ca commented 4 years ago

Does this mean there are two different DTFJ plugins for MAT? One for IBM Java and one for OpenJ9?

I don't know how MAT connects to DTFJ or how its plugin is constructed.

kgibm commented 4 years ago

I don't know how MAT connects to DTFJ or how its plugin is constructed.

There's a MAT DTFJ plugin which packages a copy of DTFJ JAR files, so if there is a non-backwards compatible update to DTFJ, then those dumps cannot be parsed. Unfortunately, the build machine that built the MAT DTFJ plugin was sunset so service is difficult. If necessary, I have manually updated the MAT DTFJ plugin like so:

cp -f /opt/ibm/java/8.0.5.35/jre/lib/ddr/j9ddr.jar /opt/mat/configuration/org.eclipse.osgi/*/data/*/plugins/com.ibm.dtfj.j9_*/lib/j9ddr.jar
cp -f /opt/ibm/java/8.0.5.35/jre/lib/ext/dtfj.jar /opt/mat/configuration/org.eclipse.osgi/*/data/*/plugins/com.ibm.dtfj.j9_*/lib/dtfj.jar

However, this drives some sort of strange issue that I haven't had the time to investigate where there is some race condition and every ~5% of times that MAT is opened, DTFJ is not registered and MAT cannot open PHDs or system dumps (restarting MAT usually solves the issue). Using -Dosgi.clean=true does not help.

pshipton commented 4 years ago

Using the DTFJ plugin available from http://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/runtimes/tools/dtfj/ I'm able to open and read the latest OpenJ9 core files. There was a time period where this didn't work, fixed via #6317 and #6321 for the 0.15 release. If necessary, let's take these concerns to a separate issue if needed, and keep this one about the Finalized object roots.

paulcheeseman commented 4 years ago

When I use jdmpview to examine a core file produced from the example above, there are no instances of FinalizerTest. Using that class as the target, !rootpathfindall finds no instances.

I see exactly the same thing with my dumps too. It seems that jdmpview identifies roots in a different way to the DTFJ plugin in MAT.

Here's an MAT screenshot for a dump taken from OpenJDK8U-jdk_x64_windows_openj9_8u222b10_openj9-0.15.1:

image

kgibm commented 4 years ago

I'm not sure if you all are able to look at Eclipse MAT source (License=EPL1.0), but it calls DTFJ's getHeapRoots on both the result of getJavaRuntime as well iterating over all getJavaRuntime().getThreads(), iterating over each thread's getStackFrames() and calling JavaStackFrame.getHeapRoots. See the processDTFJRoots function in https://git.eclipse.org/c/mat/org.eclipse.mat.git/tree/plugins/org.eclipse.mat.dtfj/src/org/eclipse/mat/dtfj/DTFJIndexBuilder.java

For any root returned from either case, MAT calls getRootType and if it is JavaReference.HEAP_ROOT_FINALIZABLE_OBJ, then it is marked as a finalizable GC root. See the processRoot function in https://git.eclipse.org/c/mat/org.eclipse.mat.git/tree/plugins/org.eclipse.mat.dtfj/src/org/eclipse/mat/dtfj/DTFJIndexBuilder.java

dmitripivkine commented 4 years ago

If I am reading code correctly it assumes that any instance in the heap of class contain "finalize()" method is treated as alive root. Obviously main concern of developer was "do not miss any potential candidate". So there is no differentiation between "alive" and "dead" objects - no surprise.

dmitripivkine commented 4 years ago

I am not sure how to make correct decision JVM-agnostic. In theory each finalized object has a state:

    /**
     * Values for the 'state' field in java.lang.ref.Reference.
     * Note that these values are mirrored in the Java code. Do not change them.
     */
    enum ReferenceState {
        REF_STATE_INITIAL = 0, /**< indicates the initial (normal) state for a Reference object. Referent is weak. */
        REF_STATE_CLEARED = 1, /**< indicates that the Reference object has been cleared, either by the GC or by the Java clear() API. Referent is null or strong. */ 
        REF_STATE_ENQUEUED = 2, /**< indicates that the Reference object has been cleared and enqueued on its ReferenceQueue. Referent is null or strong. */
        REF_STATE_REMEMBERED = 3, /**< indicates that the Reference object was discovered by a global cycle and that the current local GC cycle must return it to that list and restore the state to INITIAL. */
    };

However different JVMs might have more states or slightly different state machines

kgibm commented 4 years ago

@dmitripivkine An alternative to changing the result of getHeapRoots would be to add another flag to JavaReference like HEAP_ROOT_FINALIZABLE_OBJ_FINALIZED as the result of getRootType if the object has been finalized and is just waiting to be cleaned by the next GC.

ajohnson1 commented 4 years ago

If I am reading code correctly it assumes that any instance in the heap of class contain "finalize()" method is treated as alive root. Obviously main concern of developer was "do not miss any potential candidate". So there is no differentiation between "alive" and "dead" objects - no surprise.

As author of the MAT code I think the intention of the code was to cope with DTFJ without accurate finalizer information. The guard is

        // Missing finalizables from XML and GC roots
        // All objects with a finalize method which are not reached from
        // another object are guessed as being 'finalizable'.
        if (guessFinalizables && !(goodDTFJRoots && foundFinalizableGCRoots))

where guessFinalizables is always true, goodDTFJRoots means that some roots were returned from dtfjInfo.getJavaRuntime().getHeapRoots() and there was at least one root JavaReference.HEAP_ROOT_FINALIZABLE_OBJ

If that wasn't so, then MAT guesses there is no accurate finalizer information from DTFJ and marks unreferenced objects with a finalize() method as GCRootInfo.Type.FINALIZABLE so that they are retained for later examination by the user of MAT. If the finalizer is up to date so there are no finalizable objects, but the finalized objects haven't been discarded then MAT might guess that DTFJ wasn't returning finalize information and mark them anyway. It does print a message in this situation: Object of type {0} at {1} is presumed to be finalizable so has been marked as a GC root. [MAT also works with PHD and Javacore files, where the root information is slight or missing, so guessing could be useful then]. MAT could be modified to guess that DTFJ finalizer information is accurate if JavaReference.HEAP_ROOT_UNFINALIZED_OBJ is present.

Another idea is that DTFJ has the JavaReference.HEAP_ROOT_UNFINALIZED_OBJ type as well for all finalizable objects. MAT doesn't use this as a root because it confuses the leak report because the object won't then show as being retained (and prevented from being garbage collected) by references from normal other objects. Perhaps this should remain set together with JavaReference.HEAP_ROOT_FINALIZABLE_OBJ when the object goes onto the finalizer queue and be cleared when the object has been removed from the queue and has been finalized.

For example:

JavaReference.HEAP_ROOT_UNFINALIZED_OBJ JavaReference.HEAP_ROOT_FINALIZABLE_OBJ meaning
N N ordinary object
Y N finalizable object in normal use
Y Y finalizable object on finalizer queue
N Y finalizable object after it has been removed from the queue and been finalized
ajohnson1 commented 4 years ago

The latest snapshot build of MAT works better. Now if it sees at least one HEAP_ROOT_UNFINALIZED_OBJ or one HEAP_ROOT_UNFINALIZED_OBJ root then it does not guess finalizable roots. This helps for the test program, but running on

openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.6+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.18.1, JRE 11 Windows 10 amd64-64-Bit Compressed References 20200122_442 (JIT enabled, AOT enabled)
OpenJ9   - 51a5857d2
OMR      - 7a1b0239a
JCL      - da35e0c380 based on jdk-11.0.6+10)

the test program still shows 1000 FINALIZABLE roots, because there are no unfinalized or finalizable objects in the dump. MAT does give some warnings in the log:

Object of type FinalizerTest at 0x7fff6c3a8 is presumed to be finalizable so has been marked as a GC root ... 1,000 finalizable objects marked as extra GC roots

Someone has done a good job in removing finalize() from the class libraries! If there is one other object marked as HEAP_ROOT_UNFINALIZED_OBJ or HEAP_ROOT_UNFINALIZED_OBJ then it works as expected and those 1000 are not marked as FINALIZABLE.

Also MAT now recognizes HEAP_ROOT_UNFINALIZED_OBJ. Those objects are only weakly retained for finalization and will only be put onto the finalization queue when no longer in use. Adding them as full GC roots is not a good idea. Instead I have added them as references from the java.lang.Runtime object, and the queries which understand weak references also treat these as a type of weak reference.

The actual J9 behaviour seems to be:

JavaReference.HEAP_ROOT_UNFINALIZED_OBJ JavaReference.HEAP_ROOT_FINALIZABLE_OBJ meaning
N N ordinary object or finalizable object which has been finalized
Y N finalizable object in normal use
N Y finalizable object on finalizer queue

I now think OpenJ9 DTFJ is behaving as expected. The slightly tricky part could be finding objects which have been finalized. If these are classes which have a finalize method (other than j.l.o) or have instances marked as unfinalized or finalizable, then treat all objects not marked as unfinalized or finalizable as having been finalized.

keithc-ca commented 4 years ago

I don't understand the table in the comment above: how is the third row meaningful as it appears to refer to an object that returns two distinct values from getRootType().

I would not expect DTFJ to refer to any objects that have been finalized; they're not reachable from any root.

ajohnson1 commented 4 years ago

I was exploring an idea where the object would be a root for two reasons, so would have two JavaReferences. It happens for other root types e.g. HEAP_ROOT_MONITOR and HEAP_ROOT_STACK_LOCAL. Now I don't think it is worth the complication for finalizers.

A finalized object could be seen in a heap dump if GC hasn't yet disposed of it. Normally MAT would discard it, but could be seen in the unreachable objects histogram or if keep_unreachable_objects was set. Also a finalizer could choose to make the object ordinarily accessible again; normally this would be a mistake, but I don't know if it happens often enough for it to be useful for MAT to detect that. I agree that they aren't roots anymore.