Automatically classify execution samples into categories

apangin commented 1 year ago

When analyzing a flame graph, one often asks questions like

How much time in total was spent on class loading?
What percentage of time was Java code running in the interpreter versus in C1/C2 compiled methods?
What is the impact of lambdas on the startup time?
etc.

To assist in performance analysis, async-profiler can automatically classify stack traces into certain categories. Examples of such categories would be

Garbage collection
JIT compilation
Class loading
Class verification
Lambda bootstrapping
itable/vtable dispatch overhead
Running in the Interpreter
VM runtime
and so on.

apangin commented 1 year ago

A new option was added to the jfr2flame converter: --classify. It groups samples by adding a special frame with the category name at the root of the stack trace.

The classifier works only with async-profiler recordings in .jfr format. The --cstack dwarf option is recommended for better accuracy.
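To illustrate what "adding a special frame at the root" means, here is a minimal sketch. It is not the converter's actual code: the class and method names are hypothetical, and it assumes a stack is a list of frame names ordered root first and rendered in the common collapsed-stack text form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of async-profiler.
final class RootFrameSketch {
    // Returns a copy of the stack (root first) with "[category]" as the new root frame.
    static List<String> withCategoryRoot(List<String> stackRootFirst, String category) {
        List<String> result = new ArrayList<>(stackRootFirst.size() + 1);
        result.add("[" + category + "]");
        result.addAll(stackRootFirst);
        return result;
    }

    // Renders one collapsed-stack line: frames joined by ';', then the sample count.
    static String toCollapsed(List<String> stackRootFirst, long samples) {
        return String.join(";", stackRootFirst) + " " + samples;
    }

    public static void main(String[] args) {
        List<String> stack = List.of("java/lang/Thread.run", "Main.work");
        System.out.println(toCollapsed(withCategoryRoot(stack, "c2_comp"), 42));
    }
}
```

Because the category becomes the root, all samples of one category end up under a single tower in the flame graph, and its width shows that category's share of the total.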

apangin commented 1 year ago

The initial implementation of the classifier was pushed.

Currently supported categories:

[gc] - GC worker threads.
[jit] - JIT compilation.
[vm] - Activity inside the JVM runtime: reflection, parking/unparking, JNI upcalls, stop-the-world operations, etc.
[native] - Execution inside native functions.
[Interpreter] - Java code running in the interpreter.
[c1_comp], [c2_comp] - Execution of C1/C2 compiled code.
[c2i_adapter] - Calling compiled-to-interpreted adapters.
[class::init] - VM-level class initialization (excluding execution of static initializers).
[class::load] - ClassLoader activity: lookup, JAR loading, new class definition.
[class::resolve] - Class resolution/linking.
[class::verify] - Class verification.
[lambda::init] - Bootstrapping of lambdas: initial construction and linking of a lambda implementation.

More categories to come later.

Note that categories are mutually exclusive, e.g. if URLClassLoader runs some C2-compiled code, it will be classified as [class::load] but not [c2_comp].
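The mutual exclusivity can be pictured as a first-match-wins rule. The sketch below is a simplified illustration only: the matching rule (a frame name containing "ClassLoader.loadClass") and the fallback on the top frame's type are assumptions made for this example, not the classifier's real logic. The frame type values are the ones quoted later in this thread.

```java
import java.util.List;

// Hypothetical first-match-wins classifier, for illustration only.
final class ExclusiveCategorySketch {
    static final byte FRAME_INTERPRETED = 0;
    static final byte FRAME_JIT_COMPILED = 1;
    static final byte FRAME_NATIVE = 3;
    static final byte FRAME_C1_COMPILED = 6;

    static final class Frame {
        final String name;
        final byte type;

        Frame(String name, byte type) {
            this.name = name;
            this.type = type;
        }
    }

    // Activity-based categories are checked first, so a stack that passes through
    // a class loader never falls through to the compiler-based categories.
    static String classify(List<Frame> stackRootFirst) {
        if (stackRootFirst.isEmpty()) {
            throw new IllegalArgumentException("empty stack");
        }
        for (Frame f : stackRootFirst) {
            if (f.name.contains("ClassLoader.loadClass")) { // assumed rule
                return "[class::load]";
            }
        }
        // Otherwise decide by the type of the top (last) frame.
        Frame top = stackRootFirst.get(stackRootFirst.size() - 1);
        switch (top.type) {
            case FRAME_INTERPRETED:  return "[Interpreter]";
            case FRAME_C1_COMPILED:  return "[c1_comp]";
            case FRAME_JIT_COMPILED: return "[c2_comp]"; // assumption: non-C1 JIT code is C2
            case FRAME_NATIVE:       return "[native]";
            default:                 return "[vm]";
        }
    }
}
```

With this ordering, a stack containing java/net/URLClassLoader.loadClass is reported as [class::load] even when its top frame is C2-compiled, so each sample is counted exactly once.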

franz1981 commented 1 year ago

Lovely, that's a great addition Andrei!

schrepfler commented 1 year ago

Can this be used to perhaps group by thread (by name)?

apangin commented 1 year ago

@schrepfler What do you mean? There has always been the option --threads to group samples by thread.

pveentjer commented 1 year ago

This seems like a very useful feature.

Will there be an option to see both graphs in the same JMC? So you can switch between a view without classification and a view with? This way I don't need to collect 2 profiles.

franz1981 commented 1 year ago

This way I don't need to collect 2 profiles.

I believe that, since it applies to JFR, you would collect events in JFR format and specify the classification later on the converter (i.e. jfr2flame), meaning that you collect data just once but post-process it twice.

apangin commented 1 year ago

@pveentjer @franz1981 Correct. The recording is not affected; you collect a profile once and then run the converter with different options to produce multiple views from it.

krzysztofslusarski commented 1 year ago

If you want to see more: https://github.com/async-profiler/async-profiler/commit/a8f20ebc79281c128f4691a5a739c4e7834a1af6. The changes are only in the converter, not in the recording.

parttimenerd commented 1 year ago

Will there be an option to see both graphs in the same JMC?

JMC does not support additional types. But I could implement in my IntelliJ Plugin :)

krzysztofslusarski commented 1 year ago

Will there be an option to see both graphs in the same JMC?

JMC does not support additional types. But I could implement in my IntelliJ Plugin :)

And because of that, my viewer can't support it right now (I have the JMC parser as a dependency). I will probably need to switch to Andrei's JFR parser sooner or later.

franz1981 commented 1 year ago

@krzysztofslusarski be aware that some JFR events collected by the JFR engine likely won't work if you do that (even if Andrei's parser is faster :P)

parttimenerd commented 1 year ago

Or it might be interesting to reuse the classification code and export it to a more generic classifier.

franz1981 commented 1 year ago

@parttimenerd

Or it might be interesting to reuse the classification code and export it to a more generic classifier.

That would be great too, including something that allows "group by tagging", or collapsing stack frames based on tagged types (they usually depend on package name patterns).

krzysztofslusarski commented 1 year ago

@franz1981 I'm not really interested in events not produced by async-profiler. The goal of my viewer is to give a way of:

online filter applying (like time filters, stack filters, thread filters)
adding additional levels to flame graphs (like thread name, timestamp, correlation id,...)

I want those operations to be fast; that's why I parse the JFR file and hold it in memory (with manual string deduplication so it doesn't need too much memory).
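For readers unfamiliar with manual string deduplication, here is a minimal sketch of the general idea. It is an assumption about how such a viewer might do it, not the viewer's actual code, and the class name is hypothetical: equal strings parsed from the recording are replaced by one canonical instance, so repeated frame and thread names are stored only once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical deduplicator: keeps one canonical instance per distinct string.
final class StringDedupSketch {
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the previously stored equal string if there is one, otherwise stores and returns s.
    String dedup(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int uniqueCount() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringDedupSketch dedup = new StringDedupSketch();
        String a = dedup.dedup(new String("java/lang/Thread.run"));
        String b = dedup.dedup(new String("java/lang/Thread.run"));
        System.out.println(a == b); // both references point to the same instance
    }
}
```

Stack traces repeat the same method names many times over, so a table like this can cut memory use considerably while keeping lookups cheap.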

parttimenerd commented 1 year ago

That's not what I meant, but this sounds interesting. I could look into this.

I rather thought about adding a class that can be used from embedding tools called Classifier (or so), which would wrap the currently implemented classifier methods to allow usage with method names just as strings, ...

parttimenerd commented 1 year ago

@krzysztofslusarski could my approach help you? I could even create a new ap-loader package that only contains the converters and the classifier. This should make it usable in all JFR contexts.

krzysztofslusarski commented 1 year ago

@parttimenerd hard to tell right now. It looks like I need access to one.jfr.JfrReader, but only once I try to implement it will I find out whether I need any changes there.

parttimenerd commented 1 year ago

Why? Couldn't you just use a modified classifier on the JFR data directly without any use of one.jfr.JfrReader?

I could export one.jfr.JfrReader in ap-loader if this helps.

krzysztofslusarski commented 1 year ago

I currently use the JMC implementation of the JFR parser, where the type is returned here:

        public IMCFrame.Type getType() {
            Object t = this.type;
            if (!(t instanceof IMCFrame.Type)) {
                t = ParserToolkit.parseFrameType((String)t);
                this.type = t;
            }

            return (IMCFrame.Type)t;
        }

    public static enum Type {
        JIT_COMPILED,
        INTERPRETED,
        INLINED,
        UNKNOWN;
       // ....
    }

Async-profiler gives you more types as output, and to group the stacks correctly you need to support all of them, not just those 4. I could create a subset of the functionality based on stack strings, but if I want to do it the same way as Andrei, then I need all the supported types used in Classifier.

parttimenerd commented 1 year ago

I know, but I could implement a classifier that you could use to pass the JFR type, class name and method name of a frame. This method could then classify the frame. You then use this method to classify every frame in your application.

krzysztofslusarski commented 1 year ago

But that would need the type as integer or enum with values:

    public static final byte FRAME_INTERPRETED = 0;
    public static final byte FRAME_JIT_COMPILED = 1;
    public static final byte FRAME_INLINED = 2;
    public static final byte FRAME_NATIVE = 3;
    public static final byte FRAME_CPP = 4;
    public static final byte FRAME_KERNEL = 5;
    public static final byte FRAME_C1_COMPILED = 6;

Even then, I don't see a way to get the int type from IMCFrame.

parttimenerd commented 1 year ago

This information is not even stored in the IMCFrame.Type, but mapping it using a simple switch-case should be easy to implement. Maybe one should add the int to the JMC JFR parser anyways. I'm currently using the OpenJDK JFR parser in my projects, so I have limited knowledge on the JMC JFR parser.

krzysztofslusarski commented 1 year ago

I'm using classes from org.openjdk.jmc.common, hard to tell if it's the same as yours, it's both JMC and OpenJDK.

            <dependency>
                <groupId>org.openjdk.jmc</groupId>
                <artifactId>flightrecorder</artifactId>
                <version>${flightrecorder.version}</version>
            </dependency>

parttimenerd commented 1 year ago

It's a different implementation.

Could a classifier wrapper help you then?

krzysztofslusarski commented 1 year ago

No, I don't think so. With this implementation I cannot get the int type. I need to switch to a different JFR parser implementation.

parttimenerd commented 1 year ago

Hm?

public int classify(IMCFrame.Type type, String klass, String method) {
    int t;
    // Translate the JMC frame type into the async-profiler frame type
    switch (type) {
        case JIT_COMPILED:
            t = ...;
            break;
        ...
        default: throw ...;
    }
    return AsyncProfilerClassifier.classify(t, klass, method);
}

krzysztofslusarski commented 1 year ago

That switch can have only 4 case branches, since that's the number of entries in enum. Async-profiler generates 7 different types. With that approach you cannot get FRAME_C1_COMPILED classification.
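The limitation can be made concrete with a small self-contained sketch. The JmcType enum below is a local stand-in that mirrors the four IMCFrame.Type values quoted above (so the example compiles without the JMC dependency), and the byte constants repeat the values listed earlier. The mapping itself is a best-effort assumption made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demonstration that a 4-valued enum cannot cover 7 frame types.
final class LossyTypeMappingSketch {
    // Local stand-in for org.openjdk.jmc IMCFrame.Type as quoted in this thread.
    enum JmcType { JIT_COMPILED, INTERPRETED, INLINED, UNKNOWN }

    static final byte FRAME_INTERPRETED = 0;
    static final byte FRAME_JIT_COMPILED = 1;
    static final byte FRAME_INLINED = 2;
    static final byte FRAME_NATIVE = 3;
    static final byte FRAME_CPP = 4;
    static final byte FRAME_KERNEL = 5;
    static final byte FRAME_C1_COMPILED = 6;

    // Best-effort mapping; returns -1 when the enum value carries no usable information.
    static byte toAsyncProfilerType(JmcType type) {
        switch (type) {
            case INTERPRETED:  return FRAME_INTERPRETED;
            case JIT_COMPILED: return FRAME_JIT_COMPILED;
            case INLINED:      return FRAME_INLINED;
            default:           return -1;
        }
    }

    // Collects every async-profiler frame type that the mapping can ever produce.
    static Set<Byte> reachableTypes() {
        Set<Byte> result = new TreeSet<>();
        for (JmcType t : JmcType.values()) {
            byte mapped = toAsyncProfilerType(t);
            if (mapped >= 0) {
                result.add(mapped);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Reachable types: " + reachableTypes());
        System.out.println("C1 reachable: " + reachableTypes().contains(FRAME_C1_COMPILED));
    }
}
```

Whatever mapping is chosen, FRAME_NATIVE, FRAME_CPP, FRAME_KERNEL and FRAME_C1_COMPILED have no distinct source value in the enum, so the distinction is lost before any classifier sees the frame.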

parttimenerd commented 1 year ago

You're correct. That's not that great. Using the async-profiler JFR reader seems to be the only real option then.

xiejf2020 commented 1 year ago

What an awesome feature! But it doesn't seem to support AOT (although I know AOT was removed in JDK17...)? Would you consider supporting it?

apangin commented 1 year ago

@xiejf2020 What do you think AOT support should look like, and what problem/question should it solve?

xiejf2020 commented 1 year ago

@xiejf2020 What do you think AOT support should look like, and what problem/question should it solve?

I hope there will be an [aot_comp] category in addition to [Interpreter], [c1_comp] and [c2_comp].

Currently, all code (including Interpreter and C1) seems to be identified as C2 when AOT is on (maybe the whole logic used to identify the compilation is broken due to AOT). The following program was run on OpenJDK 16.

java -XX:+UnlockExperimentalVMOptions -XX:+UseAOT -XX:AOTLibrary=$PWD/whatever.so -jar spring-petclinic-2.7.3.jar

Whatever whatever.so is here (maybe even if it's an empty lib with no compiled code) results in all code being recognized as C2:

And this is the flamegraph without AOT (C1 and C2 are correctly identified):

I know there are probably not many people who use AOT, so if you don't have time to fix it, maybe give me some tips about how to fix it :)

Automatically classify execution samples into categories #719