Kotlin / kotlinx.coroutines

Library support for Kotlin coroutines
Apache License 2.0
12.84k stars 1.83k forks source link

Introduce coroutine dump format for better visualisation of structured concurrency #3587

Open mvicsokolova opened 1 year ago

mvicsokolova commented 1 year ago

Introduction

Currently kotlinx-coroutines-debug provides the following API for dumping coroutines:

For now, there is no coroutine dump that would group coroutines in a meaningful way visualising relationships of structured concurrency and providing a stack trace. This proposal aims to come up with a more suitable dump format.

Proposed solution:

DebugProbes.dumpCoroutines can provide coroutine dump in JSON format in addition to plain text:

DebugProbes.dumpCoroutines(System.out, CoroutineDumpFormat.JSON)

Pros:

Cons:

Possible alternatives:

Plain text + indents

We can still print coroutine dump as a plain text, but coroutines may be sorted by their parent and indents will show their relationships.

Example output:

- BlockingCoroutine{Completing}@2a5c959b [BlockingEventLoop@1d80b9da]
    - StandaloneCoroutine{Active}@154d1536, state: SUSPENDED, name: 'root rb 0' [BlockingEventLoop@1d80b9da]
        at kotlinx.coroutines.DelayKt.awaitCancellation(Delay.kt:148)
        at com.intellij.internal.TestCoroutineProgressAction$cancellableBGProgress$1$1$2$2.invokeSuspend(TestCoroutineProgressAction.kt:91)
    - StandaloneCoroutine{Completing}@1b7e4b1f, name: 'root rb 1' [BlockingEventLoop@1d80b9da]
        - StandaloneCoroutine{Active}@56b4c42, state: SUSPENDED, name: 'root rb 1:0' [BlockingEventLoop@1d80b9da]
            at kotlinx.coroutines.DelayKt.awaitCancellation(Delay.kt:148)
            at com.intellij.internal.TestCoroutineProgressAction$cancellableBGProgress$1$1$2$3$1.invokeSuspend(TestCoroutineProgressAction.kt:95)
    - StandaloneCoroutine{Active}@491608fb, state: SUSPENDED, name: 'root rb 2' [BlockingEventLoop@1d80b9da]
        at kotlinx.coroutines.DelayKt.awaitCancellation(Delay.kt:148)
        at com.intellij.internal.TestCoroutineProgressAction$cancellableBGProgress$1$1$2$4.invokeSuspend(TestCoroutineProgressAction.kt:102)
        - StandaloneCoroutine{Active}@2d37afba, state: SUSPENDED, name: 'root rb 2:0' [BlockingEventLoop@1d80b9da]
            at kotlinx.coroutines.DelayKt.awaitCancellation(Delay.kt:148)
            at com.intellij.internal.TestCoroutineProgressAction$cancellableBGProgress$1$1$2$4$1.invokeSuspend(TestCoroutineProgressAction.kt:100)

Pros:

fvasco commented 1 year ago

Your question looks like a false dichotomy. May we improve the current text format now and introduce new ones in the future?

In your example, CoroutineDumpFormat.JSON should not be an enumeration, but an implementation of an interface (similar to Dispather.IO), in such case we don't have to define and provide now a dump "JEP 425 compatible", and we leave open the door for third-party implementations (text+ASCII color, YAML format, binary format...).

mvicsokolova commented 1 year ago

Thank you for your comment! That’s true, just improving text format of the dump is a possible option.

Do you have any suggestions on what may make the text format more readable and convenient for troubleshooting?

fvasco commented 1 year ago

The main (ignored) issue for me is the duplication on a large dump. The coroutine's main goal is the ability to run thousands (or millions) of coroutines, unfortunately reading thousands (or millions) of stack traces is really demanding. Moreover, many of these are identical and are duplicated many times.

See this trivial example:

Coroutines dump 2023/01/18 09:52:01

Coroutine BlockingCoroutine{Active}@68ff4cf8, state: RUNNING
    at java.base/java.lang.Thread.getStackTrace(Thread.java:1602)
    at kotlinx.coroutines.debug.internal.DebugProbesImpl.enhanceStackTraceWithThreadDumpImpl(DebugProbesImpl.kt:339)
    at kotlinx.coroutines.debug.internal.DebugProbesImpl.dumpCoroutinesSynchronized(DebugProbesImpl.kt:294)
    at kotlinx.coroutines.debug.internal.DebugProbesImpl.dumpCoroutines(DebugProbesImpl.kt:266)
    at kotlinx.coroutines.debug.DebugProbes.dumpCoroutines(DebugProbes.kt:144)
    at kotlinx.coroutines.debug.DebugProbes.dumpCoroutines$default(DebugProbes.kt:144)
    at MainKt$main$1.invokeSuspend(Main.kt:21)
    (Coroutine creation stacktrace)
    at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
    at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:58)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at MainKt.main(Main.kt:14)
    at MainKt.main(Main.kt)

Coroutine StandaloneCoroutine{Active}@2fd9881b, state: SUSPENDED
    at MainKt$main$1$1$1.invokeSuspend(Main.kt:18)
    (Coroutine creation stacktrace)
    at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
    at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
    at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
    at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
    at MainKt$main$1.invokeSuspend(Main.kt:17)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Coroutine StandaloneCoroutine{Active}@70944db3, state: SUSPENDED
    at MainKt$main$1$1$1.invokeSuspend(Main.kt:18)
    (Coroutine creation stacktrace)
    at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
    at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
    at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
    at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
    at MainKt$main$1.invokeSuspend(Main.kt:17)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Coroutine StandaloneCoroutine{Active}@276103c5, state: SUSPENDED
    at MainKt$main$1$1$1.invokeSuspend(Main.kt:18)
    (Coroutine creation stacktrace)
    at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
    at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
    at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
    at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
    at MainKt$main$1.invokeSuspend(Main.kt:17)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Collapsing coroutines with the same stack is really useful, the dump is shorter and more readable.

Coroutines dump 2023/01/18 09:52:01

Coroutine BlockingCoroutine{Active}@68ff4cf8, state: RUNNING
    at java.base/java.lang.Thread.getStackTrace(Thread.java:1602)
    at kotlinx.coroutines.debug.internal.DebugProbesImpl.enhanceStackTraceWithThreadDumpImpl(DebugProbesImpl.kt:339)
    at kotlinx.coroutines.debug.internal.DebugProbesImpl.dumpCoroutinesSynchronized(DebugProbesImpl.kt:294)
    at kotlinx.coroutines.debug.internal.DebugProbesImpl.dumpCoroutines(DebugProbesImpl.kt:266)
    at kotlinx.coroutines.debug.DebugProbes.dumpCoroutines(DebugProbes.kt:144)
    at kotlinx.coroutines.debug.DebugProbes.dumpCoroutines$default(DebugProbes.kt:144)
    at MainKt$main$1.invokeSuspend(Main.kt:21)
    (Coroutine creation stacktrace)
    at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
    at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:58)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at MainKt.main(Main.kt:14)
    at MainKt.main(Main.kt)

Coroutine StandaloneCoroutine{Active}@2fd9881b, state: SUSPENDED, and
Coroutine StandaloneCoroutine{Active}@70944db3, state: SUSPENDED, and
Coroutine StandaloneCoroutine{Active}@276103c5, state: SUSPENDED
    at MainKt$main$1$1$1.invokeSuspend(Main.kt:18)
    (Coroutine creation stacktrace)
    at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
    at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
    at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
    at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
    at MainKt$main$1.invokeSuspend(Main.kt:17)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Other issues regarding the size of the dump: is it a good idea to dump some megabytes of data into the log file? Should a developer copy-paste the relevant data from the log? Is it a reasonable practice? Looking at JEP 425 is a good idea, but I cannot exclude that developer generate a dump in a separate file instead to write it into the log (like a core dump), so a JSON, YAML, or a binary exporter (JSON+GZIP, ProtoBuf+GZIP) can be useful. So my previous comment: making a dump pluggable (like serialization) should avoid putting too much logic in this library. For this use case, also integration with JFR can be helpful, but maybe this is another task.

dkhalanskyjb commented 1 year ago

From https://openjdk.org/jeps/425:

The thread dump is another popular tool for troubleshooting applications written in the thread-per-request style. Unfortunately, the JDK's traditional thread dump, obtained with jstack or jcmd, presents a flat list of threads. This is suitable for dozens or hundreds of platform threads, but is unsuitable for thousands or millions of virtual threads. Accordingly, we will not extend traditional thread dumps to include virtual threads, but will rather introduce a new kind of thread dump in jcmd to present virtual threads alongside platform threads, all grouped in a meaningful way. Richer relationships among threads can be shown when programs use structured concurrency.

Maybe it makes sense to look at what jcmd does or even repeat their format for better tooling compatibility. For example, they include the structured concurrency relationships in the output.

mvicsokolova commented 1 year ago

For now, jcmd supports only JSON as a better visualisation format for lots of virtual threads. (Check out the jcmd thread dump demo).

For example, here is the structured thread dump: root StructuredTaskScope and 2 child scopes that run 1000 virtual threads each (stacktraces and the list of threads are collapsed).

jcmd pid Thread.dump_to_file -format=json dump.json

Screenshot 2023-02-01 at 18 42 25