Dynatrace / OneAgent-SDK-for-Java

Enables custom tracing of Java applications in Dynatrace
https://www.dynatrace.com/support/help/extend-dynatrace/oneagent-sdk/what-is-oneagent-sdk/
Apache License 2.0

Please offer recommendations for using Kotlin Coroutines with InProcessLinkTracer #33

Closed checketts closed 5 months ago

checketts commented 5 months ago

Kotlin coroutines have a similar need to InProcessLinkTracer, but with one difference: coroutines are suspended and resumed across threads, so the start/end mechanism does not map onto them directly.

I would like either: 1- Documentation/guidance on how to use the existing SDK in a way that works with coroutines, or 2- Helper methods that can be used directly with coroutines.

Example coroutine context

class DynatraceCoroutineContext(
    private val oneAgentSdk: OneAgentSDK,
    private val inProcessLink: InProcessLink,
) : ThreadContextElement<InProcessLinkTracer> {

    companion object Key : CoroutineContext.Key<DynatraceCoroutineContext>

    override val key: CoroutineContext.Key<DynatraceCoroutineContext> get() = Key

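    // Called each time the coroutine is resumed on a thread: starts a new tracer there.
    override fun updateThreadContext(context: CoroutineContext): InProcessLinkTracer {
        val trace = oneAgentSdk.traceInProcessLink(inProcessLink)
        trace.start()
        return trace
    }

    // Called when the coroutine suspends or completes on that thread: ends that tracer.
    override fun restoreThreadContext(context: CoroutineContext, oldState: InProcessLinkTracer) {
        oldState.end()
    }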
}

Note there is no appropriate place to catch exceptions or end timings.

Perhaps an example extension function could also explain how to leverage this?

public fun <T> CoroutineScope.asyncWithDynatrace(
    sdk: OneAgentSDK,
    context: CoroutineContext = EmptyCoroutineContext,
    start: CoroutineStart = CoroutineStart.DEFAULT,
    block: suspend CoroutineScope.() -> T
): Deferred<T> {
    val inProcessLink = sdk.createInProcessLink()
    val inProcessLinkTracer = sdk.traceInProcessLink(inProcessLink)
    inProcessLinkTracer.start()
    val dynatraceContext = DynatraceCoroutineContext(sdk, inProcessLink)
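    // Caveat: async returns as soon as the coroutine is launched, so this try/finally
    // only covers launching it; failures inside `block` surface via the returned Deferred.
    return try {
        async(context + dynatraceContext, start, block)
    } catch (e: Exception) {
        inProcessLinkTracer.error(e)
        throw e
    } finally {
        inProcessLinkTracer.end()
    }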
}

Thanks!

checketts commented 5 months ago

Even with this code in place, once the coroutine is suspended, the context is lost or not cleaned up correctly.

It appears that Kotlin coroutines will break traces when they are used in an async way.

Oberon00 commented 5 months ago

Please note this part of the README at https://github.com/Dynatrace/OneAgent-SDK-for-Java/blob/master/README.md#tracers, especially the second paragraph:

start() records the active PurePath node on the current Java thread as parent (if any; whether created by another Tracer or the OneAgent), creates a new PurePath node and sets the new one as the currently active one. The OneAgent also requires that a child node ends before all parent nodes (Stated another way, tracers on the same thread must be ended in the opposite order of how they were started. While this may sound odd if you hear it the first time, it corresponds to the most natural usage pattern and you usually don't even need to think about it).

While the tracer's automatic parent-child relationship works very intuitively in most cases, it does not work with asynchronous patterns, where the same thread handles multiple logically separate operations in an interleaved way on the same Java thread. If you need to instrument such patterns, it is recommended to use the OneAgent's OpenTelemetry interoperability.
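To make the ordering rule in the quoted README text concrete, here is a toy model in plain Kotlin. It is not the OneAgent SDK: `ToyTracer` and both functions are hypothetical names, the "active node" is just a string on a stack, and what the real agent does when tracers end out of order is not modeled. It only shows why nested usage satisfies the "end in the opposite order" rule while interleaved operations on one thread do not.

```kotlin
// Toy model only: NOT the OneAgent SDK. All names here are hypothetical.
// `active` stands in for the per-thread stack of currently active nodes.
class ToyTracer(private val name: String, private val active: ArrayDeque<String>) {
    // Becomes the innermost active node; whatever was on top is its parent.
    fun start() {
        active.addLast(name)
    }

    // Returns true only if this tracer is the innermost active one (valid LIFO end).
    fun end(): Boolean {
        if (active.lastOrNull() != name) return false
        active.removeLast()
        return true
    }
}

// Ordinary nested usage: the child ends before its parent.
fun nestedUsage(): Boolean {
    val active = ArrayDeque<String>()
    val parent = ToyTracer("parent", active)
    val child = ToyTracer("child", active)
    parent.start()
    child.start()
    val childOk = child.end()
    val parentOk = parent.end()
    return childOk && parentOk
}

// Two logically separate operations interleaved on one thread, as with coroutines.
fun interleavedUsage(): Boolean {
    val active = ArrayDeque<String>()
    val a = ToyTracer("operation A", active)
    val b = ToyTracer("operation B", active)
    a.start()      // A runs, then suspends without ending
    b.start()      // B runs on the same thread and lands on top of A
    return a.end() // A resumes and finishes first, but B is still on top
}

fun main() {
    println("nested: ${nestedUsage()}")           // prints "nested: true"
    println("interleaved: ${interleavedUsage()}") // prints "interleaved: false"
}
```

In the interleaved case, B is also recorded as a child of A even though the two operations are unrelated, which is the other half of the problem the README describes.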

So in summary, while I didn't investigate Kotlin coroutines in particular, it's likely that they just won't work well with the SDK's API for the reasons documented above, and you should consider using the OneAgent's OpenTelemetry support instead.

There are no plans to improve async support of the OneAgent SDK, as the OpenTelemetry support of the OneAgent usually works very well as replacement for the SDK in these cases.
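For readers taking the OpenTelemetry route with coroutines, the following is a minimal sketch and not an official Dynatrace recommendation. It assumes the `opentelemetry-api`, `opentelemetry-extension-kotlin`, and `kotlinx-coroutines-core` artifacts are on the classpath; `withSpan` and the tracer name `example-instrumentation` are hypothetical. It uses `asContextElement()` from the OpenTelemetry Kotlin extension so the span stays current across suspension points, and it gives a place to record exceptions and end the span.

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.trace.StatusCode
import io.opentelemetry.context.Context
import io.opentelemetry.extension.kotlin.asContextElement
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical helper: runs `block` inside a span that follows the coroutine
// across threads, and ends the span once the block has actually finished.
suspend fun <T> withSpan(name: String, block: suspend () -> T): T {
    // Tracer name is an arbitrary placeholder for this sketch.
    val tracer = GlobalOpenTelemetry.getTracer("example-instrumentation")
    val span = tracer.spanBuilder(name).startSpan()
    return try {
        // asContextElement() re-installs the OpenTelemetry context on whichever
        // thread the coroutine resumes on.
        withContext(Context.current().with(span).asContextElement()) { block() }
    } catch (e: Throwable) {
        // Also reached on cancellation; filter CancellationException if that is unwanted.
        span.recordException(e)
        span.setStatus(StatusCode.ERROR)
        throw e
    } finally {
        span.end()
    }
}

fun main() = runBlocking {
    val result = withSpan("demo-operation") { 21 * 2 }
    println(result)
}
```

Because `withContext` suspends until the block completes, the `finally` branch ends the span after the work is done, unlike a `try`/`finally` wrapped around `async`, which only covers launching the coroutine. Whether the resulting spans appear in Dynatrace depends on the OneAgent's OpenTelemetry configuration, which this sketch does not cover.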

I hope this helps.

checketts commented 5 months ago

Thanks for the response. I've moved forward with the OpenTelemetry support, and it is one step closer. However, I'm still struggling to get the downstream trace to connect. (Once I get the kinks ironed out, I'll post my working snippets back here for others' benefit.)

Here is an overview of what I'm trying to connect: App code -> Async Coroutines -> Spring WebFlux Client -> Other REST service

The WebFlux calls are not showing up. I'm using the OneAgent's ability to report traces, so I may be hitting the feature where OneAgent filters out WebFlux OpenTelemetry instrumentation to avoid double tracking.

So my current questions are: 1- How can I disable that filtering (or, if it can't be disabled, do I just need to use a separate OpenTelemetry collector)? 2- How can I use the Dynatrace configuration to link the trace?

I realize these are more support questions than GitHub issues, so I'm happy to pursue that route if it is preferred.

Oberon00 commented 5 months ago

I'm glad it helped a bit.

If you run into the OneAgent filtering out WebFlux, you should instead see the WebFlux calls through the OneAgent's built-in support. This sounds rather too complex to be solved on GitHub. Please go through Dynatrace Support and provide them with the relevant links to your Dynatrace environment (e.g. a link to a trace that ought to contain the WebFlux calls but doesn't).