fstien / ktor-opentracing-example

Example Ktor application instrumented with OpenTracing using ktor-opentracing
MIT License

Explanation / support for concurrent coroutines #1

Open legzo opened 3 years ago

legzo commented 3 years ago

Hi,

I ran into an issue when trying to run parallel tasks on the server. In the project, I modified the client's getLatest function to call getAll, and therefore the U.S. Geological Survey API, multiple times in parallel.

suspend fun getLatest(): Earthquake = withContext(Dispatchers.IO) {
    span("EarthquakeClient.getLatest()") {

        val earthquakes = (1..3)
            .map { async { getAll() } }
            .awaitAll()
            .first()

        val latest = earthquakes.first()
        setTag("location", latest.location)
        setTag("magnitude", latest.magnitude)
        setTag("timeGMT", latest.timeGMT)

        latest
    }
}

The problem is that the spans come out wrong, or at least they don't show up as I expected:

[Screenshot 2020-11-30 at 11 26 53]

I expected something like this:

[Screenshot 2020-11-30 at 11 28 24]

So I looked at what is done in the OpenTracingServer feature, and I think the "problem" is that all async tasks share the same Stack<Span>. Because proceed() is wrapped in the threadLocalSpanStack context, every modification made to the Stack by one coroutine is visible to all the others. My knowledge of coroutines is quite limited, so I may have misunderstood...
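To illustrate the sharing described above, here is a minimal sketch that is not taken from ktor-opentracing. It assumes kotlinx.coroutines is on the classpath and uses a hypothetical demoStack of plain strings in place of the library's threadLocalSpanStack. The context element carries a reference to a single Stack, so every child coroutine pushes onto the same instance:

```kotlin
import java.util.Stack
import kotlinx.coroutines.asContextElement
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical stand-in for the library's thread-local span stack;
// "spans" are plain strings here.
val demoStack = ThreadLocal<Stack<String>>()

fun sharedStackDemo(): List<String> = runBlocking {
    val shared = Stack<String>().apply { push("root") }
    withContext(demoStack.asContextElement(shared)) {
        (1..3).map { i ->
            // Children inherit the context element, and with it the same Stack object.
            async {
                demoStack.get().push("child-$i")
                delay(10) // suspend so the siblings interleave
            }
        }.awaitAll()
    }
    // The pushes of all three children ended up on the one shared stack.
    shared.toList()
}

fun main() {
    println(sharedStackDemo())
}
```

With real spans, a child that looks at the top of this stack to find its parent can pick up a sibling's span instead of the enclosing one, which would be consistent with the layout in the first screenshot.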

What I ended up doing is:

suspend fun getLatest(): Earthquake = withContext(Dispatchers.IO + tracingContext()) {
    span("EarthquakeClient.getLatest()") {

        val earthquakes = (1..3)
            .map { async(tracingContext()) { getAll() } }
            .awaitAll()
            .first()

        val latest = earthquakes.first()
        setTag("location", latest.location)
        setTag("magnitude", latest.magnitude)
        setTag("timeGMT", latest.timeGMT)

        latest
    }
}

With tracingContext() defined as follows:

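// Copy the current span stack so each coroutine works on its own instance
// instead of mutating the one shared with its siblings.
@Suppress("UNCHECKED_CAST")
fun tracingContext() = threadLocalSpanStack.asContextElement(
    threadLocalSpanStack.get()?.clone() as Stack<Span>?
        ?: Stack<Span>()
)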

This way, I inherit the enclosing span context, but I don't share it between concurrent tasks. It renders my spans as I expected, but then again, I might be missing something.
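The effect of cloning can be sketched in isolation. This is an illustration under assumptions rather than the library's code: demoStack and demoTracingContext are hypothetical stand-ins for threadLocalSpanStack and tracingContext(), spans are plain strings, and kotlinx.coroutines is assumed to be available.

```kotlin
import java.util.Stack
import kotlinx.coroutines.asContextElement
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical stand-in for the library's thread-local span stack.
val demoStack = ThreadLocal<Stack<String>>()

// Same idea as tracingContext(): snapshot the current stack, or start empty.
@Suppress("UNCHECKED_CAST")
fun demoTracingContext() = demoStack.asContextElement(
    demoStack.get()?.clone() as Stack<String>? ?: Stack<String>()
)

fun clonedStackDemo(): Pair<List<List<String>>, List<String>> = runBlocking {
    val root = Stack<String>().apply { push("root") }
    val perChild = withContext(demoStack.asContextElement(root)) {
        (1..3).map { i ->
            // Each child gets its own copy, taken at launch time.
            async(demoTracingContext()) {
                demoStack.get().push("child-$i")
                delay(10) // suspend so the siblings interleave
                demoStack.get().toList()
            }
        }.awaitAll()
    }
    // Each child's view of the stack, plus the parent's untouched stack.
    perChild to root.toList()
}

fun main() {
    val (perChild, root) = clonedStackDemo()
    println(perChild)
    println(root)
}
```

Each child sees the enclosing "root" entry plus only its own push, and the parent's stack is left unchanged. One trade-off of this design is that the copy is a snapshot: anything the parent pushes after launching a child is not visible to that child.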

How would you do it? Would you expect the same span layout from my example?

Thanks. Julien

fstien commented 3 years ago

Hi Julien, thanks for your message! Yes, you are correct: we have a race condition, with multiple coroutines adding to the same Span stack simultaneously. Your solution looks great! I look forward to your contribution to the repo. Francois