eclipse-vertx / vertx-tracing

Vertx integration with tracing libraries
Other
42 stars 34 forks source link

Span Export fails with OpenTelemetry 1.32.0 #68

Closed Traderjoe95 closed 11 months ago

Traderjoe95 commented 1 year ago

Version

The issue is certainly present in Vert.x 4.5.0, I haven't tried to reproduce it on older versions.

Context

The OpenTelemetry Span exporters (reproduced with Jaeger gRPC and OTLP gRPC) started throwing NullPointerException after updating to OpenTelemetry 1.32.0. The exception states that the vertxCtx is null.

java.lang.NullPointerException: Cannot invoke "io.vertx.core.Context.getLocal(Object)" because "vertxCtx" is null
    at io.vertx.tracing.opentelemetry.VertxContextStorageProvider$VertxContextStorage.attach(VertxContextStorageProvider.java:27) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.vertx.tracing.opentelemetry.VertxContextStorageProvider$VertxContextStorage.attach(VertxContextStorageProvider.java:23) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.opentelemetry.context.Context.makeCurrent(Context.java:202) ~[opentelemetry-context-1.32.0.jar:1.32.0]
    at io.opentelemetry.context.Context.lambda$wrap$1(Context.java:211) ~[opentelemetry-context-1.32.0.jar:1.32.0]
    at io.opentelemetry.exporter.internal.InstrumentationUtil.suppressInstrumentation(InstrumentationUtil.java:28) ~[opentelemetry-exporter-common-1.32.0.jar:1.32.0]
    at io.opentelemetry.exporter.sender.okhttp.internal.OkHttpGrpcSender.send(OkHttpGrpcSender.java:116) ~[opentelemetry-exporter-sender-okhttp-1.32.0.jar:1.32.0]
    at io.opentelemetry.exporter.internal.grpc.GrpcExporter.export(GrpcExporter.java:61) ~[opentelemetry-exporter-common-1.32.0.jar:1.32.0]
    at io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter.export(OtlpGrpcSpanExporter.java:73) ~[opentelemetry-exporter-otlp-1.32.0.jar:1.32.0]
    at io.opentelemetry.sdk.trace.export.SimpleSpanProcessor.onEnd(SimpleSpanProcessor.java:81) ~[opentelemetry-sdk-trace-1.32.0.jar:1.32.0]
    at io.opentelemetry.sdk.trace.SdkSpan.endInternal(SdkSpan.java:451) ~[opentelemetry-sdk-trace-1.32.0.jar:1.32.0]
    at io.opentelemetry.sdk.trace.SdkSpan.end(SdkSpan.java:431) ~[opentelemetry-sdk-trace-1.32.0.jar:1.32.0]
    at io.vertx.tracing.opentelemetry.OpenTelemetryTracer.end(OpenTelemetryTracer.java:97) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.vertx.tracing.opentelemetry.OpenTelemetryTracer.receiveResponse(OpenTelemetryTracer.java:143) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.vertx.tracing.opentelemetry.OpenTelemetryTracer.receiveResponse(OpenTelemetryTracer.java:31) ~[vertx-opentelemetry-4.5.0.jar:4.5.0]
    at io.vertx.kafka.client.common.tracing.ProducerTracer$StartedSpan.finish(ProducerTracer.java:89) ~[vertx-kafka-client-4.5.0.jar:4.5.0]
    at io.vertx.kafka.client.producer.impl.KafkaWriteStreamImpl.lambda$null$2(KafkaWriteStreamImpl.java:111) ~[vertx-kafka-client-4.5.0.jar:4.5.0]
    at org.apache.kafka.clients.producer.KafkaProducer$AppendCallbacks.onCompletion(KafkaProducer.java:1505) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:270) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:234) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.ProducerBatch.complete(ProducerBatch.java:180) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:692) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:663) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:589) ~[kafka-clients-3.5.0.jar:?]
    at java.util.ArrayList.forEach(Unknown Source) ~[?:?]
    at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:576) ~[kafka-clients-3.5.0.jar:?]
    at java.lang.Iterable.forEach(Unknown Source) ~[?:?]
    at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:576) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:850) ~[kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154) [kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:594) [kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:586) [kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) [kafka-clients-3.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) [kafka-clients-3.5.0.jar:?]
    at java.lang.Thread.run(Unknown Source) [?:?]

Do you have a reproducer?

No, but I pinned down the issue pretty clearly. The stack trace above occurs on the Kafka Producer Network Thread. This thread is obviously not a Vert.x thread (also none with virtual threads support, as added in Vert.x 4.5.0), so it is not expected to have a Vert.x context. Since OpenTelemetry 1.32.0, a new utility is added, which suppresses instrumentation for certain blocks of code. This is achieved by putting a special OpenTelemetry Context into scope, which contains a flag to disable instrumentation:

https://github.com/open-telemetry/opentelemetry-java/compare/v1.31.0...v1.32.0#diff-e0f86f9c669c5dfdee741804e40a62f9b122ec756c2df1acc56a8d7a9ff3247fR28

This utility is used around the network calls of the span exporters, and thus tries to bring the special context into scope. Because there is no Vert.x context in this thread, the code at VertxContextStorageProvider.java:27 throws the NullPointerException.

As an additional remark, this could also hint at an issue in the Vert.x Kafka Client. I would expect the KafkaWriteStreamImpl to delegate back to the Event Loop right in between these stack frames:

    at io.vertx.kafka.client.producer.impl.KafkaWriteStreamImpl.lambda$null$2(KafkaWriteStreamImpl.java:111) ~[vertx-kafka-client-4.5.0.jar:4.5.0]
    at org.apache.kafka.clients.producer.KafkaProducer$AppendCallbacks.onCompletion(KafkaProducer.java:1505) ~[kafka-clients-

but that doesn't appear to happen. Would this be worth investigating?

Proposed Solution

I would suggest that VertxContextStorage delegates to OpenTelemetry's default ThreadLocalContextStorage when called on a thread where there's no Vert.x context. This would enable OpenTelemetry tracing also on non-Vert.x threads while not violating any Vert.x guarantees. What do you think?

If you agree, I am open to submit a PR.

tsegismont commented 12 months ago

Seems related to #62

tsegismont commented 11 months ago

Fixed by #72