Open greatvovan opened 1 month ago
@lmolkova could you please take a look?
Thank you for reporting this @greatvovan!
I wonder if the problem could be around Scala notebook having different shutdown hooks and exiting before anything is exported.
Could you please try calling openTelemetry.shutdown()
and awaiting it to complete to eliminate this? Thank you!
Otherwise, @jeanbisutti @heyams or @trask should be able to investigate further.
@lmolkova I tried openTelemetry.shutdown()
as well as Thread.sleep(...)
at the end, but it did not change anything.
the following code snippet works in Scala under Linux, but does NOT work in a Databricks notebook/job with a compute cluster 15.4 LTS ML (includes Apache Spark 3.5.0, Scala 2.12). it seems the issue exists in the Databricks runtime
import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk
import com.azure.monitor.opentelemetry.exporter.AzureMonitorExporter
object Main {
def test(aiConnStr: String): String = {
val sdkBuilder = AutoConfiguredOpenTelemetrySdk.builder()
AzureMonitorExporter.customize(sdkBuilder, aiConnStr);
val openTelemetry = sdkBuilder.build().getOpenTelemetrySdk()
println("Auto-configured " + openTelemetry.toString())
val attributes = Attributes.of[java.lang.String, java.lang.Long](
AttributeKey.stringKey("foo"), "bar",
AttributeKey.longKey("code"), 42L
)
val tracer = openTelemetry.getTracer("my-notebook")
val span = tracer.spanBuilder("My-span-ScalaEA").startSpan()
val scope = span.makeCurrent()
try {
span.addEvent("XScalaEA span event", attributes)
Thread.sleep(1000L)
}
finally {
scope.close()
span.end()
}
println("trace sent")
val meter = openTelemetry.getMeter("my-notebook")
val gauge = meter.gaugeBuilder("my-gauge-ScalaEA").build()
gauge.set(111, attributes)
println("metric sent")
openTelemetry.shutdown()
Thread.sleep(5000L)
"done"
}
def main(args: Array[String]): Unit = {
test("...")
}
}
libraryDependencies += "io.opentelemetry" % "opentelemetry-sdk" % "1.42.1",
libraryDependencies += "io.opentelemetry" % "opentelemetry-api" % "1.42.1",
libraryDependencies += "com.azure" % "azure-monitor-opentelemetry-exporter" % "1.0.0-beta.29"
Yes, that's exactly what I am saying in the initial post. I am not ready to blame the Databricks runtime, though. It should be a regular Java runtime. I'd rather blame the library. Maybe developers can help us enable logging from the library to stdout to troubleshoot it faster...
hi @greatvovan! I was able to reproduce your findings
and it took me a while, but I finally found where the logs go in azure databricks 😅
and saw this:
24/10/22 22:20:52 WARN ChannelInitializer: Failed to initialize a channel. Closing: [id: 0xec533e08]
java.lang.NoSuchMethodError: io.netty.handler.codec.http.HttpClientCodec.<init>(Lio/netty/handler/codec/http/HttpDecoderConfig;ZZ)V
so it looks like databricks is bringing their own (older) version of netty which is conflicting with the version used by Azure SDK libraries
the good news is that Azure SDK libraries support alternate http client implementations, so I think you should be able to exclude com.azure:azure-core-http-netty
(which is a transitive dependency of com.azure:azure-monitor-opentelemetry-exporter
) and include com.azure:azure-core-http-okhttp
instead
Hi @trask! Thank you for investigation, at least now we understand the reason.
In terms of fixing, how can we tell the library to stop using Netty and start using OkHttp? I installed com.azure:azure-core-http-okhttp:1.12.4
but it did nothing. I also tried installing io.netty:netty-all:4.1.114.Final
or using com.azure:azure-core-http-netty:1.15.5
as root in hope that it will replace a part of the dependency tree, but with no avail.
I think you should be able to exclude com.azure:azure-core-http-netty
I don't quite understand how I can exclude a library in Databricks, and not sure it is possible, as it may be used internally.
For better diagnostics, do you not think it would be better to check all necessary dependencies in initialization stage, e.g. in AutoConfiguredOpenTelemetrySdkBuilder.build()
and throw a clear exception rather than being failing silently?
Also, could you explain how to enable/find the logging you quoted, so that I could possibly try troubleshooting on my own?
I think you should be able to exclude com.azure:azure-core-http-netty
so if this was a maven project, here's what I would do:
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-monitor-opentelemetry-exporter</artifactId>
<version>1.0.0-beta.29</version>
<exclusions>
<exclusion>
<groupId>com.azure</groupId>
<artifactId>azure-core-http-netty</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-core-http-okhttp</artifactId>
<version>1.12.4</version>
</dependency>
I didn't try it in Databricks, but it looks like they give you an option to add exclusions:
so I'd try to install com.azure:azure-monitor-opentelemetry-exporter
with exclusion ~com.azure:azure-core-http-okhttp
~ com.azure:azure-core-http-netty
and then also install com.azure:azure-core-http-okhttp
Also, could you explain how to enable/find the logging you quoted, so that I could possibly try troubleshooting on my own?
yeah, open your cluster and go to the Driver logs tab, then look at the log4j logs
@trask you saved my day! Or maybe week...
Indeed, there is an option to exclude, which worked as expected.
Now traces and metrics are coming (which is the main thing for me), but logs are not. Not sure if it's still related to library conflicts, I assume the same client is responsible for exporting all kinds of telemetry, and now I don't see such complaints in driver logs anyway.
I activate logging this way:
import org.apache.logging.log4j.{LogManager, Logger, Level}
import org.apache.logging.log4j.core.{LoggerContext}
import org.apache.logging.log4j.core.appender.ConsoleAppender
import org.apache.logging.log4j.core.appender.ConsoleAppender.{Builder => ConsoleBuilder}
import org.apache.logging.log4j.core.layout.PatternLayout
import org.apache.logging.log4j.core.config.builder.api.ConfigurationBuilderFactory
import org.apache.logging.log4j.core.config.{AppenderRef, LoggerConfig}
import io.opentelemetry.instrumentation.log4j.appender.v2_17.OpenTelemetryAppender
import io.opentelemetry.instrumentation.log4j.appender.v2_17.OpenTelemetryAppender.{Builder => OtelBuilder}
val context = LogManager.getContext(false).asInstanceOf[LoggerContext]
val config = context.getConfiguration
val layout = PatternLayout.newBuilder().withPattern("%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - MARKER %msg%n").build()
val consoleAppender = ConsoleAppender.newBuilder().asInstanceOf[ConsoleBuilder[_]]
.setName("MyConsoleAppender").asInstanceOf[ConsoleBuilder[_]]
.setLayout(layout).asInstanceOf[ConsoleBuilder[_]]
.setTarget(ConsoleAppender.Target.SYSTEM_OUT).asInstanceOf[ConsoleBuilder[_]]
.build()
consoleAppender.start()
val otelAppender = OpenTelemetryAppender.builder().asInstanceOf[OtelBuilder[_]]
.setName("MyOtelAppender").asInstanceOf[OtelBuilder[_]]
.setCaptureMapMessageAttributes(true).asInstanceOf[OtelBuilder[_]]
.setCaptureContextDataAttributes("*").asInstanceOf[OtelBuilder[_]]
.build()
otelAppender.start()
val rootLoggerConfig = config.getRootLogger
rootLoggerConfig.addAppender(consoleAppender, null, null)
rootLoggerConfig.addAppender(otelAppender, null, null)
context.updateLoggers()
I connect a ConsoleAppender just to have an indication in stdout that my logs are handled. Then initialize the Exporter and install the Appender.
val sdkBuilder = AutoConfiguredOpenTelemetrySdk.builder()
AzureMonitorExporter.customize(sdkBuilder, aiConnStr)
val openTelemetry = sdkBuilder.build().getOpenTelemetrySdk()
OpenTelemetryAppender.install(openTelemetry)
Cluster libraries:
Logging:
val logger = LogManager.getLogger("My-notebook")
logger.error("My test log")
I see the result in the cell output and cluster's stdout (with formatting I set), so I conclude that logging system is set up fine. Console and Otel appenders are added identically, and should emit messages simultaneously, but not a single log record is coming to the cloud. How can we further troubleshoot the issue?
I think I got it working.
Just change OpenTelemetryAppender.install(openTelemetry)
to otelAppender.setOpenTelemetry(openTelemetry)
.
It seems that OpenTelemetryAppender.install(openTelemetry)
doesn't work when configuring things dynamically (as opposed to via log4j2.xml
file).
Also, I found it helpful to add this to the top of the notebook:
Configurator.reconfigure();
Indeed, appender's setOpenTelemetry()
connected the dots! It is unbelievable, we've got from nothing to all three types of telemetry. Thanks a lot for help.
To be more accurate, though, the difference seems to be not in static vs. dynamic use, but elsewhere. The following code, if put into a standalone Scala program,
val configBuilder = ConfigurationBuilderFactory.newConfigurationBuilder()
val configuration = configBuilder
.add(
configBuilder
.newAppender("Console", "CONSOLE")
)
.add(
configBuilder
.newAppender("Otel", "OpenTelemetry")
.addAttribute("captureMapMessageAttributes", true) // Capture extra MapMessage attributes
.addAttribute("captureContextDataAttributes", "*") // Capture ThreadContext attributes
)
.add(
configBuilder
.newRootLogger(Level.INFO)
.add(configBuilder.newAppenderRef("Console"))
.add(configBuilder.newAppenderRef("Otel"))
)
.build(false)
Configurator.initialize(configuration)
val sdkBuilder = AutoConfiguredOpenTelemetrySdk.builder()
AzureMonitorExporter.customize(sdkBuilder, aiConnStr)
val openTelemetry = sdkBuilder.build().getOpenTelemetrySdk()
OpenTelemetryAppender.install(openTelemetry)
also configures logging in run time, but it is compatible with OpenTelemetryAppender.install(openTelemetry)
. Unfortunately configuration with ConfigurationBuilderFactory
does not work under Databricks, so we can't test install()
method because the problem happens in the chain before the appender. This is unfortunate also because Log4j discourages modifying configuration components directly, and my current approach (that worked) goes against declarative methods like ConfigurationBuilderFactory
.
To be more accurate, though, the difference seems to be not in static vs. dynamic use, but elsewhere
I see different behavior, you can check out my Java repro:
No, I mean, if you use ConfigurationBuilder
. Check this out:
It is still dynamic, but OpenTelemetryAppender.install()
does its work.
Not sure to which extent it is useful, but I think it's enough to say that the problem lies deeper than static vs dynamic split.
btw, not sure if this might be interesting / relevant to follow: https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/12468
@greatvovan just checking if it's ok to close this issue, or if you'd like to leave it open? thanks
My issue is resolved, but I am thinking if we can derive moral from the story. Could the library handle the problem of missing/incorrect dependency better?
Following up on this comment, wouldn't it be better to fail more loudly than posting a quiet warning? Was it the intended design to not interfere with the application in case of problems, or we get that as an undesired side effect? My thinking is that if the user added instrumentation, it become an integral part of the program and cannot be easily sacrificed.
Describe the bug When running a Scala notebook in a Databricks cluster, instrumentation using
AzureMonitorExporterBuilder
does not produce any output in the App Insights backend.Exception or Stack Trace No errors.
To Reproduce
com.azure:azure-monitor-opentelemetry-exporter:1.0.0-beta.30
in the cluster.Code Snippet
~Important: run the configuration code (till
sdkBuilder.build().getOpenTelemetrySdk()
) only once (see #41859). If you ran it and gotConfigurationException
, restart the compute cluster (Run -> Restart compute resource).~ – fixed in version 1.0.0-beta.29.The above code as a gist: https://gist.github.com/greatvovan/a2148bccf2c0e0c2305e8a35e0779dc3
Alternative (with manual configuration): https://gist.github.com/greatvovan/5e9e3d8ecaf4e210ade619c5b55455b3
Expected behavior 3 entries in
traces
table (2 logs, 1 event), 1 entry incustomMetrics
, 1 entry independencies
table (span) created in App InsightsScreenshots N/A
Additional context
Setup (please complete the following information):
Information Checklist Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report
EDIT: Updated the gists to look more like linear Scala programs. EDIT: Updated the code for version 1.0.0-beta.29 (the gists remain for version 28). EDIT: Bumped version to 1.0.0-beta.30.