Open jdmcclur opened 5 months ago
@jdmcclur is it possible to run some part of the OpenTelemetryAccessor.getOpenTelemetryInfo()
logic ahead of checkpoint ? Even if it is "partial" initialization that can be done, it is work that can then be avoided after checkpoint. In the worst case, avoiding loading of most of the classes after restore is the lowest bar to think about clearing, even if all of the logic in that getOpenTelemetryInfo
needs to be repeated after restore.
@Azquelt - I gathered the following with Xtrace, which is a bit tedious.
-Xtrace:maximal={mt{entry},mt{exit}},methods={*OpenTelemetryAccessor*,*OpenTelemetryInfoFactoryImpl.c*,*AutoConfiguredOpenTelemetrySdk*,*Grpc*,*okhttp3*},output=bufferOutput.trc
Most of the time is spent in/below AutoConfiguredOpenTelemetrySdkBuilder.build()
with 86ms spent in GrpcExporterBuilder.build()
with ~44ms of that creating the okhttp3 client?
There is also another 80ms I am not sure about yet inside AutoConfiguredOpenTelemetrySdk
Entry >io/openliberty/microprofile/telemetry/internal/interfaces/OpenTelemetryAccessor.getOpenTelemetryInfo()Lio/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfo; bytecode static method
Entry >io/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfoFactoryImpl.createOpenTelemetryInfo()Lio/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfo; bytecode method, this = 0xf068cb98
Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk.builder()Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder; bytecode static method
Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.<init>()V bytecode method, this = 0xff162cc8
Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.<init>()V bytecode method Diff = 4.58 ms
Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk.builder()Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder; bytecode static method Diff = 5.01 ms
Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.addResourceCustomizer(Ljava/util/function/BiFunction;)Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder; bytecode method, this = 0xff162cc8
Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.addResourceCustomizer(Ljava/util/function/BiFunction;)Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder; bytecode method Diff = 0.20 ms
Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.build()Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk; bytecode method, this = 0xff162cc8
Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.mergeSdkTracerProviderConfigurer()V bytecode method, this = 0xff162cc8
Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.mergeSdkTracerProviderConfigurer()V bytecode method Diff = 0.43 ms
Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.getConfig()Lio/opentelemetry/sdk/autoconfigure/spi/ConfigProperties; bytecode method, this = 0xff162cc8
Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.computeConfigProperties()Lio/opentelemetry/sdk/autoconfigure/spi/ConfigProperties; bytecode method, this = 0xff162cc8
Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.computeConfigProperties()Lio/opentelemetry/sdk/autoconfigure/spi/ConfigProperties; bytecode method Diff = 1.32 ms
Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.getConfig()Lio/opentelemetry/sdk/autoconfigure/spi/ConfigProperties; bytecode method Diff = 1.32 ms
Entry >io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter.builder()Lio/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder; bytecode static method
Entry >io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.<init>(Ljava/lang/String;Ljava/lang/String;JLjava/net/URI;Ljava/util/function/Supplier;Ljava/lang/String;)V bytecode method, this = 0xff1da898
Exit <io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.<init>(Ljava/lang/String;Ljava/lang/String;JLjava/net/URI;Ljava/util/function/Supplier;Ljava/lang/String;)V bytecode method Diff = 0.40 ms
Entry >io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.<init>(Lio/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder;)V bytecode method, this = 0xff1d95f0
Entry >io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.<init>(Lio/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder;)V bytecode method, this = 0xff1d95f0
Exit <io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.<init>(Lio/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder;)V bytecode method Diff = 0.26 ms
Exit <io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.<init>()V bytecode method Diff = 0.26 ms
Exit <io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter.builder()Lio/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder; bytecode static method Diff = 1.25 ms
Entry >io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.build()Lio/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter; bytecode method, this = 0xff1d95f0
Entry >io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.build()Lio/opentelemetry/exporter/internal/grpc/GrpcExporter; bytecode method, this = 0xff1da898
Entry >io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.resolveGrpcSenderProvider()Lio/opentelemetry/exporter/internal/grpc/GrpcSenderProvider; bytecode static method
Exit <io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.resolveGrpcSenderProvider()Lio/opentelemetry/exporter/internal/grpc/GrpcSenderProvider; bytecode static method Diff = 0.37 ms
Entry >io/opentelemetry/exporter/sender/okhttp/internal/OkHttpGrpcSenderProvider.createSender(Ljava/net/URI;Ljava/lang/String;ZJLjava/util/Map;Ljava/lang/Object;Ljava/util/function/Supplier;Lio/opentelemetry/sdk/common/export/RetryPolicy;Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/X509TrustManager;)Lio/opentelemetry/exporter/internal/grpc/GrpcSender; bytecode method, this = 0xff225f08
Entry >io/opentelemetry/exporter/sender/okhttp/internal/OkHttpGrpcSender.<init>(Ljava/lang/String;ZJLjava/util/Map;Lio/opentelemetry/sdk/common/export/RetryPolicy;Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/X509TrustManager;)V bytecode method, this = 0xff9177c8
Entry >okhttp3/internal/concurrent/TaskRunner.<clinit>()V bytecode static method
Entry >okhttp3/internal/Util.<clinit>()V bytecode static method
Exit <okhttp3/internal/Util.<clinit>()V bytecode static method Diff = 19.20 ms
Entry >okhttp3/internal/Util.threadFactory(Ljava/lang/String;Z)Ljava/util/concurrent/ThreadFactory; bytecode static method
Exit <okhttp3/internal/Util.threadFactory(Ljava/lang/String;Z)Ljava/util/concurrent/ThreadFactory; bytecode static method Diff = 0.58 ms
Exit <okhttp3/internal/concurrent/TaskRunner.<clinit>()V bytecode static method Diff = 20.54 ms
Entry >okhttp3/internal/connection/RealConnectionPool.<init>(Lokhttp3/internal/concurrent/TaskRunner;IJLjava/util/concurrent/TimeUnit;)V bytecode method, this = 0xff91b468
Exit <okhttp3/internal/connection/RealConnectionPool.<init>(Lokhttp3/internal/concurrent/TaskRunner;IJLjava/util/concurrent/TimeUnit;)V bytecode method Diff = 0.32 ms
Entry >okhttp3/internal/Util.asFactory(Lokhttp3/EventListener;)Lokhttp3/EventListener$Factory; bytecode static method
Exit <okhttp3/internal/Util.asFactory(Lokhttp3/EventListener;)Lokhttp3/EventListener$Factory; bytecode static method Diff = 0.25 ms
Entry >okhttp3/Authenticator.<clinit>()V bytecode static method
Exit <okhttp3/Authenticator.<clinit>()V bytecode static method Diff = 0.27 ms
Entry >okhttp3/OkHttpClient.<clinit>()V bytecode static method
Entry >okhttp3/ConnectionSpec.<clinit>()V bytecode static method
Entry >okhttp3/CipherSuite.<clinit>()V bytecode static method
Exit <okhttp3/CipherSuite.<clinit>()V bytecode static method Diff = 0.80 ms
Exit <okhttp3/ConnectionSpec.<clinit>()V bytecode static method Diff = 1.26 ms
Exit <okhttp3/OkHttpClient.<clinit>()V bytecode static method Diff = 1.65 ms
Entry >okhttp3/CertificatePinner.<clinit>()V bytecode static method
Exit <okhttp3/CertificatePinner.<clinit>()V bytecode static method Diff = 0.96 ms
Entry >okhttp3/OkHttpClient.<init>(Lokhttp3/OkHttpClient$Builder;)V bytecode method, this = 0xff956548
Entry >okhttp3/internal/platform/Platform.<clinit>()V bytecode static method
Entry >okhttp3/internal/platform/Jdk9Platform.<clinit>()V bytecode static method
Exit <okhttp3/internal/platform/Jdk9Platform.<clinit>()V bytecode static method Diff = 0.48 ms
Exit <okhttp3/internal/platform/Platform.<clinit>()V bytecode static method Diff = 0.84 ms
Entry >okhttp3/internal/platform/Platform.buildCertificateChainCleaner(Ljavax/net/ssl/X509TrustManager;)Lokhttp3/internal/tls/CertificateChainCleaner; bytecode method, this = 0xff95afb8
Entry >okhttp3/internal/platform/Platform.buildTrustRootIndex(Ljavax/net/ssl/X509TrustManager;)Lokhttp3/internal/tls/TrustRootIndex; bytecode method, this = 0xff95afb8
Entry >okhttp3/internal/tls/BasicTrustRootIndex.<init>([Ljava/security/cert/X509Certificate;)V bytecode method, this = 0xff95e198
Exit <okhttp3/internal/tls/BasicTrustRootIndex.<init>([Ljava/security/cert/X509Certificate;)V bytecode method Diff = 3.97 ms
Exit <okhttp3/internal/platform/Platform.buildTrustRootIndex(Ljavax/net/ssl/X509TrustManager;)Lokhttp3/internal/tls/TrustRootIndex; bytecode method Diff = 4.07 ms
Exit <okhttp3/internal/platform/Platform.buildCertificateChainCleaner(Ljavax/net/ssl/X509TrustManager;)Lokhttp3/internal/tls/CertificateChainCleaner; bytecode method Diff = 4.19 ms
Exit <okhttp3/OkHttpClient.<init>(Lokhttp3/OkHttpClient$Builder;)V bytecode method Diff = 5.96 ms
Entry >okhttp3/HttpUrl.get(Ljava/lang/String;)Lokhttp3/HttpUrl; bytecode static method
Entry >okhttp3/internal/HostnamesKt.toCanonicalHost(Ljava/lang/String;)Ljava/lang/String; bytecode static method
Exit <okhttp3/internal/HostnamesKt.toCanonicalHost(Ljava/lang/String;)Ljava/lang/String; bytecode static method Diff = 9.99 ms
Exit <okhttp3/HttpUrl.get(Ljava/lang/String;)Lokhttp3/HttpUrl; bytecode static method Diff = 10.68 ms
Exit <io/opentelemetry/exporter/sender/okhttp/internal/OkHttpGrpcSender.<init>(Ljava/lang/String;ZJLjava/util/Map;Lio/opentelemetry/sdk/common/export/RetryPolicy;Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/X509TrustManager;)V bytecode method Diff = 43.51 ms
Exit <io/opentelemetry/exporter/sender/okhttp/internal/OkHttpGrpcSenderProvider.createSender(Ljava/net/URI;Ljava/lang/String;ZJLjava/util/Map;Ljava/lang/Object;Ljava/util/function/Supplier;Lio/opentelemetry/sdk/common/export/RetryPolicy;Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/X509TrustManager;)Lio/opentelemetry/exporter/internal/grpc/GrpcSender; bytecode method Diff = 44.10 ms
Entry >io/opentelemetry/exporter/internal/grpc/GrpcExporter.<init>(Ljava/lang/String;Ljava/lang/String;Lio/opentelemetry/exporter/internal/grpc/GrpcSender;Ljava/util/function/Supplier;)V bytecode method, this = 0xffa3fe38
Exit <io/opentelemetry/exporter/internal/grpc/GrpcExporter.<init>(Ljava/lang/String;Ljava/lang/String;Lio/opentelemetry/exporter/internal/grpc/GrpcSender;Ljava/util/function/Supplier;)V bytecode method Diff = 0.28 ms
Exit <io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.build()Lio/opentelemetry/exporter/internal/grpc/GrpcExporter; bytecode method Diff = 86.55 ms
Exit <io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.build()Lio/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter; bytecode method Diff = 86.56 ms
Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.build()Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk; bytecode method Diff = 167.74 ms
Exit <io/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfoFactoryImpl.createOpenTelemetryInfo()Lio/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfo; bytecode method Diff = 174.75 ms
Exit <io/openliberty/microprofile/telemetry/internal/interfaces/OpenTelemetryAccessor.getOpenTelemetryInfo()Lio/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfo; bytecode static method Diff = 175.57 ms
My thoughts from examining the XTrace output:
I would expect that io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk.builder()
could be run before checkpoint (5ms) as could all the <clinit>
entries if we preloaded and initialized the classes. (~26ms)
I don't think we can do much else as we expect the configuration for the exporter to change after checkpoint.
We may be able to start this process on resume / at startup rather than doing it lazily on first request, which wouldn't reduce the time taken but may mean it's done in parallel with other things.
Thanks, is there some way to run through a bulk of the code under OpenTelemetryAccessor.getOpenTelemetryInfo()
with some "fake" or "unconfigured" state before checkpoint, and then re-run it after restore with "real" state ? This loads more of the classes related to the overall logic ahead of the checkpoint, and could have some measurable benefit. The down side is the development work to make it possible to call it "twice" in this manner.
We probably could, but I think we'd have to configure it to not send data anywhere and I think that would cause it to skip most of the code that we see running in the trace.
I would want to try making a list of class names to load and manually asking the classloader to load them pre-checkpoint.
My thoughts from examining the XTrace output:
I would expect that io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk.builder() could be run before checkpoint (5ms) as could all the
entries if we preloaded and initialized the classes. (~26ms)
This will work fine if we're in runtime mode. However if we're in app mode we'd have to create one builder per app or share between them. If we share between them we'd have to synchronize to make sure we don't get two apps starting at once and loading different config into the builder. So I suggest we only do it for runtime mode.
While testing a microProfile application with InstantOn and the feature mpTelemetry-1.1, I see a 30% faster time to first request response when Telemetry is not enabled (
otel.sdk.disabled=true
).Most of the time is coming from Open Telemetry code originating from here: https://github.com/OpenLiberty/open-liberty/blob/4d8d8a1b4338b977df1c84a73396e10609724607/dev/io.openliberty.microprofile.telemetry.1.1.internal/src/io/openliberty/microprofile/telemetry11/internal/rest/TelemetryServletFilter.java#L100
We can't get the openTelemetryInfo during the checkpoint phase due to: https://github.com/OpenLiberty/open-liberty/issues/26522
Doing this lazily at first request is adding a lot of time. I have no solutions yet, but thought it was worth documenting.