OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0
1.15k stars 592 forks source link

InstantOn Performance: First Request time with mpTelemetry #28447

Open jdmcclur opened 5 months ago

jdmcclur commented 5 months ago

While testing a microProfile application with InstantOn and the feature mpTelemetry-1.1, I see a 30% faster time to first request response when Telemetry is not enabled (otel.sdk.disabled=true).

Most of the time is coming from Open Telemetry code originating from here: https://github.com/OpenLiberty/open-liberty/blob/4d8d8a1b4338b977df1c84a73396e10609724607/dev/io.openliberty.microprofile.telemetry.1.1.internal/src/io/openliberty/microprofile/telemetry11/internal/rest/TelemetryServletFilter.java#L100

We can't get the openTelemetryInfo during the checkpoint phase due to: https://github.com/OpenLiberty/open-liberty/issues/26522

Doing this lazily at first request is adding a lot of time. I have no solutions yet, but thought it was worth documenting.

vijaysun-omr commented 5 months ago

@jdmcclur is it possible to run some part of the OpenTelemetryAccessor.getOpenTelemetryInfo() logic ahead of checkpoint ? Even if it is "partial" initialization that can be done, it is work that can then be avoided after checkpoint. In the worst case, avoiding loading of most of the classes after restore is the lowest bar to think about clearing, even if all of the logic in that getOpenTelemetryInfo needs to be repeated after restore.

jdmcclur commented 5 months ago

@Azquelt - I gathered the following with Xtrace, which is a bit tedious.

-Xtrace:maximal={mt{entry},mt{exit}},methods={*OpenTelemetryAccessor*,*OpenTelemetryInfoFactoryImpl.c*,*AutoConfiguredOpenTelemetrySdk*,*Grpc*,*okhttp3*},output=bufferOutput.trc

Most of the time is spent in/below AutoConfiguredOpenTelemetrySdkBuilder.build() with 86ms spent in GrpcExporterBuilder.build() with ~44ms of that creating the okhttp3 client?

There is also another 80ms I am not sure about yet inside AutoConfiguredOpenTelemetrySdk

Entry >io/openliberty/microprofile/telemetry/internal/interfaces/OpenTelemetryAccessor.getOpenTelemetryInfo()Lio/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfo; bytecode static method
    Entry >io/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfoFactoryImpl.createOpenTelemetryInfo()Lio/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfo; bytecode method, this = 0xf068cb98
        Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk.builder()Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder; bytecode static method
            Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.<init>()V bytecode method, this = 0xff162cc8
            Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.<init>()V bytecode method Diff = 4.58 ms
        Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk.builder()Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder; bytecode static method Diff = 5.01 ms
        Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.addResourceCustomizer(Ljava/util/function/BiFunction;)Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder; bytecode method, this = 0xff162cc8
        Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.addResourceCustomizer(Ljava/util/function/BiFunction;)Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder; bytecode method Diff = 0.20 ms
        Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.build()Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk; bytecode method, this = 0xff162cc8
            Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.mergeSdkTracerProviderConfigurer()V bytecode method, this = 0xff162cc8
            Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.mergeSdkTracerProviderConfigurer()V bytecode method Diff = 0.43 ms
            Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.getConfig()Lio/opentelemetry/sdk/autoconfigure/spi/ConfigProperties; bytecode method, this = 0xff162cc8
                Entry >io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.computeConfigProperties()Lio/opentelemetry/sdk/autoconfigure/spi/ConfigProperties; bytecode method, this = 0xff162cc8
                Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.computeConfigProperties()Lio/opentelemetry/sdk/autoconfigure/spi/ConfigProperties; bytecode method Diff = 1.32 ms
            Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.getConfig()Lio/opentelemetry/sdk/autoconfigure/spi/ConfigProperties; bytecode method Diff = 1.32 ms
            Entry >io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter.builder()Lio/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder; bytecode static method
                Entry >io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.<init>(Ljava/lang/String;Ljava/lang/String;JLjava/net/URI;Ljava/util/function/Supplier;Ljava/lang/String;)V bytecode method, this = 0xff1da898
                Exit <io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.<init>(Ljava/lang/String;Ljava/lang/String;JLjava/net/URI;Ljava/util/function/Supplier;Ljava/lang/String;)V bytecode method Diff = 0.40 ms
                Entry >io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.<init>(Lio/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder;)V bytecode method, this = 0xff1d95f0
                    Entry >io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.<init>(Lio/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder;)V bytecode method, this = 0xff1d95f0
                    Exit <io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.<init>(Lio/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder;)V bytecode method Diff = 0.26 ms
                Exit <io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.<init>()V bytecode method Diff = 0.26 ms
            Exit <io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter.builder()Lio/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder; bytecode static method Diff = 1.25 ms
            Entry >io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.build()Lio/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter; bytecode method, this = 0xff1d95f0
                Entry >io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.build()Lio/opentelemetry/exporter/internal/grpc/GrpcExporter; bytecode method, this = 0xff1da898
                    Entry >io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.resolveGrpcSenderProvider()Lio/opentelemetry/exporter/internal/grpc/GrpcSenderProvider; bytecode static method
                    Exit <io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.resolveGrpcSenderProvider()Lio/opentelemetry/exporter/internal/grpc/GrpcSenderProvider; bytecode static method Diff = 0.37 ms
                    Entry >io/opentelemetry/exporter/sender/okhttp/internal/OkHttpGrpcSenderProvider.createSender(Ljava/net/URI;Ljava/lang/String;ZJLjava/util/Map;Ljava/lang/Object;Ljava/util/function/Supplier;Lio/opentelemetry/sdk/common/export/RetryPolicy;Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/X509TrustManager;)Lio/opentelemetry/exporter/internal/grpc/GrpcSender; bytecode method, this = 0xff225f08
                        Entry >io/opentelemetry/exporter/sender/okhttp/internal/OkHttpGrpcSender.<init>(Ljava/lang/String;ZJLjava/util/Map;Lio/opentelemetry/sdk/common/export/RetryPolicy;Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/X509TrustManager;)V bytecode method, this = 0xff9177c8
                            Entry >okhttp3/internal/concurrent/TaskRunner.<clinit>()V bytecode static method
                                Entry >okhttp3/internal/Util.<clinit>()V bytecode static method
                                Exit <okhttp3/internal/Util.<clinit>()V bytecode static method Diff = 19.20 ms
                                Entry >okhttp3/internal/Util.threadFactory(Ljava/lang/String;Z)Ljava/util/concurrent/ThreadFactory; bytecode static method
                                Exit <okhttp3/internal/Util.threadFactory(Ljava/lang/String;Z)Ljava/util/concurrent/ThreadFactory; bytecode static method Diff = 0.58 ms
                            Exit <okhttp3/internal/concurrent/TaskRunner.<clinit>()V bytecode static method Diff = 20.54 ms
                            Entry >okhttp3/internal/connection/RealConnectionPool.<init>(Lokhttp3/internal/concurrent/TaskRunner;IJLjava/util/concurrent/TimeUnit;)V bytecode method, this = 0xff91b468
                            Exit <okhttp3/internal/connection/RealConnectionPool.<init>(Lokhttp3/internal/concurrent/TaskRunner;IJLjava/util/concurrent/TimeUnit;)V bytecode method Diff = 0.32 ms
                            Entry >okhttp3/internal/Util.asFactory(Lokhttp3/EventListener;)Lokhttp3/EventListener$Factory; bytecode static method
                            Exit <okhttp3/internal/Util.asFactory(Lokhttp3/EventListener;)Lokhttp3/EventListener$Factory; bytecode static method Diff = 0.25 ms
                            Entry >okhttp3/Authenticator.<clinit>()V bytecode static method
                            Exit <okhttp3/Authenticator.<clinit>()V bytecode static method Diff = 0.27 ms
                            Entry >okhttp3/OkHttpClient.<clinit>()V bytecode static method
                                Entry >okhttp3/ConnectionSpec.<clinit>()V bytecode static method
                                    Entry >okhttp3/CipherSuite.<clinit>()V bytecode static method
                                    Exit <okhttp3/CipherSuite.<clinit>()V bytecode static method Diff = 0.80 ms
                                Exit <okhttp3/ConnectionSpec.<clinit>()V bytecode static method Diff = 1.26 ms
                            Exit <okhttp3/OkHttpClient.<clinit>()V bytecode static method Diff = 1.65 ms
                            Entry >okhttp3/CertificatePinner.<clinit>()V bytecode static method
                            Exit <okhttp3/CertificatePinner.<clinit>()V bytecode static method Diff = 0.96 ms
                            Entry >okhttp3/OkHttpClient.<init>(Lokhttp3/OkHttpClient$Builder;)V bytecode method, this = 0xff956548
                                Entry >okhttp3/internal/platform/Platform.<clinit>()V bytecode static method
                                    Entry >okhttp3/internal/platform/Jdk9Platform.<clinit>()V bytecode static method
                                    Exit <okhttp3/internal/platform/Jdk9Platform.<clinit>()V bytecode static method Diff = 0.48 ms
                                Exit <okhttp3/internal/platform/Platform.<clinit>()V bytecode static method Diff = 0.84 ms
                                Entry >okhttp3/internal/platform/Platform.buildCertificateChainCleaner(Ljavax/net/ssl/X509TrustManager;)Lokhttp3/internal/tls/CertificateChainCleaner; bytecode method, this = 0xff95afb8
                                    Entry >okhttp3/internal/platform/Platform.buildTrustRootIndex(Ljavax/net/ssl/X509TrustManager;)Lokhttp3/internal/tls/TrustRootIndex; bytecode method, this = 0xff95afb8
                                        Entry >okhttp3/internal/tls/BasicTrustRootIndex.<init>([Ljava/security/cert/X509Certificate;)V bytecode method, this = 0xff95e198
                                        Exit <okhttp3/internal/tls/BasicTrustRootIndex.<init>([Ljava/security/cert/X509Certificate;)V bytecode method Diff = 3.97 ms
                                    Exit <okhttp3/internal/platform/Platform.buildTrustRootIndex(Ljavax/net/ssl/X509TrustManager;)Lokhttp3/internal/tls/TrustRootIndex; bytecode method Diff = 4.07 ms
                                Exit <okhttp3/internal/platform/Platform.buildCertificateChainCleaner(Ljavax/net/ssl/X509TrustManager;)Lokhttp3/internal/tls/CertificateChainCleaner; bytecode method Diff = 4.19 ms
                            Exit <okhttp3/OkHttpClient.<init>(Lokhttp3/OkHttpClient$Builder;)V bytecode method Diff = 5.96 ms
                            Entry >okhttp3/HttpUrl.get(Ljava/lang/String;)Lokhttp3/HttpUrl; bytecode static method
                                Entry >okhttp3/internal/HostnamesKt.toCanonicalHost(Ljava/lang/String;)Ljava/lang/String; bytecode static method
                                Exit <okhttp3/internal/HostnamesKt.toCanonicalHost(Ljava/lang/String;)Ljava/lang/String; bytecode static method Diff = 9.99 ms
                            Exit <okhttp3/HttpUrl.get(Ljava/lang/String;)Lokhttp3/HttpUrl; bytecode static method Diff = 10.68 ms
                        Exit <io/opentelemetry/exporter/sender/okhttp/internal/OkHttpGrpcSender.<init>(Ljava/lang/String;ZJLjava/util/Map;Lio/opentelemetry/sdk/common/export/RetryPolicy;Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/X509TrustManager;)V bytecode method Diff = 43.51 ms
                    Exit <io/opentelemetry/exporter/sender/okhttp/internal/OkHttpGrpcSenderProvider.createSender(Ljava/net/URI;Ljava/lang/String;ZJLjava/util/Map;Ljava/lang/Object;Ljava/util/function/Supplier;Lio/opentelemetry/sdk/common/export/RetryPolicy;Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/X509TrustManager;)Lio/opentelemetry/exporter/internal/grpc/GrpcSender; bytecode method Diff = 44.10 ms
                    Entry >io/opentelemetry/exporter/internal/grpc/GrpcExporter.<init>(Ljava/lang/String;Ljava/lang/String;Lio/opentelemetry/exporter/internal/grpc/GrpcSender;Ljava/util/function/Supplier;)V bytecode method, this = 0xffa3fe38
                    Exit <io/opentelemetry/exporter/internal/grpc/GrpcExporter.<init>(Ljava/lang/String;Ljava/lang/String;Lio/opentelemetry/exporter/internal/grpc/GrpcSender;Ljava/util/function/Supplier;)V bytecode method Diff = 0.28 ms
                Exit <io/opentelemetry/exporter/internal/grpc/GrpcExporterBuilder.build()Lio/opentelemetry/exporter/internal/grpc/GrpcExporter; bytecode method Diff = 86.55 ms
            Exit <io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.build()Lio/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter; bytecode method Diff = 86.56 ms
        Exit <io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdkBuilder.build()Lio/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk; bytecode method Diff = 167.74 ms
    Exit <io/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfoFactoryImpl.createOpenTelemetryInfo()Lio/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfo; bytecode method Diff = 174.75 ms
Exit <io/openliberty/microprofile/telemetry/internal/interfaces/OpenTelemetryAccessor.getOpenTelemetryInfo()Lio/openliberty/microprofile/telemetry/internal/common/info/OpenTelemetryInfo; bytecode static method Diff = 175.57 ms
Azquelt commented 5 months ago

My thoughts from examining the XTrace output:

I would expect that io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk.builder() could be run before checkpoint (5ms) as could all the <clinit> entries if we preloaded and initialized the classes. (~26ms)

I don't think we can do much else as we expect the configuration for the exporter to change after checkpoint.

We may be able to start this process on resume / at startup rather than doing it lazily on first request, which wouldn't reduce the time taken but may mean it's done in parallel with other things.

vijaysun-omr commented 5 months ago

Thanks, is there some way to run through a bulk of the code under OpenTelemetryAccessor.getOpenTelemetryInfo() with some "fake" or "unconfigured" state before checkpoint, and then re-run it after restore with "real" state ? This loads more of the classes related to the overall logic ahead of the checkpoint, and could have some measurable benefit. The down side is the development work to make it possible to call it "twice" in this manner.

Azquelt commented 5 months ago

We probably could, but I think we'd have to configure it to not send data anywhere and I think that would cause it to skip most of the code that we see running in the trace.

I would want to try making a list of class names to load and manually asking the classloader to load them pre-checkpoint.

benjamin-confino commented 2 months ago

My thoughts from examining the XTrace output:

I would expect that io/opentelemetry/sdk/autoconfigure/AutoConfiguredOpenTelemetrySdk.builder() could be run before checkpoint (5ms) as could all the entries if we preloaded and initialized the classes. (~26ms)

This will work fine if we're in runtime mode. However if we're in app mode we'd have to create one builder per app or share between them. If we share between them we'd have to synchronize to make sure we don't get two apps starting at once and loading different config into the builder. So I suggest we only do it for runtime mode.