Pinging @elastic/es-search (Team:Search)
Initial thoughts are that the bounds checks are not being elided, resulting in less inlining and less optimal code.

We need to reproduce this situation in an independent test case / benchmark, preferably using JMH.
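Something like the following might serve as a starting point - a minimal JMH sketch (class and field names are hypothetical, not from this issue) that reads longs from a `MemorySegment` at random offsets, loosely approximating the DirectReader access pattern:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
public class MemorySegmentGetLongBenchmark {

    static final int SEGMENT_SIZE = 1 << 20;

    MemorySegment segment;
    long[] offsets;

    @Setup
    public void setup() {
        // Heap-independent segment, similar in spirit to a mapped index file.
        segment = Arena.ofAuto().allocate(SEGMENT_SIZE);
        // Random offsets so the bounds checks cannot be trivially hoisted.
        offsets = new long[1024];
        Random rnd = new Random(42);
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = rnd.nextInt(SEGMENT_SIZE - Long.BYTES);
        }
    }

    @Benchmark
    public long readLongs() {
        long sum = 0;
        for (long off : offsets) {
            // The VarHandle-backed access path under investigation.
            sum += segment.get(ValueLayout.JAVA_LONG_UNALIGNED, off);
        }
        return sum;
    }
}
```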
~Bisect shows that the perf regression occurs between jdk-23+30 and jdk-23+31. Next is to check the set of commits between those tags.~
OK, so there is a lot of variability in this benchmark, 0-60% per run, both normally and with JDK 22; this previously confused my testing.
JDK 23 brings a larger amount of variability, ~160-180%. I'm now confident that the regression occurs between jdk-23+24 and jdk-23+25.
```
$ git log --pretty=oneline jdk-23+24...jdk-23+25
...
```
^^^ this is probably a red herring. There were some changes in +25 which resulted in the removal of some varhandle caching, which was later restored in +26.
The compiler output mostly matches what we see in the profiles. MethodHandle::asType
is not being inlined, which results in the method handle not being a constant - this performs more poorly than if it were a constant.
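As a side note, here is a small illustration (a sketch, not code from this issue) of why the handle's constant-ness matters: C2 can only fold a MethodHandle invocation and inline its target when the handle is a compile-time constant, e.g. held in a `static final` field; a handle loaded from a mutable field stays an opaque call, much like the `invokeBasic ... receiver not constant` line in the trace below:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ConstantMhExample {
    // Constant: C2 treats this as a constant and can inline the target.
    static final MethodHandle CONSTANT_MH;
    // Non-constant: re-loaded on every call, so the target cannot be inlined.
    static MethodHandle mutableMh;

    static {
        try {
            MethodHandle mh = MethodHandles.lookup().findStatic(
                Math.class, "max",
                MethodType.methodType(long.class, long.class, long.class));
            CONSTANT_MH = mh;
            mutableMh = mh;
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static long viaConstant(long a, long b) throws Throwable {
        return (long) CONSTANT_MH.invokeExact(a, b); // fully inlinable
    }

    static long viaMutable(long a, long b) throws Throwable {
        return (long) mutableMh.invokeExact(a, b); // opaque call through the handle
    }
}
```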
JDK 23:
```
@ 16   org.apache.lucene.util.packed.DirectReader$DirectPackedReader40::get (34 bytes)   inline (hot)
  \-> TypeProfile (11846/11846 counts) = org/apache/lucene/util/packed/DirectReader$DirectPackedReader40
  @ 14   org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl::readLong (31 bytes)   inline (hot)
    \-> TypeProfile (11494/11494 counts) = org/apache/lucene/store/MemorySegmentIndexInput$SingleSegmentImpl
    @ 8   jdk.internal.foreign.AbstractMemorySegmentImpl::get (12 bytes)   force inline by annotation
      \-> TypeProfile (11022/11022 counts) = jdk/internal/foreign/MappedMemorySegmentImpl
      @ 1   jdk.internal.foreign.layout.ValueLayouts$AbstractValueLayout::varHandle (43 bytes)   force inline by annotation
      @ 8   java.lang.invoke.VarHandleGuards::guard_LJ_J (84 bytes)   force inline by annotation
        @ 3   java.lang.invoke.IndirectVarHandle::checkAccessModeThenIsDirect (8 bytes)   force inline by annotation
          @ 2   java.lang.invoke.VarHandle::checkAccessModeThenIsDirect (29 bytes)   force inline by annotation
        @ 59   java.lang.invoke.VarHandle::getMethodHandle (41 bytes)   force inline by annotation
        @ 71   java.lang.invoke.MethodHandle::asType (32 bytes)   failed to inline: already compiled into a big method
        @ 75   java.lang.invoke.IndirectVarHandle::asDirect (5 bytes)   accessor
        @ 80   java.lang.invoke.MethodHandle::invokeBasic(LLJ)J (0 bytes)   failed to inline: receiver not constant
```
JDK 22:

```
@ 16   org.apache.lucene.util.packed.DirectReader$DirectPackedReader40::get (34 bytes)   inline (hot)
  \-> TypeProfile (6998/6998 counts) = org/apache/lucene/util/packed/DirectReader$DirectPackedReader40
  @ 14   org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl::readLong (31 bytes)   inline (hot)
    \-> TypeProfile (7475/7475 counts) = org/apache/lucene/store/MemorySegmentIndexInput$SingleSegmentImpl
    @ 8   jdk.internal.foreign.AbstractMemorySegmentImpl::get (12 bytes)   force inline by annotation
      \-> TypeProfile (10270/10270 counts) = jdk/internal/foreign/MappedMemorySegmentImpl
      @ 1   jdk.internal.foreign.layout.ValueLayouts$AbstractValueLayout::varHandle (26 bytes)   force inline by annotation
      @ 8   java.lang.invoke.VarHandleGuards::guard_LJ_J (84 bytes)   force inline by annotation
        @ 3   java.lang.invoke.VarHandle::checkAccessModeThenIsDirect (29 bytes)   force inline by annotation
        @ 46   java.lang.invoke.VarForm::getMemberName (38 bytes)   force inline by annotation
        @ 49   java.lang.invoke.VarHandleSegmentAsLongs::get (52 bytes)   force inline by annotation
          @ 14   java.lang.invoke.VarHandleSegmentAsLongs::checkAddress (21 bytes)   force inline by annotation
            @ 1   java.util.Objects::requireNonNull (14 bytes)   force inline by annotation
            @ 15   jdk.internal.foreign.AbstractMemorySegmentImpl::checkAccess (30 bytes)   force inline by annotation
              @ 26   jdk.internal.foreign.AbstractMemorySegmentImpl::checkBounds (54 bytes)   force inline by annotation
                @ 16   jdk.internal.util.Preconditions::checkIndex (22 bytes)   (intrinsic)
          @ 24   jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl (5 bytes)   accessor
          @ 29   jdk.internal.foreign.NativeMemorySegmentImpl::unsafeGetBase (2 bytes)   inline (hot)
          @ 40   java.lang.invoke.VarHandleSegmentAsLongs::offsetPlain (39 bytes)   force inline by annotation
            @ 1   jdk.internal.foreign.NativeMemorySegmentImpl::unsafeGetOffset (5 bytes)   accessor
            @ 13   jdk.internal.foreign.NativeMemorySegmentImpl::maxAlignMask (2 bytes)   inline (hot)
          @ 48   jdk.internal.misc.ScopedMemoryAccess::getLongUnaligned (18 bytes)   force inline by annotation
            @ 6   jdk.internal.misc.ScopedMemoryAccess::getLongUnalignedInternal (36 bytes)   force inline by annotation
              @ 5   jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw (33 bytes)   force inline by annotation
              @ 15   jdk.internal.misc.Unsafe::getLongUnaligned (12 bytes)   inline (hot)
                @ 5   jdk.internal.misc.Unsafe::getLongUnaligned (173 bytes)   (intrinsic)
                @ 8   jdk.internal.misc.Unsafe::convEndian (16 bytes)   inline (hot)
              @ 21   java.lang.ref.Reference::reachabilityFence (1 bytes)   force inline by annotation
```
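For anyone reproducing this: inlining traces in this format are HotSpot's standard diagnostic output, which can be enabled with:

```
-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
```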
Further discussion on https://mail.openjdk.org/pipermail/panama-dev/2024-September/020643.html
Using a few JIT compiler directives recovers much of the performance lost here in the memory segment access. Specifics to follow once we settle on the actual compiler directives to include.
Below is the output of the nested Rally track with JDK 23 along with the following compiler directives:

```
-XX:CompileCommand=dontinline,java/lang/invoke/MethodHandle.setAsTypeCache
-XX:CompileCommand=dontinline,java/lang/invoke/MethodHandle.asTypeUncached
```
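If we settle on this workaround, one way to apply it to an Elasticsearch node would be a drop-in JVM options file (file name hypothetical; Elasticsearch picks up `config/jvm.options.d/*.options`):

```
# config/jvm.options.d/mh-inline-workaround.options (hypothetical)
# Keep the cold asType cache-miss paths out of the compiled caller, so that
# MethodHandle::asType stays small enough to inline (see the
# "failed to inline: already compiled into a big method" line above).
-XX:CompileCommand=dontinline,java/lang/invoke/MethodHandle.setAsTypeCache
-XX:CompileCommand=dontinline,java/lang/invoke/MethodHandle.asTypeUncached
```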
In summary, all scenarios except the 100th percentile show improvements anywhere from 2-10%, though there can be some variance in this benchmark.
| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|--------------------------------------------------------------:|-----------------------------------------------------------:|--------------:|--------------:|----------:|-------:|----------:|
| Cumulative indexing time of primary shards | | 5.0136 | 4.9355 | -0.0781 | min | -1.56% |
| Min cumulative indexing time across primary shard | | 5.0136 | 4.9355 | -0.0781 | min | -1.56% |
| Median cumulative indexing time across primary shard | | 5.0136 | 4.9355 | -0.0781 | min | -1.56% |
| Max cumulative indexing time across primary shard | | 5.0136 | 4.9355 | -0.0781 | min | -1.56% |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min | 0.00% |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Cumulative merge time of primary shards | | 0.424417 | 0.39735 | -0.02707 | min | -6.38% |
| Cumulative merge count of primary shards | | 4 | 4 | 0 | | 0.00% |
| Min cumulative merge time across primary shard | | 0.424417 | 0.39735 | -0.02707 | min | -6.38% |
| Median cumulative merge time across primary shard | | 0.424417 | 0.39735 | -0.02707 | min | -6.38% |
| Max cumulative merge time across primary shard | | 0.424417 | 0.39735 | -0.02707 | min | -6.38% |
| Cumulative merge throttle time of primary shards | | 0.2246 | 0.200933 | -0.02367 | min | -10.54% |
| Min cumulative merge throttle time across primary shard | | 0.2246 | 0.200933 | -0.02367 | min | -10.54% |
| Median cumulative merge throttle time across primary shard | | 0.2246 | 0.200933 | -0.02367 | min | -10.54% |
| Max cumulative merge throttle time across primary shard | | 0.2246 | 0.200933 | -0.02367 | min | -10.54% |
| Cumulative refresh time of primary shards | | 0.2824 | 0.268 | -0.0144 | min | -5.10% |
| Cumulative refresh count of primary shards | | 20 | 20 | 0 | | 0.00% |
| Min cumulative refresh time across primary shard | | 0.2824 | 0.268 | -0.0144 | min | -5.10% |
| Median cumulative refresh time across primary shard | | 0.2824 | 0.268 | -0.0144 | min | -5.10% |
| Max cumulative refresh time across primary shard | | 0.2824 | 0.268 | -0.0144 | min | -5.10% |
| Cumulative flush time of primary shards | | 0.142317 | 0.14775 | 0.00543 | min | +3.82% |
| Cumulative flush count of primary shards | | 2 | 2 | 0 | | 0.00% |
| Min cumulative flush time across primary shard | | 0.142317 | 0.14775 | 0.00543 | min | +3.82% |
| Median cumulative flush time across primary shard | | 0.142317 | 0.14775 | 0.00543 | min | +3.82% |
| Max cumulative flush time across primary shard | | 0.142317 | 0.14775 | 0.00543 | min | +3.82% |
| Total Young Gen GC time | | 2.131 | 2.147 | 0.016 | s | +0.75% |
| Total Young Gen GC count | | 47 | 49 | 2 | | +4.26% |
| Total Old Gen GC time | | 0 | 0 | 0 | s | 0.00% |
| Total Old Gen GC count | | 0 | 0 | 0 | | 0.00% |
| Store size | | 2.84918 | 2.84424 | -0.00495 | GB | -0.17% |
| Translog size | | 5.12227e-08 | 5.12227e-08 | 0 | GB | 0.00% |
| Heap used for segments | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for doc values | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for terms | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for norms | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for points | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for stored fields | | 0 | 0 | 0 | MB | 0.00% |
| Segment count | | 25 | 25 | 0 | | 0.00% |
| Total Ingest Pipeline count | | 0 | 0 | 0 | | 0.00% |
| Total Ingest Pipeline time | | 0 | 0 | 0 | ms | 0.00% |
| Total Ingest Pipeline failed | | 0 | 0 | 0 | | 0.00% |
| error rate | index-append | 0 | 0 | 0 | % | 0.00% |
| Min Throughput | randomized-nested-queries | 19.9367 | 19.9357 | -0.00098 | ops/s | -0.00% |
| Mean Throughput | randomized-nested-queries | 19.9646 | 19.9643 | -0.00034 | ops/s | -0.00% |
| Median Throughput | randomized-nested-queries | 19.9678 | 19.9675 | -0.00025 | ops/s | -0.00% |
| Max Throughput | randomized-nested-queries | 19.9784 | 19.978 | -0.00039 | ops/s | -0.00% |
| 50th percentile latency | randomized-nested-queries | 9.20467 | 8.99144 | -0.21323 | ms | -2.32% |
| 90th percentile latency | randomized-nested-queries | 13.5691 | 12.7468 | -0.82222 | ms | -6.06% |
| 99th percentile latency | randomized-nested-queries | 16.4336 | 15.1094 | -1.3242 | ms | -8.06% |
| 99.9th percentile latency | randomized-nested-queries | 30.5139 | 16.4714 | -14.0425 | ms | -46.02% |
| 100th percentile latency | randomized-nested-queries | 41.8193 | 18.1637 | -23.6556 | ms | -56.57% |
| 50th percentile service time | randomized-nested-queries | 8.29052 | 8.075 | -0.21552 | ms | -2.60% |
| 90th percentile service time | randomized-nested-queries | 12.5849 | 11.6764 | -0.90847 | ms | -7.22% |
| 99th percentile service time | randomized-nested-queries | 15.3765 | 14.0032 | -1.37333 | ms | -8.93% |
| 99.9th percentile service time | randomized-nested-queries | 29.3269 | 15.3982 | -13.9288 | ms | -47.49% |
| 100th percentile service time | randomized-nested-queries | 40.6813 | 16.7806 | -23.9006 | ms | -58.75% |
| error rate | randomized-nested-queries | 0 | 0 | 0 | % | 0.00% |
| Min Throughput | randomized-term-queries | 25.0137 | 25.0137 | 4e-05 | ops/s | 0.00% |
| Mean Throughput | randomized-term-queries | 25.0161 | 25.0159 | -0.00019 | ops/s | -0.00% |
| Median Throughput | randomized-term-queries | 25.0159 | 25.0157 | -0.00014 | ops/s | -0.00% |
| Max Throughput | randomized-term-queries | 25.0188 | 25.0187 | -9e-05 | ops/s | -0.00% |
| 50th percentile latency | randomized-term-queries | 2.65855 | 2.74416 | 0.0856 | ms | +3.22% |
| 90th percentile latency | randomized-term-queries | 3.05292 | 3.19991 | 0.14699 | ms | +4.81% |
| 99th percentile latency | randomized-term-queries | 3.41249 | 3.91761 | 0.50512 | ms | +14.80% |
| 100th percentile latency | randomized-term-queries | 3.85166 | 59.0052 | 55.1536 | ms | +1431.94% |
| 50th percentile service time | randomized-term-queries | 1.75532 | 1.87757 | 0.12226 | ms | +6.96% |
| 90th percentile service time | randomized-term-queries | 1.97695 | 2.02421 | 0.04726 | ms | +2.39% |
| 99th percentile service time | randomized-term-queries | 2.24446 | 2.32631 | 0.08185 | ms | +3.65% |
| 100th percentile service time | randomized-term-queries | 2.75224 | 57.9596 | 55.2074 | ms | +2005.91% |
| error rate | randomized-term-queries | 0 | 0 | 0 | % | 0.00% |
| Min Throughput | randomized-sorted-term-queries | 15.9873 | 15.99 | 0.00271 | ops/s | +0.02% |
| Mean Throughput | randomized-sorted-term-queries | 15.9892 | 15.9915 | 0.00226 | ops/s | +0.01% |
| Median Throughput | randomized-sorted-term-queries | 15.9894 | 15.9915 | 0.00206 | ops/s | +0.01% |
| Max Throughput | randomized-sorted-term-queries | 15.9908 | 15.9927 | 0.00199 | ops/s | +0.01% |
| 50th percentile latency | randomized-sorted-term-queries | 44.0784 | 44.2931 | 0.21477 | ms | +0.49% |
| 90th percentile latency | randomized-sorted-term-queries | 77.8522 | 78.971 | 1.11877 | ms | +1.44% |
| 99th percentile latency | randomized-sorted-term-queries | 104.355 | 111.916 | 7.56151 | ms | +7.25% |
| 100th percentile latency | randomized-sorted-term-queries | 136.932 | 129.952 | -6.97986 | ms | -5.10% |
| 50th percentile service time | randomized-sorted-term-queries | 43.0512 | 42.9996 | -0.05158 | ms | -0.12% |
| 90th percentile service time | randomized-sorted-term-queries | 77.1244 | 78.3077 | 1.18329 | ms | +1.53% |
| 99th percentile service time | randomized-sorted-term-queries | 103.37 | 110.79 | 7.42029 | ms | +7.18% |
| 100th percentile service time | randomized-sorted-term-queries | 135.008 | 129.092 | -5.91573 | ms | -4.38% |
| error rate | randomized-sorted-term-queries | 0 | 0 | 0 | % | 0.00% |
| Min Throughput | match-all | 5.00344 | 5.00339 | -6e-05 | ops/s | -0.00% |
| Mean Throughput | match-all | 5.00404 | 5.00398 | -7e-05 | ops/s | -0.00% |
| Median Throughput | match-all | 5.00401 | 5.00394 | -7e-05 | ops/s | -0.00% |
| Max Throughput | match-all | 5.00479 | 5.00473 | -6e-05 | ops/s | -0.00% |
| 50th percentile latency | match-all | 2.96392 | 2.87872 | -0.08521 | ms | -2.87% |
| 90th percentile latency | match-all | 3.36409 | 3.29008 | -0.07401 | ms | -2.20% |
| 99th percentile latency | match-all | 3.79417 | 3.55432 | -0.23985 | ms | -6.32% |
| 100th percentile latency | match-all | 5.74625 | 5.81256 | 0.06631 | ms | +1.15% |
| 50th percentile service time | match-all | 1.78269 | 1.69125 | -0.09144 | ms | -5.13% |
| 90th percentile service time | match-all | 1.96422 | 1.93127 | -0.03296 | ms | -1.68% |
| 99th percentile service time | match-all | 2.10423 | 2.1777 | 0.07347 | ms | +3.49% |
| 100th percentile service time | match-all | 4.14652 | 4.43299 | 0.28647 | ms | +6.91% |
| error rate | match-all | 0 | 0 | 0 | % | 0.00% |
| Min Throughput | nested-date-histo | 1.00101 | 1.00101 | 0 | ops/s | 0.00% |
| Mean Throughput | nested-date-histo | 1.00167 | 1.00167 | 0 | ops/s | 0.00% |
| Median Throughput | nested-date-histo | 1.00152 | 1.00152 | 0 | ops/s | 0.00% |
| Max Throughput | nested-date-histo | 1.00303 | 1.00303 | -0 | ops/s | -0.00% |
| 50th percentile latency | nested-date-histo | 282.853 | 270.184 | -12.6691 | ms | -4.48% |
| 90th percentile latency | nested-date-histo | 311.829 | 292.716 | -19.1128 | ms | -6.13% |
| 99th percentile latency | nested-date-histo | 336.068 | 314.334 | -21.7337 | ms | -6.47% |
| 100th percentile latency | nested-date-histo | 346.683 | 329.301 | -17.3822 | ms | -5.01% |
| 50th percentile service time | nested-date-histo | 281.092 | 267.599 | -13.4937 | ms | -4.80% |
| 90th percentile service time | nested-date-histo | 310.248 | 290.134 | -20.1146 | ms | -6.48% |
| 99th percentile service time | nested-date-histo | 333.707 | 312.313 | -21.3942 | ms | -6.41% |
| 100th percentile service time | nested-date-histo | 343.8 | 326.279 | -17.5209 | ms | -5.10% |
| error rate | nested-date-histo | 0 | 0 | 0 | % | 0.00% |
| Min Throughput | randomized-nested-queries-with-inner-hits_default | 18.0011 | 18.002 | 0.00093 | ops/s | 0.01% |
| Mean Throughput | randomized-nested-queries-with-inner-hits_default | 18.0018 | 18.0033 | 0.00151 | ops/s | 0.01% |
| Median Throughput | randomized-nested-queries-with-inner-hits_default | 18.0016 | 18.003 | 0.00136 | ops/s | 0.01% |
| Max Throughput | randomized-nested-queries-with-inner-hits_default | 18.0033 | 18.006 | 0.00266 | ops/s | +0.01% |
| 50th percentile latency | randomized-nested-queries-with-inner-hits_default | 9.33778 | 9.16204 | -0.17574 | ms | -1.88% |
| 90th percentile latency | randomized-nested-queries-with-inner-hits_default | 13.779 | 12.7832 | -0.99575 | ms | -7.23% |
| 99th percentile latency | randomized-nested-queries-with-inner-hits_default | 16.5619 | 15.1976 | -1.36429 | ms | -8.24% |
| 99.9th percentile latency | randomized-nested-queries-with-inner-hits_default | 21.0171 | 18.444 | -2.57306 | ms | -12.24% |
| 100th percentile latency | randomized-nested-queries-with-inner-hits_default | 38.5041 | 53.1761 | 14.672 | ms | +38.10% |
| 50th percentile service time | randomized-nested-queries-with-inner-hits_default | 8.1312 | 8.00942 | -0.12179 | ms | -1.50% |
| 90th percentile service time | randomized-nested-queries-with-inner-hits_default | 12.4433 | 11.3684 | -1.07485 | ms | -8.64% |
| 99th percentile service time | randomized-nested-queries-with-inner-hits_default | 14.9342 | 13.8073 | -1.12687 | ms | -7.55% |
| 99.9th percentile service time | randomized-nested-queries-with-inner-hits_default | 20.3188 | 17.5598 | -2.75896 | ms | -13.58% |
| 100th percentile service time | randomized-nested-queries-with-inner-hits_default | 36.7036 | 50.9294 | 14.2258 | ms | +38.76% |
| error rate | randomized-nested-queries-with-inner-hits_default | 0 | 0 | 0 | % | 0.00% |
| Min Throughput | randomized-nested-queries-with-inner-hits_default_big_size | 16.0036 | 16.0036 | -4e-05 | ops/s | -0.00% |
| Mean Throughput | randomized-nested-queries-with-inner-hits_default_big_size | 16.0059 | 16.0059 | 1e-05 | ops/s | 0.00% |
| Median Throughput | randomized-nested-queries-with-inner-hits_default_big_size | 16.0054 | 16.0054 | -3e-05 | ops/s | -0.00% |
| Max Throughput | randomized-nested-queries-with-inner-hits_default_big_size | 16.0107 | 16.0107 | -1e-05 | ops/s | -0.00% |
| 50th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 14.6701 | 14.1773 | -0.49279 | ms | -3.36% |
| 90th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 18.7551 | 17.6513 | -1.10384 | ms | -5.89% |
| 99th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 21.688 | 20.1119 | -1.57607 | ms | -7.27% |
| 99.9th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 27.0588 | 24.9349 | -2.12389 | ms | -7.85% |
| 100th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 27.8401 | 27.702 | -0.1381 | ms | -0.50% |
| 50th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 13.7254 | 13.3343 | -0.39111 | ms | -2.85% |
| 90th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 17.7411 | 16.6371 | -1.10401 | ms | -6.22% |
| 99th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 20.5643 | 19.1875 | -1.37689 | ms | -6.70% |
| 99.9th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 26.3849 | 24.3221 | -2.06277 | ms | -7.82% |
| 100th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 26.6912 | 27.2608 | 0.56967 | ms | +2.13% |
| error rate | randomized-nested-queries-with-inner-hits_default_big_size | 0 | 0 | 0 | % | 0.00% |
@ChrisHegarty I'm assuming the 100th percentile has to do with startup. If that's the case, it makes sense that Java 23 has regressed a bit (as more code needs to be executed). We took care of this in Java 24 (by getting rid of a lot of redundant var handle adapters), so I'd expect these 100th percentile numbers to improve there.
Thanks @mcimadamore.
ES version: main / lucene_snapshot
JDK:

```
openjdk version "23" 2024-09-17
OpenJDK Runtime Environment (build 23+37-2369)
OpenJDK 64-Bit Server VM (build 23+37-2369, mixed mode, sharing)
```
OS:
Bug: Lucene memory segment input slows down with JDK 23. This applies both to Lucene 9.11.1 and to the most recent 9.12.0 snapshot. The regression was noticed in nightly benchmarks when we switched temporarily to JDK 23 on September 4th (rolled back on September 6th).
For example, the nightly benchmark for the nested Rally track regressed for nested-date-histo query (src) latency (dashboard).
Wall clock profiles collected when running this query continuously show that ES spends considerably more time in methods such as `org.apache.lucene.store.MemorySegmentIndexInput.readLong()`.

JDK 22: (profile screenshot)

JDK 23: (profile screenshot)

wall-clock-profiles.zip
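For context, the hot path under discussion essentially reduces to an unaligned little-endian long read through the FFM API. A simplified sketch (not Lucene's actual code) of what `MemorySegmentIndexInput.readLong()` boils down to:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

// Simplified sketch of the hot read path: an unaligned little-endian long
// read from a (typically mapped) segment. The VarHandle behind
// MemorySegment::get is what fails to stay constant on JDK 23.
final class IndexInputSketch {
    private static final ValueLayout.OfLong LAYOUT_LE_LONG =
        ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

    private final MemorySegment segment;
    private long pos;

    IndexInputSketch(MemorySegment segment) {
        this.segment = segment;
    }

    long readLong() {
        long v = segment.get(LAYOUT_LE_LONG, pos); // VarHandle access under the hood
        pos += Long.BYTES;
        return v;
    }
}
```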