apache / camel-quarkus

Apache Camel Quarkus
https://camel.apache.org
Apache License 2.0
256 stars 191 forks source link

[Camel 4.2] perf-regression: mean throughput regression appears with Camel Quarkus 3.5.0 #5452

Closed aldettinger closed 11 months ago

aldettinger commented 1 year ago

Bug description

A mean throughput performance drop has been captured on 10 minutes long runs , and it's confirmed on another machine with 1 hour long runs:

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                              
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]    |  Native req/s [%increase]  |               Status               |
-------------------------------------------------------------------------------------------------------------------------------------
                 3.4.0  |       10m  |   40631.67 req/s [+0.00%]  |   22104.40 req/s [+0.00%]  |                                OK  |
                 3.5.0  |       10m  |  27947.75 req/s [-31.22%]  |  14816.30 req/s [-32.97%]  |  Potential performance regression  |
                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                              
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]    |  Native req/s [%increase]  |               Status               |
-------------------------------------------------------------------------------------------------------------------------------------
                 3.4.0  |        1h  |   40683.59 req/s [+0.00%]  |   22635.90 req/s [+0.00%]  |                                OK  |
                 3.5.0  |        1h  |  28109.76 req/s [-30.91%]  |  15045.40 req/s [-33.53%]  |  Potential performance regression  |
aldettinger commented 1 year ago

After investigation comparing performance at specific commits, it's look like there are actually 2 regressions.

A -5% mean throughput drop was introduced when adopting Camel 4.0.1 in Camel Quarkus. A -25% mean throughput drop was introduced when adopting Camel 4.1.0 in Camel Quarkus.

So, we probably have issues impacting performance in commits:

@orpiske @jamesnetherton @davsclaus Does it ring any bell ? We would be searching for something that would impact a camel application with:

aldettinger commented 1 year ago

For the sake of not losing data, perf regression at specific commits:

https://github.com/apache/camel-quarkus/commit/d59a5da8e68dc0f6fdf62656d36a6cc2cd003f50 (Q  3.5.0)

https://github.com/apache/camel-quarkus/commit/6a63a2eb1edfb9e0056f6657824f74eba489c4a3 (Q  3.5.0.CR1)
Before 3.5.0.CR1 -> https://github.com/apache/camel-quarkus/commit/c3985e643fa7f784fe2518af405b2f1cb9a77448 -> regression with 1s runs

https://github.com/apache/camel-quarkus/commit/190cfbbf5c249592ad50a22254efe0fd41f999fa (C  4.1.0) -> -30% regression with 10m runs
https://github.com/apache/camel-quarkus/commit/8eddb64648e89bc3dee808d011226ba44110d9a9 (perf j17) -> -5%  regression with 10m runs

https://github.com/apache/camel-quarkus/commit/9a6b7b32eebe6e1ad1a50c44b93b79cff61077f0 (C  4.0.1) -> -5%  regression with 10m runs
https://github.com/apache/camel-quarkus/commit/054f5549e8059ae2d4f1bdcb34c8ecb45ad3abbb (oracle d) -> no   regression with 10m runs

https://github.com/apache/camel-quarkus/commit/e439becabd41d259800a5cb73e9c4044f6132b6e (Q  3.4.1) -> no   regression with 5m runs
https://github.com/apache/camel-quarkus/commit/944b470c99994800de2fcbe3837a9efcc8777caf (CQ 3.5.0) -> no   regression with 5m runs
orpiske commented 1 year ago

Hi, for 4.1, please can you check if it's involving type conversion? We rewrote that on 4.1.0.

For 4.0.0 to 4.0.1 there was a change in the SEDA component that impacted the performance (we had to revert one of the performance improvements due to a bug in an edge case scenario reported in the community). If that's involved in your scenario, it could be the case.

Additionally, I can take a look on Camel Core end ... but I'd need a report from the async-profiler (I can help you generate that).

aldettinger commented 1 year ago

@orpiske There is no explicit type conversion as far as I know and the regression sample app is not using seda.

The perf-regression tool can't easily be modified to use async-profiler but maybe some kind of local tweaks using it as java agent could be possible. That said, what kind of profiling would we need, for instance is the one below ok:

java
-agentpath:path-to/lib/libasyncProfiler.so=start,
event=cpu,
file=startup-cpu-profile.html,
interval=1000000,
simple
-jar target/quarkus-app/quarkus-run.jar

Looking at the git logs, there are quite some commits with impact on core. So, maybe we'll need a kind of Camel only reproducer at some points to help ensure the regression is fixed ?

orpiske commented 1 year ago

@aldettinger Yes, something like that would help. I'd recommend something like this (that's what I use for Camel Core):

-agentpath:/path/to/libasyncProfiler.so=start,ann,threads,event=cpu,file=report.html

You probably don't need to adjust the interval - I believe the default of 10ms used by the profiler is enough, but we can adjust if we don't collect enough detail.

Also, if I understand it correctly, your tests run for a very long time ... We don't need that much and, probably, around 5-10 minutes should give enough information.

Looking at the git logs, there are quite some commits with impact on core. So, maybe we'll need a kind of Camel only reproducer at some points to help ensure the regression is fixed ?

Yes. The core of Camel Core has been very much active recently 😁 I have a load generator for core but it's mostly for local components (ie.: seda, disruptor, direct, etc) and EIPs. It could be modified, but I don't have the tooling to generate external load (besides Kafka).

aldettinger commented 1 year ago

@orpiske So there we stand, the reports are attached there

And comparing flamegraphs before and after Camel 4.0.1 and 4.1.0 adoptions, both regressions are probably linked to some type conversion occurring in VertxPlatformHttpSupport.

More precisely, comparing flamegraphs between Camel Quarkus 3.4.0 vs Camel Quarkus with adoption of Camel 4.1.0, far more time is spent in VertxPlatformHttpSupport.toHttpResponse (increase from 4.08% to 37.60%).

And comparing flamegraphs between Camel Quarkus 3.4.0 vs Camel Quarkus with adoption of 4.0.1, there is also a bit more time spent in VertxPlatformHttpSupport.toHttpResponse (increase from 4.56% to 10.54%)

orpiske commented 1 year ago

Thank you! Looking at it.

orpiske commented 1 year ago

Both seem to be caused by overhead on the old and on the new type converter. I will investigate further. Thanks!

aldettinger commented 1 year ago

@orpiske The fix seems to cover both regression issues (-5% with Camel 4.0.1 and -25% with Camel 4.1.0).

In such cases, would the fix be backported to Camel 4.0.x ? 4.1.x ? Not sure how it relates to LTS.

orpiske commented 1 year ago

@orpiske The fix seems to cover both regression issues (-5% with Camel 4.0.1 and -25% with Camel 4.1.0).

In such cases, would the fix be back-ported to Camel 4.0.x ? 4.1.x ? Not sure how it relates to LTS.

The problem with both versions are of the same nature: the code not caching an expensive value. This particular fix covers only the case with Camel 4.2 (and, potentially, 4.1). However, Camel 4.1 is not an LTS version, so it won't be back-ported.

As for 4.0.x ... Because it uses the old type converter, I am not sure if I will back-port this code. The type converter is full of edge cases and pretty hard to maintain.

BTW, would it be possible to provide a new flamegraph with the fix applied please? I noticed a few other (unrelated) items that I would like to look at more closely.

aldettinger commented 1 year ago

Looks fair enough with respect to non LTS. Concerning Camel Quarkus 3.2.1 with Camel 4.0.1, it might be another beast. I'm gonna check if we are affected there and report back.

The flamegraphs produced after the fix are there

aldettinger commented 1 year ago

@orpiske Once variability is erased, the 4.0.1 performance drop might not be such a big issue.

Furthermore, if we look at it from LTS to LTS, we end up with just a small penalty:

 Camel Quarkus Throughput Performance Increase Compared to Previous Version  
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Status  |
-----------------------------------------------------------------------------
                2.16.0  |       45m  |  38276.70 req/s [+0.00%]  |      OK  |
                 3.2.1  |       45m  |  37042.86 req/s [-3.22%]  |      OK  |

Hope this could help to balance effort/reward that could be expected in this area.

At the end of the day, the issues are surfaced that way in Camel Quarkus:

aldettinger commented 11 months ago

More performance regression runs after Camel 4.2 adoption at this commit.

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
------------------------------------------------------------------------------------------------------------------------------------
                 3.4.0  |       10m  |  40429.30 req/s [+0.00%]  |   22568.32 req/s [+0.00%]  |                                OK  |
        3.6.0-SNAPSHOT  |       10m  |  39135.89 req/s [-3.20%]  |   20406.32 req/s [-9.58%]  |  Potential performance regression  |
                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
------------------------------------------------------------------------------------------------------------------------------------
                 3.4.0  |       10m  |  40514.10 req/s [+0.00%]  |   22342.60 req/s [+0.00%]  |                                OK  |
        3.6.0-SNAPSHOT  |       10m  |  39621.58 req/s [-2.20%]  |   20411.16 req/s [-8.64%]  |  Potential performance regression  |
                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
------------------------------------------------------------------------------------------------------------------------------------
                 3.4.0  |        1h  |  40315.19 req/s [+0.00%]  |   22372.71 req/s [+0.00%]  |                                OK  |
        3.6.0-SNAPSHOT  |        1h  |  40250.41 req/s [-0.16%]  |   20431.48 req/s [-8.68%]  |  Potential performance regression  |

So, the regression in JVM mode is fixed, however in native mode, we still have something. We need further investigation to determine whether:

orpiske commented 11 months ago

That's strange. That commit is on Paho (MQTT) ... maybe some other commit close to that change?

aldettinger commented 11 months ago

@orpiske The MQTT specific commit may not be related. I tested on main after Camel 4.2 upgrade, and it just happened that there were a few more commits that were there. So indeed, we'll need a bit more experiments to see if if this another regression or the same.

aldettinger commented 11 months ago

Some more performance regression and results below:

After adoption of Camel 4.2

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |  31773.00 req/s [+0.00%]  |   19192.41 req/s [+0.00%]  |                                OK  |
        3.6.0-SNAPSHOT  |       10m  |  31302.33 req/s [-1.48%]  |   17421.69 req/s [-9.23%]  |  Potential performance regression  |

Before adoption of Camel 4.2

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |   33168.74 req/s [+0.00%]  |   19273.56 req/s [+0.00%]  |                                OK  |
        3.6.0-SNAPSHOT  |       10m  |  23906.73 req/s [-27.92%]  |  13225.57 req/s [-31.38%]  |  Potential performance regression  |

After adoption of Quarkus 3.5.1

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |   32864.85 req/s [+0.00%]  |   19179.54 req/s [+0.00%]  |                                OK  |
        3.6.0-SNAPSHOT  |       10m  |  23442.23 req/s [-28.67%]  |  13133.58 req/s [-31.52%]  |  Potential performance regression  |

Before adoption of Quarkus 3.5.1

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |   33113.01 req/s [+0.00%]  |   19049.27 req/s [+0.00%]  |                                OK  |
        3.6.0-SNAPSHOT  |       10m  |  23506.36 req/s [-29.01%]  |  13163.21 req/s [-30.90%]  |  Potential performance regression  |

After adoption of Quarkus 3.5.0

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |   31991.57 req/s [+0.00%]  |   19149.94 req/s [+0.00%]  |                                OK  |
        3.5.0-SNAPSHOT  |       10m  |  22677.47 req/s [-29.11%]  |  13065.21 req/s [-31.77%]  |  Potential performance regression  |

Before adoption of Quarkus 3.5.0

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |   33543.35 req/s [+0.00%]  |   19345.61 req/s [+0.00%]  |                                OK  |
        3.5.0-SNAPSHOT  |       10m  |  23757.36 req/s [-29.17%]  |  13530.67 req/s [-30.06%]  |  Potential performance regression  |

After adoption of Camel 4.1.0

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |   32986.62 req/s [+0.00%]  |   19509.11 req/s [+0.00%]  |                                OK  |
        3.5.0-SNAPSHOT  |       10m  |  23802.20 req/s [-27.84%]  |  13943.07 req/s [-28.53%]  |  Potential performance regression  |

Before adoption of Camel 4.1.0

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |  33197.08 req/s [+0.00%]  |   19549.45 req/s [+0.00%]  |                                OK  |
        3.5.0-SNAPSHOT  |       10m  |  31194.07 req/s [-6.03%]  |   18786.79 req/s [-3.90%]  |  Potential performance regression  |

After adoption of Camel 4.0.1

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                             
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]   |  Native req/s [%increase]  |               Status               |
                 3.4.0  |       10m  |  32569.09 req/s [+0.00%]  |   19333.75 req/s [+0.00%]  |                                OK  |
        3.5.0-SNAPSHOT  |       10m  |  30570.92 req/s [-6.14%]  |   18444.16 req/s [-4.60%]  |  Potential performance regression  |

Before adoption of Camel 4.0.1

                             Camel Quarkus Throughput Performance Increase Compared to Previous Version                              
 Camel Quarkus version  |  Duration  |   JVM req/s [%increase]    |  Native req/s [%increase]  |               Status               |
-------------------------------------------------------------------------------------------------------------------------------------
                 3.4.0  |       10m  |  33067.74 req/s [+0.00%]  |   19618.32 req/s [+0.00%]  |      OK  |
        3.5.0-SNAPSHOT  |       10m  |  31582.06 req/s [-4.49%]  |   19352.30 req/s [-1.36%]  |      OK  |

So everything behaves like the regressions introduced in 4.0.1 and 4.1.0 are corrected in JVM mode but only partially in native mode for some reason. Next steps could be to try generating a flamegraph in native mode too.

aldettinger commented 11 months ago

@orpiske After multiple checks, it really looks like Camel 4.2 would drop native performance. By any chance, would we know on the top of our head whether Camel 4.2 introduced any changes that could impact performance in native mode ? Like a resource or a reflective class that would not be present on startup, and silently a kind of fallback mechanism would take over and the fallback mechanism is slower... or any behavior in the same family ?

orpiske commented 11 months ago

@aldettinger So, from the top of my head, I don't remember any that would cause this. We did have a few fixes in the type converter and producer caches, but none seemed to impact the performance significantly (for better or worse).

In fact, I am finding it very strange because on my own tests I see the trend of 4.2 usually being faster. But I see that's not the case with your tests ... and the numbers for native look awful!

Let's take a closer look at this.

aldettinger commented 11 months ago

Around "perceived" decrease, with short 10m runs, the variability is quite high. After 4.2 regression fixes, -1.48% it's not relevant and probably running 1h long scenario would show it tends toward zero.

The native -9.23% is another beast. Maybe a good occasion to experiment generating flame graphs in native mode. And also checking what version of the native builder is used, maybe there is a known bug.

orpiske commented 11 months ago

BTW, I am double checking the tests on Camel Core and I am trying to get more information from it. I hope to post some data here very soon.

aldettinger commented 11 months ago

@orpiske I nailed it as explained in #5548. It's just a flaw in the perf-regression tool. Long story short, the native images were not built with the same jdk version. After forcing the native builder image version, there is no more regression detected :partying_face:

Many thanks for helping in this investigation. I'm convinced of the positive impact. At the end of the day, it's yet another performance regression that will not be seen by users of LTS versions :clap:

jamesnetherton commented 11 months ago

Maybe I misunderstand, does that mean we have a performance regression when switching the native builder from JDK 17 to JDK 21?

orpiske commented 11 months ago

@orpiske I nailed it as explained in #5548. It's just a flaw in the perf-regression tool. Long story short, the native images were not built with the same jdk version. After forcing the native builder image version, there is no more regression detected 🥳

Many thanks for helping in this investigation. I'm convinced of the positive impact. At the end of the day, it's yet another performance regression that will not be seen by users of LTS versions 👏

Oh man, that's a relief 😌 🎉 Thanks for sharing the discovery!

I am curious about one thing, though (maybe related to what @jamesnetherton mentioned): is this regression specific to one JVM version? If yes, we may have it on Core and I may need to investigate that.

aldettinger commented 11 months ago

Maybe I misunderstand, does that mean we have a performance regression when switching the native builder from JDK 17 to JDK 21?

To be precise, the comparison occurred between:

It's like comparing eggs and chicken ( bad example maybe ;). So we should not assume any continuity in the native builder image chosen by Quarkus, it can be totally different across versions.

aldettinger commented 11 months ago

I am curious about one thing, though (maybe related to what @jamesnetherton mentioned): is this regression specific to one JVM version? If yes, we may have it on Core and I may need to investigate that.

@orpiske So no regression in Camel Quarkus, that's the beauty of having test setup when only a few things change at a time. That way we ignore JDK and Graalvm bugs that are not under our control.