TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

Update Citrine to Ubuntu 22.04 performance issues #8038

Open franz1981 opened 1 year ago

franz1981 commented 1 year ago

As commented by @nbrady-techempower in https://github.com/TechEmpower/FrameworkBenchmarks/issues/7321#issuecomment-1466365129 here we are!

I don't have a clue why or what's going on, but let's take a look at both Vert.x and Redkale before/after the upgrade, with Plaintext:

before: [image]

and @redkale, last run: [image]

and both Vert.x and Redkale in another nightly with the new env: [image]

It appears clear to me that the new env is quite unstable for some top performers, and given the discussion at https://github.com/TechEmpower/FrameworkBenchmarks/discussions/7984#discussioncomment-5282261, where RedKale isn't performing any HTTP request header decoding and hasn't changed version across the different runs, something wrong is going on here...

franz1981 commented 1 year ago

@vietj too (the same applies, although with smaller effects, to Netty): I haven't verified other frameworks yet, TBH.

NateBrady23 commented 1 year ago

I'm seeing pretty good consistency between new runs on the new environment for the top performers, and when spot-checking some others. I'm not yet convinced it's the environment that's unstable.

I believe Redkale was the framework I saw that was changing snapshots between runs so that they didn't have to alter the version number in our repo, so frankly, I don't trust the variance we're seeing here from them.

franz1981 commented 1 year ago

And what about Vert.x and Netty, or Helidon Nima? These are all JVM-based, and none of them upgraded or changed anything before/after the env change.

I am not sure that observing the very top performers is the right strategy TBH, because it depends on how much they were constrained by NIC vs CPU resources: a CPU or memory penalty will still leave them enough room to max out the NIC and appear as if they didn't receive any hit. Observing other, more constrained frameworks would make existing env issues more evident instead.
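
As a back-of-the-envelope illustration (the response size below is an assumption, not a measured value), this is the kind of ceiling the 10 Gb/s NIC imposes on Plaintext, which is why a NIC-bound framework can absorb a CPU penalty without its throughput moving:

```java
// Rough sketch: the responses/s ceiling a 10 Gb/s NIC imposes, assuming a
// ~140-byte plaintext response (headers included). A framework already pinned
// at this ceiling can lose CPU headroom and still report the same numbers.
public class NicCeiling {
    public static void main(String[] args) {
        double nicBytesPerSec = 10e9 / 8;   // 10 Gb/s link
        double responseBytes = 140;         // assumed response size, not measured
        double ceiling = nicBytesPerSec / responseBytes;
        System.out.printf("NIC-limited ceiling: ~%.1fM responses/s%n", ceiling / 1e6);
    }
}
```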

NateBrady23 commented 1 year ago

Netty and Helidon Nima both are consistent across runs in the new env. So, I don't believe the new environment is "unstable." However, it could be that some part of the update has affected the performance of some frameworks that could have been uniquely tuned to the previous environment. Do you have any insight into what that might be?

franz1981 commented 1 year ago

Sorry, I was wrong about Netty, but Helidon Nima has received a hit and Vert.x the same. They didn't have any specific optimization at the socket/env or OS level - no affinity (I was involved in the Vert.x one, given that I am responsible for it together with Julien Viet). Nima and Vert.x use JVM versions that are quite far apart and very different Epoll optimizations under the hood, meaning that anything related to networking and the JVM can be excluded.

The sole thing that makes them similar is that they are CPU bottlenecked (which can be observed in the dstat metrics), meaning that even a small amount of noise there can cause perf variations.

NateBrady23 commented 1 year ago

but Helidon Nima has received a hit

But you mean from the old environment to the new environment? Because in two runs on the new environment, Helidon Nima is 3,584,007 and 3,556,068, which is normal variance as far as I'm concerned.

franz1981 commented 1 year ago

Yep, exactly, and the same for Vert.x - the old env vs new env numbers. I am still waiting for the current Vert.x nightly to see if the last new-env result will repeat. Sorry if I wasn't clear enough that this was an old vs new env perf hit. And the word "noise" is indeed misleading, now that Redkale has been excluded from the list of victims (those that received a hit in different runs of the new env).

NateBrady23 commented 1 year ago

Ok, yes. I agree with this. It seems like there's been a small hit (and some large ones) across the board for a lot of frameworks, fairly consistently. I'm happy to relay some information from the environments to help people track down what that might be.

franz1981 commented 1 year ago

Many thanks! The only thing that comes to my mind relating both Vert.x and Nima (built on 2 different stacks) is that they are both CPU limited and are overcommitting - the former by using twice as many event loops as available cores, the latter by using a rightly sized Fork Join thread pool plus some additional competing and busy garbage collection threads. It makes me think that something related to IRQ processing, or anything else that can steal CPU cycles, can hit both (and the perf hit looks similar too - ~8/9%).
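
To make the overcommit point concrete, here is a minimal sketch of the Vert.x side (not the actual TFB benchmark code; class names and port are illustrative): with event loops sized at 2x the core count there are more runnable threads than cores, so anything else that needs CPU, IRQ/softirq work included, competes with them.

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class OvercommitSketch {

    // Minimal verticle: each instance starts an HTTP server on the same port;
    // Vert.x distributes accepted connections across the instances.
    public static class PlaintextVerticle extends AbstractVerticle {
        @Override
        public void start() {
            vertx.createHttpServer()
                 .requestHandler(req -> req.response()
                         .putHeader("Content-Type", "text/plain")
                         .end("Hello, World!"))
                 .listen(8080);
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // Event-loop pool sized at 2x the core count: more runnable threads than
        // cores, so any CPU stolen elsewhere (IRQs, GC, noisy neighbors) hurts.
        Vertx vertx = Vertx.vertx(new VertxOptions().setEventLoopPoolSize(cores * 2));

        vertx.deployVerticle(PlaintextVerticle::new,
                new DeploymentOptions().setInstances(cores * 2));
    }
}
```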

franz1981 commented 1 year ago

Let me think whether adding some profiling could help to see what's going on, or even checking how the CPU usage of the top performers has changed, assuming they were not running at 100% CPU before - and looking at https://ajdust.github.io/tfbvis/?testrun=Citrine_started2022-06-30_edd8ab2e-018b-4041-92ce-03e5317d35ea&testtype=plaintext&round=false I see that they were not maxing out the CPU.

If they didn't change CPU usage it could be some "noisy neighbor", while if they show increased CPU usage (which should include IRQ processing, afaik) to deliver the same performance, then profiling can be of some help, because it is something visible at the application level.
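
For example, something along these lines (a rough sketch only; dstat already reports the same breakdown) could show whether the busy CPU time shifted toward IRQ/softirq processing between the two environments:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sample the aggregate "cpu" line of /proc/stat twice while the benchmark runs
// and report how the time splits between user/system work and (soft)IRQ handling.
// Linux field order: user nice system idle iowait irq softirq steal ...
public class CpuSplit {
    static long[] sample() throws Exception {
        for (String line : Files.readAllLines(Path.of("/proc/stat"))) {
            if (line.startsWith("cpu ")) {
                String[] f = line.trim().split("\\s+");
                long[] v = new long[f.length - 1];
                for (int i = 1; i < f.length; i++) v[i - 1] = Long.parseLong(f[i]);
                return v;
            }
        }
        throw new IllegalStateException("no aggregate cpu line in /proc/stat");
    }

    public static void main(String[] args) throws Exception {
        long[] a = sample();
        Thread.sleep(5_000);                 // sampling window while under load
        long[] b = sample();

        int n = Math.min(8, Math.min(a.length, b.length));
        long total = 0;
        for (int i = 0; i < n; i++) total += b[i] - a[i];

        long user = b[0] - a[0], system = b[2] - a[2];
        long irq = b[5] - a[5], softirq = b[6] - a[6];
        System.out.printf("user %.1f%%  system %.1f%%  irq+softirq %.1f%%%n",
                100.0 * user / total, 100.0 * system / total,
                100.0 * (irq + softirq) / total);
    }
}
```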

I am open to ideas here :)

redkale commented 1 year ago

@franz1981 The reason for the poor performance of the Redkale framework this time is that HTTP header decoding was enabled, not the change of environment. HttpContent.lazyHeaders is an automatic optimization feature. Redkale loads all RestServices and HttpServlets when the service is started. If none of the RestServices need to read cookies, the session id, or HTTP header parameters, and there is no HttpFilter or WebSocket, then lazyHeaders=true and HTTP headers are not parsed. Otherwise, HTTP headers are parsed. My English is poor; the above was machine-translated, so please excuse any awkward sentences.
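
For illustration only (this is not Redkale's actual code), the general "lazy headers" idea looks roughly like this: keep the raw header bytes and only build the header map the first time a handler asks for a header, so services that never touch headers skip the parsing cost entirely.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Generic illustration of lazy header parsing (not Redkale's implementation):
// the raw header bytes are retained and only scanned into a map on first access.
public class LazyHeaders {
    private final byte[] rawHeaderBytes;   // bytes between the request line and the blank line
    private Map<String, String> parsed;    // built lazily, on first get()

    public LazyHeaders(byte[] rawHeaderBytes) {
        this.rawHeaderBytes = rawHeaderBytes;
    }

    public String get(String name) {
        if (parsed == null) {
            parsed = new HashMap<>();
            String raw = new String(rawHeaderBytes, StandardCharsets.US_ASCII);
            for (String line : raw.split("\r\n")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    parsed.put(line.substring(0, colon).trim().toLowerCase(),
                               line.substring(colon + 1).trim());
                }
            }
        }
        return parsed.get(name.toLowerCase());
    }
}
```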

franz1981 commented 1 year ago

Thanks @redkale for the comment. I see instead that the Graal version is performing better, so maybe the lazy loading is still happening there?

franz1981 commented 1 year ago

@nbrady-techempower Let's try first to exclude the obvious reasons:

There is a chance that Vert.x has regressed like the others by the mentioned percentage, but Netty hasn't, because it has balanced its regression with the improvement provided by io_uring (hence both would have regressed if they were using Epoll). Netty should print which driver it is using; can you grab the output of the server while running on Citrine?
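
For reference, this is roughly how a Netty-based server typically picks and logs its transport (a hedged sketch, not the actual TFB netty benchmark code; the io_uring classes come from Netty's incubator transport module):

```java
import io.netty.channel.EventLoopGroup;
import io.netty.channel.ServerChannel;
import io.netty.channel.epoll.Epoll;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.incubator.channel.uring.IOUring;
import io.netty.incubator.channel.uring.IOUringEventLoopGroup;
import io.netty.incubator.channel.uring.IOUringServerSocketChannel;

public final class Transports {

    // Prefer io_uring when available, then epoll, then NIO, and log the choice
    // so the server output shows which driver was actually in use.
    public static EventLoopGroup newGroup(int threads) {
        if (IOUring.isAvailable()) {
            System.out.println("Using io_uring transport");
            return new IOUringEventLoopGroup(threads);
        }
        if (Epoll.isAvailable()) {
            System.out.println("Using epoll transport");
            return new EpollEventLoopGroup(threads);
        }
        System.out.println("Falling back to NIO transport");
        return new NioEventLoopGroup(threads);
    }

    public static Class<? extends ServerChannel> serverChannelClass() {
        if (IOUring.isAvailable()) return IOUringServerSocketChannel.class;
        if (Epoll.isAvailable()) return EpollServerSocketChannel.class;
        return NioServerSocketChannel.class;
    }
}
```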

fakeshadow commented 1 year ago

can you grab the output of the server while running on citrine?

For a completed continuous run you can check the details, which include a link to the results where the log is contained. For example, this is from the last run: https://tfb-status.techempower.com/unzip/results.2023-03-16-21-13-17-497.zip/results/20230310141027/netty/run/netty.log

franz1981 commented 1 year ago

Many thanks @fakeshadow, I didn't know that, so... half mystery solved @nbrady-techempower: we do have a regression in both Vert.x and Netty, but Netty didn't seem affected because it is now using io_uring under the hood, which amortized the perf drop...

NinoFloris commented 1 year ago

I'm seeing strange things happening for Fortunes as well. Drogon and ASP.NET Core consistently had about 30% more RPS before the upgrade.

franz1981 commented 1 year ago

Thanks @NinoFloris. I am working with my team to provide some regression analysis across the latest nightlies for all frameworks (will work on it next week) so we can help @nbrady-techempower detect how many have regressed and by how much (and whether there are relevant changes between the nightlies).

volyrique commented 1 year ago

If you check the top performers, it is apparent that there is a performance regression in all tests except cached queries and plaintext - the latter two still being bottlenecked by the 10 Gb/s network speed. The regression is 5-7% for the JSON serialization, multiple queries, and database updates tests, and 14-17% for the single query and fortunes ones; overall, a 9% reduction in composite score.

franz1981 commented 1 year ago

@redkale Looking at the last results, it appears you are once again not performing HTTP request header decoding.

redkale commented 1 year ago

@redkale Looking at the last results, it appears you are once again not performing HTTP request header decoding.

The last time, HTTP header parsing was enabled for comparison. For simple HTTP, it is disabled by default.

synopse commented 1 year ago

@redkale As far as I understand TFB requirement number 2, the HTTP handling should be minimal, but realistic. With respect to this requirement, any framework not parsing the HTTP headers should be marked as "Stripped".

For instance, any minimal HTTP server should properly react to a Connection: close header, and actually close the connection. See https://github.com/TechEmpower/FrameworkBenchmarks/issues/8205
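
As a minimal sketch of what that means in practice (not any particular framework's code; header names are assumed to be lower-cased at parse time), even a "minimal but realistic" HTTP/1.1 server has to look at the Connection header to decide whether to keep the connection alive:

```java
import java.util.Map;

public final class ConnectionHeader {

    // Decide whether the socket should be closed after writing the response.
    public static boolean shouldClose(Map<String, String> headers, String httpVersion) {
        String connection = headers.get("connection");
        if (connection != null) {
            return connection.equalsIgnoreCase("close");
        }
        // No Connection header: HTTP/1.1 defaults to keep-alive, HTTP/1.0 to close.
        return !"HTTP/1.1".equals(httpVersion);
    }
}
```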