TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

New execution mode "profiling" #8556

Open · otrosien opened this issue 10 months ago

otrosien commented 10 months ago

It would be extremely helpful for framework submitters to understand the performance of their submissions, especially where the bottlenecks are when running in the target environment. For this purpose, I propose adding a dedicated execution mode, "profiling", which writes out profiling information such as flamegraphs, either generically via perf_events or through dedicated per-platform profiling support such as https://github.com/async-profiler/async-profiler for the JVM.

Ideally, this should be applicable holistically without requiring involvement of the individual framework contributors. Imagine the benefit of supplying flamegraphs for all applications as part of the reports on tfb-status.techempower.com - the bar for actually finding and fixing performance bottlenecks would be lowered drastically.

Another option would be to provide Docker base images per platform (Python, JVM, etc.) with all the profiling tools preinstalled, which would also establish a first baseline for aligning runtime versions such as the JDKs in use (see my comment in https://github.com/TechEmpower/FrameworkBenchmarks/issues/3442).
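
To make the flamegraph idea a bit more concrete, here is a minimal sketch of what a generic perf_events-based hook in the toolset could look like. Everything here is an assumption for illustration: the function name, the output layout, and the availability of perf plus Brendan Gregg's FlameGraph scripts (stackcollapse-perf.pl, flamegraph.pl) on the benchmarking host.

```python
# Illustrative sketch only - not part of the toolset. Assumes perf and the
# FlameGraph scripts (stackcollapse-perf.pl, flamegraph.pl) are on PATH.
import subprocess
from pathlib import Path

def profile_framework_process(pid: int, duration_s: int, out_dir: Path) -> Path:
    """Sample a running framework process with perf and render a flamegraph SVG."""
    out_dir.mkdir(parents=True, exist_ok=True)
    perf_data = out_dir / "perf.data"

    # Sample the target process at 99 Hz with call stacks for the given duration.
    subprocess.run(
        ["perf", "record", "-F", "99", "-g", "-p", str(pid),
         "-o", str(perf_data), "--", "sleep", str(duration_s)],
        check=True,
    )

    # Convert the samples into a flamegraph: perf script -> collapsed stacks -> SVG.
    script = subprocess.run(
        ["perf", "script", "-i", str(perf_data)],
        check=True, capture_output=True, text=True,
    )
    collapsed = subprocess.run(
        ["stackcollapse-perf.pl"], input=script.stdout,
        check=True, capture_output=True, text=True,
    )
    svg_path = out_dir / "flamegraph.svg"
    with open(svg_path, "w") as svg:
        subprocess.run(
            ["flamegraph.pl"], input=collapsed.stdout,
            check=True, stdout=svg, text=True,
        )
    return svg_path
```

For platforms with richer native tooling (e.g. async-profiler on the JVM), the same hook could shell out to the platform-specific profiler instead of perf.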

franz1981 commented 10 months ago

I strongly agree; it's just not always easy due to the many different flavours of profilers per technology. But if each language group votes for a specific tool, I think this would be an awesome initiative, and I can commit to helping on the Java side (being myself an active contributor to and user of async-profiler).

volyrique commented 10 months ago

@otrosien Since you mention tfb-status.techempower.com and it is not quite clear: are you suggesting running all frameworks with a profiler all the time in the continuous benchmarking environment?

otrosien commented 10 months ago

I propose doing both performance and profiling runs. Profiling every run may not be feasible, but perhaps every nth round would work; the exact cadence is still to be figured out.

NateBrady23 commented 10 months ago

I love this idea. I'm not sure we'd have any bandwidth to work on this before the holidays, but I'll ping @msmith-techempower (currently on vacation) and see if he has any room for something like this when he gets back.

msmith-techempower commented 9 months ago

This does sound like a cool idea. We added some really high-level metric captures a few years ago via dstat and maybe some other tools (IIRC). Picking actix at random, you can see that we do capture some metrics that have, so far, gone unused and unvisualized.

From a really high level, this is actually fairly straightforward to do manually. When I was rewriting the way requests are routed in Gemini, I would simply start the application container as if it were being run by the toolset, connect to it with YourKit to capture data, then run the Docker container that runs wrk, just as the toolset would. This exercise worked pretty well: I could see where time was spent, how the stack would build up, and how garbage collection is the bane of my existence.

Building tooling to do this generically might be complicated since each language has several flavors of profilers, but we could make it configurable at the framework level. It's unclear exactly how automated runs would handle it, but maybe, like you said, every 10th run (or something like that) could be a profiling run that produces results as normal plus additional artifacts from the profilers, which we could host on tfb-status.
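
As a rough sketch of the framework-level configuration idea (the "profiler" key and the language-to-profiler mapping below are hypothetical and not part of today's benchmark_config.json), the toolset could pick a default profiler per language group and let a framework override it:

```python
# Hypothetical sketch only: benchmark_config.json currently has no profiling
# section, and the keys used here are illustrative.
import json
from pathlib import Path

# One suggested profiler per language group, falling back to generic perf_events.
DEFAULT_PROFILERS = {
    "Java": "async-profiler",
    "Python": "py-spy",
    "default": "perf",
}

def profiler_for(framework_dir: Path) -> str:
    """Pick a profiler for a framework based on its benchmark_config.json."""
    config = json.loads((framework_dir / "benchmark_config.json").read_text())
    # A framework could override the per-language default with an explicit entry.
    explicit = config.get("profiler")
    language = config.get("language", "default")
    return explicit or DEFAULT_PROFILERS.get(language, DEFAULT_PROFILERS["default"])
```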

NinoFloris commented 5 months ago

In light of this, it would also be really helpful to get some stats from the db server, even if only CPU samples. That would make it much easier to understand whether bottlenecks are on the app side or the db side.
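
A minimal sketch of what that could look like (the database host name, passwordless SSH access, and the availability of mpstat on that host are all assumptions, not existing toolset features):

```python
# Illustrative sketch: sample CPU utilization on the database host for the
# duration of a benchmark run using mpstat over SSH.
import subprocess

def sample_db_cpu(db_host: str, duration_s: int, interval_s: int = 1) -> str:
    """Return raw mpstat output from the database host for later visualization."""
    result = subprocess.run(
        ["ssh", db_host, "mpstat", str(interval_s), str(duration_s // interval_s)],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```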