Randomize Framework order in runs

TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project

https://www.techempower.com/benchmarks/


Randomize Framework order in runs #8830

Open p8 opened 1 month ago

p8 commented 1 month ago

Currently the benchmarks are run in the same order every time. Sometimes a run fails after a number of frameworks have been benchmarked, or the run is restarted. As a result, frameworks whose names start with a get more test runs than frameworks starting with z. If the order were randomized, the number of runs would be distributed more evenly.
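A minimal sketch of the randomization proposed here. It assumes the run has a list of framework names and some per-run identifier to seed with; the `run_order` helper, the seed, and the placeholder names are hypothetical and not part of the TFB toolset. Seeding a dedicated `random.Random` keeps a given run's order reproducible while giving each run a different permutation, so interruptions stop cutting off the same tail of the alphabet every time.

```python
import random


def run_order(frameworks: list[str], run_id: str) -> list[str]:
    """Return a per-run random permutation of the framework names.

    `run_id` is a hypothetical per-run identifier used as the seed, so the
    same run always gets the same order while different runs differ.
    """
    order = sorted(frameworks)    # canonical starting order, independent of input order
    rng = random.Random(run_id)   # private RNG; the global random state is untouched
    rng.shuffle(order)            # in-place shuffle
    return order


if __name__ == "__main__":
    # Placeholder names, chosen only for their first letters.
    names = ["a-framework", "m-framework", "u-framework", "x-framework", "z-framework"]
    print(run_order(names, "run-1"))
    print(run_order(names, "run-2"))
```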

fakeshadow commented 1 month ago

As someone maintaining a benchmark that starts with x, I feel this.

That said, IMO a fair way of handling the order is to prioritize benches with the most recent changes. For benches that haven't changed in a while, I'd guess the maintainers care less about continuous run results.
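Ordering by most recent change could be sketched as a stable two-pass sort. This assumes a hypothetical mapping from framework name to the time it was last changed (for example, taken from version-control history); how TFB would actually obtain that data is not specified in this thread, and the example dates are made up.

```python
from datetime import datetime


def order_by_recent_change(last_changed: dict[str, datetime]) -> list[str]:
    """Most recently changed frameworks first; ties keep alphabetical order.

    `last_changed` is a hypothetical name -> last-change-time mapping.
    """
    order = sorted(last_changed)                            # alphabetical tie-break
    order.sort(key=last_changed.__getitem__, reverse=True)  # stable sort, so ties stay alphabetical
    return order


if __name__ == "__main__":
    example = {
        "b-framework": datetime(2024, 1, 2),
        "a-framework": datetime(2024, 1, 2),
        "z-framework": datetime(2024, 3, 1),
    }
    print(order_by_recent_change(example))  # ['z-framework', 'a-framework', 'b-framework']
```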

joanhey commented 1 month ago

I think it is enough to start one run with a and the next with z. Even so, we may still see some differences in the results.

Between this run and the next, the servers, databases, and so on change, so those changes affect all frameworks; the order shouldn't depend on changes in individual frameworks. A mature framework needs fewer changes than a young one, and we can still benchmark locally to test small changes.

itrofimow commented 1 month ago

The project I maintain starts with u, so I'm heavily biased here, but I would also appreciate this change being implemented.

My concern is not about failures or restarts, as they usually don't happen that often when the environment is stable, but rather about feedback latency. I mostly use TFB as a measurement tool (and a big shout-out to the TE crew for providing that tool). Given a hypothetical performance drop in the ongoing run, I'm left with approx. a day to squeeze a potential fix into the next measurement, and failing to do so would lead to a feedback latency of two full weeks (every run takes approx. a week). Moreover, any dependency bump I make has a feedback latency of at least a week (an almost full run), and 1.5 weeks on average.

Flipping the order between runs (or FWIW randomizing it) would significantly reduce these latencies for me.
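The latencies described above can be reproduced with a simplified model. It assumes that runs are back to back and each lasts about a week, that a run's code is fixed when the run starts, and that a change lands at a uniformly random moment; `expected_latency_days` and its parameters are hypothetical. Under those assumptions, flipping the order moves a framework's expected position to the middle of the run.

```python
def expected_latency_days(position: float, run_days: float = 7.0, flip: bool = False) -> float:
    """Expected days from landing a change until its result is measured.

    Simplified, hypothetical model:
    - runs are back to back and each takes `run_days`;
    - a run's code is fixed when the run starts;
    - a change lands at a uniformly random moment, so on average it waits
      run_days / 2 for the next run to start;
    - the framework is benchmarked `position` of the way through a forward
      run (0.0 = first, 1.0 = last).
    With `flip`, the next run is equally likely to be forward or reversed,
    so the expected position is the mean of position and 1 - position.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    wait_for_next_run = run_days / 2
    effective_position = 0.5 if flip else position  # (position + (1 - position)) / 2
    return wait_for_next_run + effective_position * run_days


if __name__ == "__main__":
    print(expected_latency_days(1.0))             # 10.5
    print(expected_latency_days(1.0, flip=True))  # 7.0
    print(expected_latency_days(0.5))             # 7.0
```

For a framework at the very end of the run, the model gives 10.5 days without flipping (about 1.5 weeks, matching the estimate above) and 7.0 days with flipping. Random order has the same expected position as flipping but gives no predictable schedule, which is the trade-off joanhey raises.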

joanhey commented 1 month ago

Frameworks in the middle have ~3 days to make changes, and that is the same whether the run begins with a or goes in reverse order. The problem is the frameworks at the end of the run. Please don't randomize: right now we know roughly when the results for our framework will arrive. But we do need to flip the order on every new run!

p8 commented 1 month ago

Flipping the order each time makes sense to me.

NateBrady23 commented 1 month ago

I like the idea of flipping the order. I'm just getting back from vacation and catching up on a bunch of stuff. Let's get the environment stable, and then I think this is easy to do. Will leave this open until we get it in.

joanhey commented 1 week ago

After the last full run finished, the next run did not flip the order.

volyrique commented 1 week ago

That's because the tfb-startup.sh script runs tfb-shutdown.sh on startup; the latter is responsible for flipping the order. Is changing the order only after an unsuccessful run by design?

p8 commented 1 week ago

I think the following run was reversed: https://tfb-status.techempower.com/results/3c2e9871-9c2a-4ff3-bc31-620f65da4e74. The “last framework” tested is incorrect though.

NateBrady23 commented 1 week ago

Quoting volyrique: "That's because the tfb-startup.sh script runs tfb-shutdown.sh on startup; the latter is responsible for flipping the order. Is changing the order only after an unsuccessful run by design?"

No, I forgot that after a successful run we actually run the shutdown script twice, because it's also called from the startup script. The design was supposed to be the exact opposite. I'll move the flip into the startup script, so the order will simply reverse every time a run starts.
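The double flip can be illustrated with a small simulation of the behaviour described in this thread. It is a hypothetical model, not the actual tfb-startup.sh / tfb-shutdown.sh logic: startup always calls shutdown, a successful run calls shutdown again when it finishes, and a failed run does not. With the flip in shutdown, successful runs keep the same orientation and only a failed run changes it; with the flip moved to startup, the order alternates on every run.

```python
def simulate(runs: list[bool], flip_in_shutdown: bool) -> list[bool]:
    """Return the orientation (True = reversed) used by each run.

    `runs` lists whether each run completed successfully. Hypothetical model:
    - startup always invokes shutdown first;
    - a successful run also invokes shutdown when it finishes;
    - a failed run never reaches the final shutdown.
    flip_in_shutdown=True models the old behaviour (flip lives in shutdown);
    False models the fix (flip lives in startup only).
    """
    reversed_order = False
    used = []

    def shutdown() -> None:
        nonlocal reversed_order
        if flip_in_shutdown:
            reversed_order = not reversed_order

    for succeeded in runs:
        shutdown()                       # startup calls shutdown first
        if not flip_in_shutdown:
            reversed_order = not reversed_order  # fixed: flip once per startup
        used.append(reversed_order)      # orientation this run is benchmarked in
        if succeeded:
            shutdown()                   # second shutdown after a successful run
    return used


if __name__ == "__main__":
    print(simulate([True, True, True], flip_in_shutdown=True))   # [True, True, True]
    print(simulate([False, True], flip_in_shutdown=True))        # [True, False]
    print(simulate([True, True, True], flip_in_shutdown=False))  # [True, False, True]
```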