bencheeorg / benchee

Easy and extensible benchmarking in Elixir providing you with lots of statistics!

Feature suggestion: `assert_all_same` option to compare alternative implementations #435

Open · sabiwara opened 3 months ago

sabiwara commented 3 months ago

Hi @PragTob, how are you doing?

What: an option like `assert_all_same: true` which would run all functions (once per input when using `inputs`) and ensure that, for each input, the results are equal across functions.
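
A rough sketch of how that could look from the caller's side (hypothetical: `assert_all_same` is the option being proposed here, and `MySort` is a placeholder for an alternative implementation):

```elixir
# Hypothetical usage of the proposed option - assert_all_same is not part
# of Benchee's API today, and MySort stands in for an alternative
# implementation under test.
Benchee.run(
  %{
    "Enum.sort" => fn input -> Enum.sort(input) end,
    "my sort" => fn input -> MySort.sort(input) end
  },
  inputs: %{"shuffled list" => Enum.shuffle(1..10_000)},
  assert_all_same: true
)
```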

Context: I've mostly been using Benchee to optimize pure functions in core, which means benchmarking alternative implementations. Failing early if one of my alternative implementations is not equivalent would improve the feedback loop, since I'm usually not writing tests yet during the benchmarking phase (though I often end up adding some `dbg` calls or manual checks).

I could write my own small wrapper, but since it might be a common use case, I went for suggesting it in Benchee itself.
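
For reference, a minimal sketch of the kind of wrapper this feature would replace (again, `MySort` is a placeholder): run every implementation against every input up front and raise on the first mismatch.

```elixir
implementations = %{
  "Enum.sort" => &Enum.sort/1,
  "my sort" => &MySort.sort/1
}

inputs = %{"small" => Enum.shuffle(1..100), "large" => Enum.shuffle(1..10_000)}

for {input_name, input} <- inputs do
  results = Enum.map(implementations, fn {name, fun} -> {name, fun.(input)} end)

  # All results for this input must be identical, i.e. collapse to one value.
  case Enum.uniq_by(results, fn {_name, result} -> result end) do
    [_single] -> :ok
    _ -> raise "implementations disagree for input #{inspect(input_name)}: #{inspect(results)}"
  end
end
```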

I'd be happy to send a PR if there is interest.

PragTob commented 5 days ago

:wave: Heyo @sabiwara! Good, but apparently busy. I'm so sorry I missed this until now :sob: Looking at my calendar, this was 2 days before I flew out on an 11-day vacation, so... that's my excuse for missing it :sweat_smile:

Love the idea - in fact, I've thought about this many times myself/wanted it for similar workflows :grin: It's always a bit annoying to write an extra layer of tests that iterates over all of them.

So would love to have it! :rocket:

My question would be: how often would we run this? For pure functions, technically speaking, we'd only need to run it once (during warmup).

But that doesn't cover "all" use cases. It could be: `assert_all_same: true` runs the check after every invocation (equivalent to an `after_each` hook), while `assert_all_same: :once` (or `:scenario`) would only run it once.
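
Something close to the `true` variant can already be hand-rolled today with the existing global `after_each` hook, which receives each invocation's return value. A sketch (with `MySort` again a placeholder):

```elixir
input = Enum.shuffle(1..1_000)
expected = Enum.sort(input)

Benchee.run(
  %{
    "Enum.sort" => fn -> Enum.sort(input) end,
    "my sort" => fn -> MySort.sort(input) end
  },
  # after_each gets the return value of every invocation - compare it
  # against the precomputed expected result and blow up on a mismatch.
  after_each: fn result ->
    result == expected || raise "unexpected result: #{inspect(result)}"
  end
)
```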

Oof, I just realized that, due to the way that benchee runs benchmarks (first all invocations of one benchmarking job, then all of the next), asserting that all results are the same would be quite tough to implement. Although... if we expect them all to be the same anyway, each job would only ever need to carry a single return value, and on each invocation check that the result matches the one that's already stored. So... that would probably still work :thinking:
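
That per-invocation check could boil down to something like this (a hypothetical helper, not actual benchee internals):

```elixir
defmodule AssertSame do
  @moduledoc false
  # Hypothetical sketch: remember the first return value per job and
  # compare every subsequent invocation's result against it.

  # First invocation: nothing stored yet, keep this result.
  def check(result, nil), do: {:ok, result}
  # Later invocations: fine as long as the result matches the stored one.
  def check(result, stored) when result == stored, do: {:ok, stored}
  def check(result, stored), do: {:error, {:expected, stored, :got, result}}
end
```

(Using `nil` as the "nothing stored yet" marker glosses over functions that legitimately return `nil`; a real implementation would want a dedicated sentinel.)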

We could also run it the same way as the `pre_check` option, which runs all functions once at the beginning. We could implement it as a new value for `pre_check` - like `pre_check: :assert_all_same` - which would run all functions once before the benchmark even starts, check that the results are all the same, and abort if they aren't.
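
Spelled out, that would look like this (`:assert_all_same` as a `pre_check` value is the proposal here; today `pre_check` only takes a boolean):

```elixir
Benchee.run(
  %{
    "flat_map" => fn input -> Enum.flat_map(input, fn x -> [x, x * 2] end) end,
    "map + flatten" => fn input ->
      input |> Enum.map(fn x -> [x, x * 2] end) |> List.flatten()
    end
  },
  inputs: %{"medium list" => Enum.to_list(1..10_000)},
  # Run every function once per input before benchmarking starts,
  # assert the results are equal, abort otherwise.
  pre_check: :assert_all_same
)
```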

What do you think? Which sounds most useful to you? :eyes:

Again, sorry for missing this so long :sweat_smile:

PragTob commented 5 days ago

(animated GIF)

sabiwara commented 5 days ago

Hi @PragTob 👋

> I'm so sorry I missed this until now

No worries!

> So would love to have it! 🚀

Awesome! 💜

> We could implement it as a new value for `pre_check` - like `pre_check: :assert_all_same` - which would run all functions once before the benchmark even starts, check that the results are all the same, and abort if they aren't.

This is exactly what I was going for - sounds perfect.

> How often would we run this? For pure functions, technically speaking, we'd only need to run it once (during warmup).

Exactly, this would only make sense for pure functions (if a function isn't even consistent with itself, it can't be expected to be consistent with other implementations). Running it just once as a pre-check seems like the right timing.

If we agree on `pre_check: :assert_all_same`, I'll try to send a PR next week or so (will also be AFK for a couple of days 😉).

Thanks a lot for considering this idea!

PragTob commented 4 days ago

`pre_check: :assert_all_same` it is - thanks! Have a wonderful AFK time! :rocket: