bencheeorg / benchee

Easy and extensible benchmarking in Elixir providing you with lots of statistics!
MIT License

`before_scenario` and `parallel` options #437

Open cheerfulstoic opened 1 week ago

cheerfulstoic commented 1 week ago

Is it by design that before_scenario runs multiple times when parallel is greater than 1? I'm trying to set up a single process that the parallel scenario runs can all access, so that I can test how processes scale under load. If this is by design I'll find a way to work around it, but it seems like maybe before_scenario should just run once for each scenario, no matter the value of parallel 🤔
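
For reference, here's a minimal sketch of what I'm seeing (the job and timings are just placeholders):

# With parallel: 2, the IO.puts below fires twice per scenario instead of once.
Benchee.run(
  %{"noop" => fn -> :ok end},
  warmup: 0,
  time: 1,
  parallel: 2,
  before_scenario: fn input ->
    IO.puts("before_scenario running in #{inspect(self())}")
    input
  end
)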

PragTob commented 1 week ago

:wave:

Hello there - thanks for opening an issue! :green_heart:

I think I gotta think on that a bit more. My first reaction was "yeah", then I was like "I'm not sure", and then, reading through the use cases in the docs, I found this one:

Recording the PID of self() for use in your benchmark (each scenario is executed in its own process, so scenario PIDs aren't available in functions running before the suite)

And at least that one would need to execute once for each parallel process.
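
Roughly, that use case looks like this (quick sketch, not the exact example from the docs):

# Sketch: before_scenario runs in the same process as the benchmarking
# function, so it can capture that scenario process's PID.
Benchee.run(
  %{
    "uses scenario pid" => fn {input, scenario_pid} ->
      # scenario_pid == self() here, because before_scenario ran in this process
      {scenario_pid, input}
    end
  },
  inputs: %{"small" => 1},
  before_scenario: fn input -> {input, self()} end
)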

However, quite frankly, parallel is something I sometimes forget about in design as it is rarely used :sweat_smile: I'd agree that intuitively it'd make sense to just run once per scenario. Reading through the docs, interplay with parallel isn't even mentioned once soooo chances are this isn't intended/I might have forgotten.

Smells like we might need another hook type, but boi would I not be looking forward to that (I wrote the hooks many years ago during a longer vacation, and the complexity they bring is big compared to how often I think they're used).

Can you tell me what your use case is (broadly speaking)? I'd like to think through it - plus it's a feature I think is rarely used so reinforcement for its usefulness is appreciated :grin:

cheerfulstoic commented 1 week ago

🎉 Thanks!

So, I'm testing out different modules that all implement the same behaviour. Each module starts up a process or a supervision tree of processes to do basically the same job, and I want to compare the performance of individual calls, but also what happens as the processes' queues get more and more loaded.

Here's the sanitized version of my benchee script:

definitions = [ ... ]
inputs =
  Map.new(definitions, fn definition ->
    {
      "#{definition[:module_under_test]}: #{inspect(definition[:module_under_test_opts])}",
      definition
    }
  end)

Benchee.run(
  %{
    "foo" => fn {%{
                   module_under_test: module_under_test,
                   module_under_test_opts: _module_under_test_opts
                 }, %{pid: _pid, users: users}} ->
      user = Enum.random(users)

      module_under_test.foo(user.id)
    end,
    "update" => fn {%{
                      module_under_test: module_under_test,
                      module_under_test_opts: _module_under_test_opts
                    }, %{pid: _pid, users: users}} ->
      user = Enum.random(users)

      module_under_test.bar(user.id, %{attr1: "Biz Buzz #{:rand.uniform(5000)}"})
    end
  },
  warmup: 2,
  time: 5,
  inputs: inputs,
  parallel: 2,
  before_scenario: fn %{
                        module_under_test: module_under_test,
                        module_under_test_opts: module_under_test_opts
                      } = input ->
    {:ok, pid} = module_under_test.start_link(module_under_test_opts)

    Process.unlink(pid)

    users =
      Enum.map(0..20, fn i ->
        {:ok, user} =
          module_under_test.create(%{name: "User #{i}", email: "user#{i}@example.com"})

        user
      end)

    {input, %{users: users, pid: pid}}
  end,
  after_scenario: fn {_input, %{pid: pid}} ->
    Process.exit(pid, :kill)
    Process.sleep(500)
  end,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchee_output.html"},
    {Benchee.Formatters.CSV, file: "benchee_output.csv"},
    Benchee.Formatters.Console
  ]
)

The Process.unlink is there because I was too lazy to add a start function alongside start_link - without it the script process would die when killing the processes for the modules under test.
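
(For context, the non-lazy version would just be adding a start alongside start_link in each module; something like this, assuming they're GenServer-based:)

# Illustrative only: an unlinked start/1 means the benchmark script doesn't
# need Process.unlink when it later kills the process.
defmodule MyModuleUnderTest do
  use GenServer

  def start(opts), do: GenServer.start(__MODULE__, opts)
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts), do: {:ok, opts}
end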

I could use something like Task.async_stream(1..1_000, ... to do the parallel bits myself 🤷 I guess I would need to experiment with the number of iterations so that I get a good comparison of IPS for each scenario, but it would need to be large enough that the wind-down of the parallel jobs doesn't skew the results (e.g. if I'm running async_stream with max_concurrency of 4, then at the end of a benchee iteration there's a short period where only the last three tasks are still running, so it's not at full concurrency... 🤷)
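
Something along these lines is what I had in mind, reusing users and module_under_test from the setup above (concurrency and iteration count are made up):

# Rough sketch: drive the load with Task.async_stream instead of benchee's
# parallel option; max_concurrency and the range are placeholder numbers.
1..1_000
|> Task.async_stream(
  fn _i ->
    user = Enum.random(users)
    module_under_test.foo(user.id)
  end,
  max_concurrency: 4,
  ordered: false
)
|> Stream.run()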

All of that is also making me wonder: maybe this isn't a job for benchee? benchee is the tool I reach for whenever I'm benchmarking, but maybe I should just be doing separate runs, seeing how long they take in total at different levels of parallelism, and maybe adding some telemetry to record things like queue length to see if things are getting backed up 🤔
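
(e.g. even something as crude as sampling the message queue length of the process under test while the load runs:)

# Crude backpressure check: sample the queue length of the process under
# test once a second (pid as returned from start_link above).
Enum.each(1..60, fn _ ->
  {:message_queue_len, len} = Process.info(pid, :message_queue_len)
  IO.puts("queue length: #{len}")
  Process.sleep(1_000)
end)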

PragTob commented 1 week ago

Thanks for sharing!

I mean, the whole reason parallel exists is so that we can stress test code - so I think this should be fine to do via benchee.

A workaround that might work for you right now would be to have each module_under_test be its own Benchee.run invocation, then save the results and load them for comparison. The before/after logic would then just live around the Benchee.run invocations. Kinda like:

Enum.each(modules, fn module_under_test ->
  # what before_scenario used to do (starting the process, creating users)
  former_before_scenario()

  Benchee.run(
    %{
      # jobs using module_under_test
    },
    save: [path: "folder/cool_bench_#{module_under_test}.benchee"]
  )
end)

Benchee.report(load: "folder/cool_bench_*.benchee")

(written from memory, untested)


I asked @pablocostass quickly, and with some time (this is a longer thing) I'll review the wording. But most likely I'd change it so that before_scenario does what you thought it'd do, and introduce a new thing... that I have yet to name... that runs before every parallel execution of the benchmark (what before_scenario does right now). But that's more involved and needs some time; not sure when I'd get to it.
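
Purely hypothetical sketch of what that split might end up looking like (the new hook name is not decided at all):

# Hypothetical only - `before_parallel` is a placeholder name for the
# yet-to-be-named hook, it does not exist today.
Benchee.run(
  %{"job" => fn input -> input end},
  parallel: 2,
  # would then run once per scenario, regardless of parallel
  before_scenario: fn input -> input end,
  # would run once per parallel process (what before_scenario does today)
  before_parallel: fn input -> input end
)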

That said, the initial changes to "fix" before_scenario should be rather easy and should (probably) all be contained in the following code - moving before_scenario/after_scenario around in here:

https://github.com/bencheeorg/benchee/blob/main/lib/benchee/benchmark/runner.ex#L81-L119

cheerfulstoic commented 4 days ago

Thanks very much! I've been using your workaround and I've got a version that's been working fine. Now I have the harder task of interpreting my results 😅

Thanks again!

PragTob commented 4 days ago

Good luck on interpreting those results, always fun. When in doubt, MOARE BENCHMARKS!
