Open cheerfulstoic opened 1 week ago
:wave:
Hello there - thanks for opening an issue! :green_heart:
I think I gotta think on that a bit more. My first reaction was "yeah", then I was like "I'm not sure", and then reading through the use cases in the docs I found this one:

> Recording the PID of `self()` for use in your benchmark (each scenario is executed in its own process, so scenario PIDs aren't available in functions running before the suite)

And at least that one would need to execute for each one in parallel.
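To spell that one out (a minimal sketch of my own, not lifted from the docs; job and input names are made up): `before_scenario` runs in the process the measurements will later run in, so `self()` there is the scenario's PID, and with `parallel: 2` there are two such processes.

```elixir
# Illustrative sketch: before_scenario captures the scenario process's PID, and
# the benchmarking function later checks it is still running in that same
# process. With parallel: 2 there are two worker processes, which is why this
# particular hook would have to fire once per parallel worker.
Benchee.run(
  %{
    "same process?" => fn {_input, scenario_pid} -> scenario_pid == self() end
  },
  inputs: %{"only input" => :ok},
  parallel: 2,
  before_scenario: fn input -> {input, self()} end
)
```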
However, quite frankly, `parallel` is something I sometimes forget about in design as it is rarely used :sweat_smile: I'd agree that intuitively it'd make sense to just run once per scenario. Reading through the docs, the interplay with `parallel` isn't even mentioned once, soooo chances are this isn't intended/I might have forgotten.
Smells like we might need another hook type, but boi would I not be looking forward to that (I wrote the hooks many years ago during a longer vacation, and the complexity they bring is big compared to how often I think they're used).
Can you tell me what your use case is (broadly speaking)? I'd like to think through it - plus it's a feature I think is rarely used so reinforcement for its usefulness is appreciated :grin:
🎉 Thanks!
So, I'm testing out different modules that all implement the same behavior. Each module starts up a process or a supervision tree of processes to do basically the same job, and I want to compare the performance of individual calls as well as what happens as the processes' queues get more and more loaded.
Here's the sanitized version of my benchee script:
```elixir
definitions = [ ... ]

inputs =
  Map.new(definitions, fn definition ->
    {
      "#{definition[:module_under_test]}: #{inspect(definition[:module_under_test_opts])}",
      definition
    }
  end)

Benchee.run(
  %{
    "foo" => fn {%{module_under_test: module_under_test}, %{pid: _pid, users: users}} ->
      user = Enum.random(users)
      module_under_test.foo(user.id)
    end,
    "update" => fn {%{module_under_test: module_under_test}, %{pid: _pid, users: users}} ->
      user = Enum.random(users)
      module_under_test.bar(user.id, %{attr1: "Biz Buzz #{:rand.uniform(5000)}"})
    end
  },
  warmup: 2,
  time: 5,
  inputs: inputs,
  parallel: 2,
  before_scenario: fn %{
                        module_under_test: module_under_test,
                        module_under_test_opts: module_under_test_opts
                      } = input ->
    {:ok, pid} = module_under_test.start_link(module_under_test_opts)
    Process.unlink(pid)

    users =
      Enum.map(0..20, fn i ->
        {:ok, user} =
          module_under_test.create(%{name: "User #{i}", email: "user#{i}@example.com"})

        user
      end)

    {input, %{users: users, pid: pid}}
  end,
  after_scenario: fn {_input, %{pid: pid}} ->
    Process.exit(pid, :kill)
    Process.sleep(500)
  end,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchee_output.html"},
    {Benchee.Formatters.CSV, file: "benchee_output.csv"},
    Benchee.Formatters.Console
  ]
)
```
The `Process.unlink` was because I was too lazy to make a `start` function instead of `start_link`, so that the script process didn't die when killing the processes for the modules under test.
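(If I ever stop being lazy, the fix would presumably be something like this — a rough sketch assuming the module under test is a plain GenServer; the names are placeholders:)

```elixir
# Rough, hypothetical sketch: an unlinked start/1 alongside start_link/1, so
# killing the process in after_scenario can't take the benchmark script down.
defmodule ModuleUnderTest do
  use GenServer

  def start(opts), do: GenServer.start(__MODULE__, opts)
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts), do: {:ok, opts}
end
```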
I could use something like `Task.async_stream(1..1_000, ...)` to do the parallel bits 🤷 I guess I would need to experiment with the number of iterations so that I get a good comparison of IPS for each scenario, but not so small that the winding down of the parallel jobs skews the results (e.g. if I'm running `async_stream` with a `max_concurrency` of `4`, then at the end of a benchee iteration there would be a small period of time where the last three things are still being performed and so it's not running at full concurrency... 🤷)
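Roughly what I have in mind (the batch size and concurrency here are just placeholder knobs I'd have to tune):

```elixir
# Sketch of the Task.async_stream idea: each benchee iteration measures a whole
# batch of concurrent calls (1_000 calls, at most 4 at a time - both numbers
# made up for illustration) instead of a single call.
batched_foo = fn {%{module_under_test: module_under_test}, %{users: users}} ->
  1..1_000
  |> Task.async_stream(
    fn _i ->
      user = Enum.random(users)
      module_under_test.foo(user.id)
    end,
    max_concurrency: 4,
    ordered: false
  )
  |> Stream.run()
end
```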
All of that is also making me wonder, maybe this isn't a job for `benchee`? `benchee` is the tool that I reach for whenever benchmarking, but maybe I should just be doing different runs and seeing how long they take in total at different levels of parallelism, and maybe have some telemetry to record things like the queue length to see if things are getting backed up 🤔
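(By "some telemetry" I mostly mean something crude like sampling the mailbox size of the process under test while the load runs — a rough sketch, with arbitrary interval and sample count:)

```elixir
# Crude, illustrative queue-length sampler (not a benchee feature): polls the
# process under test's mailbox size every interval_ms while the load test runs.
sample_queue_length = fn pid, interval_ms, samples ->
  for _ <- 1..samples do
    case Process.info(pid, :message_queue_len) do
      {:message_queue_len, len} -> IO.puts("message_queue_len: #{len}")
      nil -> IO.puts("process is gone")
    end

    Process.sleep(interval_ms)
  end
end

# e.g. spawn(fn -> sample_queue_length.(pid, 100, 50) end) before kicking off the load
```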
Thanks for sharing!
I mean, the reason `parallel` exists is so that we can stress test certain code - so I think this should be fine to do via benchee.
A workaround that might work for you right now would be to have each `module_under_test` be its own `Benchee.run` invocation, and then you could save the results and load them for comparison. The before/after would then just be around the `Benchee.run` invocations. Kinda like:
```elixir
Enum.each(modules, fn module_under_test ->
  # the old before_scenario logic, e.g. starting the module's processes
  former_before_scenario()

  Benchee.run(
    %{
      # jobs using module_under_test
    },
    save: [path: "folder/cool_bench_#{module_under_test}.benchee", tag: inspect(module_under_test)]
  )
end)

Benchee.report(load: "folder/cool_bench_*.benchee")
```
(written from memory, untested)
I asked @pablocostass quickly, and with some time (this is a longer thing) I'll review the wording. But most likely I'd change it so that `before_scenario` does what you thought it'd do, and introduce a new hook... that I have yet to name... for before every parallel execution of the benchmark (what `before_scenario` does right now). But that's more involved and needs some time; not sure when I'd get to it.

That said, the initial changes to "fix" `before_scenario` should be rather easy and should (probably) all be contained in the following code/moving around `before_scenario`/`after_scenario` in here:

https://github.com/bencheeorg/benchee/blob/main/lib/benchee/benchmark/runner.ex#L81-L119
Thanks very much! I've been using your workaround and I've got a version that's been working fine. Now I have the harder task of interpreting my results 😅
Thanks again!
Good luck on interpreting those results, always fun. When in doubt, MOAR BENCHMARKS!
Is it by design that `before_scenario` runs multiple times if `parallel` is greater than `1`? I'm trying to set up a single process which can be accessed by the parallel run scenarios so that I can test how processes scale under load. If this is by design I'll find a way to work around it, but it seems like maybe `before_scenario` should just run once for each scenario no matter the value of `parallel` 🤔
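(The workaround I have in mind, sketched under the assumption that the shared process can simply be started and registered before `Benchee.run` is ever called - module and message here are made up:)

```elixir
# Hypothetical sketch: start the shared process once, outside of the hooks, and
# register it under a name that every parallel scenario process can reach.
{:ok, pid} = MySharedServer.start_link([])
Process.unlink(pid)
Process.register(pid, :shared_server)

Benchee.run(
  %{
    "call shared server" => fn -> GenServer.call(:shared_server, :ping) end
  },
  parallel: 2
)
```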