bencheeorg / benchee

Easy and extensible benchmarking in Elixir providing you with lots of statistics!
MIT License

Avoid handing all inputs to every scenario process #412

Closed: PragTob closed this 9 months ago

PragTob commented 9 months ago

This is a fix related to #408

In #408, unnecessary access to the scenario's benchmarking function, the scenario input, and the inputs configuration was removed from the statistics calculation and the formatters. The impact on memory consumption is huge.

However, one place remains where we still hand over too much data to a process: https://github.com/bencheeorg/benchee/blob/59f9886dc5ec98e7c2d21cb8b1d3f6bed3811a15/lib/benchee/benchmark/runner.ex#L81-L83

Each scenario runs in its own process. We hand over the Scenario (good, needed) but also the ScenarioContext, which includes the entire configuration and therefore all inputs (bad). This is bad for two reasons: we double up on the input of the Scenario itself (IIRC it is already part of the Scenario), and we needlessly send over all the other inputs as well. With 4 jobs and 4 inputs, i.e. 16 scenarios, we copy all of the inputs 16 times without any need :facepalm:
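To make the cost concrete, here is a minimal sketch. The `CopySketch` module and both functions are hypothetical, not Benchee code. It relies on the BEAM behavior that terms captured by a spawned function are copied onto the new process's heap (large binaries are the exception, since they are reference-counted and shared). `memory_of_process_holding/1` reports the memory of a process holding some data, and `input_copies/2` counts the input values copied when each of the `jobs * inputs` scenario processes receives all inputs.

```elixir
defmodule CopySketch do
  # Hypothetical helper, not part of Benchee.
  # Spawns a process whose closure captures `data`, reads that process's
  # memory in bytes, then lets it finish. The captured term is copied onto
  # the new process's heap (large binaries are shared via reference counting).
  def memory_of_process_holding(data) do
    pid =
      spawn(fn ->
        receive do
          # Referencing `data` here keeps it alive until :stop arrives.
          :stop -> data
        end
      end)

    {:memory, bytes} = Process.info(pid, :memory)
    send(pid, :stop)
    bytes
  end

  # Input values copied into scenario processes when every one of the
  # jobs * inputs scenarios receives the full configuration (all inputs).
  def input_copies(jobs, inputs), do: jobs * inputs * inputs
end
```

For example, `CopySketch.input_copies(4, 4)` returns 64, i.e. all 4 inputs copied 16 times, and a process holding a 100,000-element list reports far more memory than one holding `nil`.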

The "quick fix" would be to just scrub the initial configuration, as we now do before sending it to the formatters. However, the configuration still holds lots of other data that we don't need. My proposal is to create a new struct, BenchmarkConfiguration (or something similar), which is essentially a subset of Configuration (it doesn't have to be, but probably is right now). We can then use that instead.
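A minimal sketch of the proposed struct, assuming it is built by picking a fixed list of keys from the full configuration. The module name follows the proposal; the field list and `from_config/1` are hypothetical and only meant to show that `inputs` never ends up in the data copied to scenario processes.

```elixir
defmodule BenchmarkConfiguration do
  @moduledoc """
  Hypothetical sketch: only the options a scenario process needs while
  measuring. The field list below is an assumption, not Benchee's API.
  """

  # Assumed subset of configuration keys. :inputs is deliberately absent,
  # so it is never copied into scenario processes.
  @fields [
    :warmup,
    :time,
    :memory_time,
    :reduction_time,
    :pre_check,
    :measure_function_call_overhead
  ]

  defstruct @fields

  @type t :: %__MODULE__{}

  # Accepts a configuration struct or a plain map (structs are maps).
  # Keys missing from `config` stay nil in the result.
  @spec from_config(map) :: t
  def from_config(config) when is_map(config) do
    struct!(__MODULE__, Map.take(config, @fields))
  end
end
```

Because structs are maps, `Map.take/2` works on a configuration struct as well as on a plain map, so the full configuration never has to leave the runner process.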

Depending on how many properties are needed, we could also inline these values directly into ScenarioContext. For reference, here is what ScenarioContext holds right now:

  @type t :: %__MODULE__{
          config: Benchee.Configuration.t(),
          printer: module,
          current_time: pos_integer | nil,
          end_time: pos_integer | nil,
          scenario_input: any,
          num_iterations: pos_integer,
          function_call_overhead: non_neg_integer
        }
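For the inlining option, here is a minimal sketch of what a leaner context could look like. `LeanScenarioContext`, its `new/2` function, the inlined `time` and `warmup` fields, and the defaults for `num_iterations` and `function_call_overhead` are all hypothetical; the remaining fields mirror the type above. The point is only that the struct no longer carries `config`, so no inputs travel with it.

```elixir
defmodule LeanScenarioContext do
  # Hypothetical alternative to ScenarioContext: the few needed options are
  # inlined instead of nesting the whole configuration.
  # The defaults below are assumptions for illustration.
  defstruct [
    :printer,
    :current_time,
    :end_time,
    :scenario_input,
    :time,
    :warmup,
    num_iterations: 1,
    function_call_overhead: 0
  ]

  @type t :: %__MODULE__{
          printer: module,
          current_time: pos_integer | nil,
          end_time: pos_integer | nil,
          scenario_input: any,
          time: number,
          warmup: number,
          num_iterations: pos_integer,
          function_call_overhead: non_neg_integer
        }

  # Copies only the inlined options out of the configuration.
  # Map.fetch!/2 raises KeyError if :time or :warmup is missing.
  @spec new(map, module) :: t
  def new(config, printer) do
    %__MODULE__{
      printer: printer,
      time: Map.fetch!(config, :time),
      warmup: Map.fetch!(config, :warmup)
    }
  end
end
```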