bencheeorg / benchee

Easy and extensible benchmarking in Elixir providing you with lots of statistics!
MIT License

Project benchmarks #312

Closed jrogov closed 2 years ago

jrogov commented 5 years ago

Sometimes a project needs to measure its performance, and however easy Benchee.run is to use, it is usually a one-shot solution.

I propose an ExUnit-like task, `mix benchmark`, and a DSL to create benchmarks for a project.

These benchmarks are plain .exs Elixir scripts placed in the ./benchmarks directory of a project. Whenever `mix benchmark` is run, it compiles all the modules within these scripts and runs the benchmarks specified in them. It's also possible to pass individual files and directories as arguments to the task to run the benchmarks they contain, e.g. `mix benchmark dir1 file1 file2 dir2`.
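The discovery step described above could be sketched roughly as follows. This is a hedged illustration with hypothetical module and function names, not the PR's actual code: it only shows how the task's path arguments might be expanded into the list of benchmark scripts to compile.

```elixir
# Hypothetical sketch of the task's discovery step: expand the arguments
# of `mix benchmark dir1 file1 ...` into a flat list of .exs scripts,
# defaulting to ./benchmarks. All names here are illustrative.
defmodule BenchmarkRunner do
  @default_dir "benchmarks"

  # No arguments: fall back to the default benchmarks directory.
  def scripts([]), do: scripts([@default_dir])

  # Directories are searched recursively for .exs files; plain files
  # are taken as-is; anything else is ignored.
  def scripts(paths) do
    Enum.flat_map(paths, fn path ->
      cond do
        File.dir?(path) -> Path.wildcard(Path.join(path, "**/*.exs"))
        File.regular?(path) -> [path]
        true -> []
      end
    end)
  end

  # Each script is compiled; the task would then run the benchmark
  # modules it finds in them.
  def run(paths) do
    paths |> scripts() |> Enum.each(&Code.require_file/1)
  end
end
```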

Changes:

The DSL syntax is best described with an example. One is contained in the new './example' folder and can be found here: https://github.com/bencheeorg/benchee/blob/94979ee88a784664c2ee3c716e32c0f4fadb06be/example/project_benchmark.exs

Notes:

  1. README is not updated yet. I will update it as soon as all the changes are approved.
  2. No tests for this DSL were created. (see 1)
  3. The task doesn't support any Benchee options yet. Comments on whether that's really needed are welcome.

P.S. Kind of a wish: I would strongly recommend proposing this library as part of the standard Elixir distribution, along with ExUnit, since this functionality might be useful for projects of any kind.

devonestes commented 5 years ago

Hey @arikai, thanks for putting this together!

We've had discussions about this kind of thing before, like here: https://github.com/bencheeorg/benchee/issues/236

One of the things that I think Tobi and I both really love is that Benchee's API is "just a function." This both makes it really easy for folks to use, and makes interop with Erlang much easier (which is something we both care about as well).

We do the same kind of thing you're talking about in this library already, but we do it with a bash script instead of a mix task.

If what you're looking for is a way to run all your benchmarks with one command, I'd recommend going with a bash script (or a Makefile!). Let's see what @PragTob says about this, though, before making any decision. I'm ok with having this if he is, but my initial feeling is that it's a lot of rather tricky code to maintain and its only purpose is to replace the need for a bash script.

Of course, if you want to build something on top of benchee that offers this sort of DSL, you're welcome to do so!

And I love that you would want this in Elixir core, but they've made it really clear that the only things going into Elixir core are what's needed to develop the language itself. This doesn't fall into that category, so I kind of agree with them on keeping it as a separate package.

OvermindDL1 commented 5 years ago

For note, I already have `mix bench ...` functionality in my sandbox project. It's something I've been meaning to pull out sometime - would it actually be useful?

jrogov commented 5 years ago

@devonestes

From what I've read in the issue linked above, you do approve of this idea in general, but have concerns about the maintenance and use cases of this addition.

So here are my two cents.

1. Why not shell scripts?

Of course you can do this with shell scripts. Hell, what can't you do with them?

However, there are cons.

Bash is basically restricted to the commands provided by mix, its plugin tasks, and some one-liners for iex. For anything more complex, you write an .exs script. But why should we need two languages just to have convenient dev tools? Why do we need to reimplement (or copy) this functionality over and over again?

I believe this is exactly why dev tools exist - mix in particular, and libraries and frameworks in general: to encapsulate a pattern used over and over again. Using bash in projects like this seems very hacky and not convenient at all. Bash is more "use the available tools for your particular case", not "add functionality to the tools". And project-wide benchmarks are a very common pattern.

2. Maintaining a second API seems like another burden.

But please look closely at what this DSL and task do. The DSL simply creates a bunch of functions (setup, teardown, run) and sets attributes on the latter. The task simply compiles the benchmarks, searches the compiled modules for this attribute, and calls the "run" functions (with calls to the optional module-wide setup and teardown).

But what do the run* functions actually do? They simply call Benchee.run with the options defined in the DSL and the options passed to the task (although there are none for now). And the options defined in the DSL are exactly the ones you would usually pass to Benchee.run!

So when we talk about maintaining this "second API", we're really talking about just adding some switches to OptionParser in the task module.

That's it.
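The mechanism described above - a module attribute as the marker, and the task scanning compiled modules for it - can be illustrated with a minimal sketch. The module and function names below are hypothetical, not the PR's actual code, and the "generated" functions are written out by hand for clarity:

```elixir
# Sketch: the DSL marks a module via a persisted attribute, and the task
# later detects marked modules and calls into them. Names are illustrative.
defmodule BenchDSL do
  defmacro __using__(_opts) do
    quote do
      # Persist the marker so it survives into the compiled module's
      # metadata, where the task can find it.
      Module.register_attribute(__MODULE__, :benchee_benchmark, persist: true)
      @benchee_benchmark true
    end
  end
end

defmodule ExampleBench do
  use BenchDSL

  # In the real DSL these would be generated by `benchmark`/`scenario`
  # macros; a run function would just forward these to Benchee.run/2.
  def jobs, do: %{"noop" => fn -> :ok end}
  def options, do: [time: 1, warmup: 0]
end

defmodule BenchFinder do
  # The task side: a module is a benchmark iff it carries the attribute.
  def benchmark?(mod) do
    mod.module_info(:attributes)
    |> Keyword.get(:benchee_benchmark, [])
    |> Enum.any?()
  end
end
```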

3. What are the pros of the DSL over a simple one-function API?

Let's talk about tests. It's surely possible to write them as simple .exs scripts and run them via a bash script. But why? Is it easier to use? No. Is it cleaner? No. Is it easier to write them? Heck no - a DSL looks way cleaner and is easier to read, and tests are easier to change and maintain. A DSL is usually closer to natural language (although more domain-specific) than tests written in two programming languages, one of them being some quirky shell script.

I do not deny that a single-function API is far easier to use for one-shot benchmarks. But here we're talking about libraries in which changes may degrade performance and - what's orders of magnitude worse - may cascade into the projects that use them, degrading those as well.

Still waiting for @PragTob's thoughts on this.

jrogov commented 5 years ago

@OvermindDL1 Sure, especially if you have some interesting ideas and features that I missed.

jrogov commented 5 years ago

I still can't get around this strange error in the build with 1.6.6 + OTP 21 (second build), which I can't even reproduce on my machine, so help is needed.

OvermindDL1 commented 5 years ago

Mine is very specific to my sandbox project, but it's an easy start for something bigger.

First, my additions to mix.exs (this should become its own mix task instead):

  defp aliases do
    [
      bench: &bench/1
    ]
  end

  defp bench(arglist) do
    Mix.Tasks.Run.run(["bench/bench.exs" | arglist])
  end

Then bench/bench.exs (would be part of that mix task):

System.argv()
|> Enum.each(fn bench ->
  try do
    Code.require_file("bench/#{bench}_bench.exs")
    mod = String.to_atom("Elixir.#{Macro.camelize(bench)}Bench")

    if :erlang.function_exported(mod, :module_info, 0) do
      if(:erlang.function_exported(mod, :classifiers, 0), do: mod.classifiers(), else: [nil])
      |> case do
        nil -> [nil]
        [] -> [nil]
        clas when is_list(clas) -> clas
      end
      |> Enum.each(fn cla ->
        if cla do
          title = "Benchmarking Classifier: #{cla}"
          sep = String.duplicate("=", String.length(title))
          IO.puts("\n#{title}\n#{sep}\n")
        end

        setup = if(:erlang.function_exported(mod, :setup, 1), do: mod.setup(cla), else: nil)

        m =
          cond do
            :erlang.function_exported(mod, :time, 2) -> mod.time(cla, setup)
            :erlang.function_exported(mod, :time, 1) -> mod.time(cla)
            true -> 5
          end

        inputs =
          cond do
            :erlang.function_exported(mod, :inputs, 2) -> mod.inputs(cla, setup)
            :erlang.function_exported(mod, :inputs, 1) -> mod.inputs(cla)
            true -> nil
          end

        actions =
          cond do
            :erlang.function_exported(mod, :actions, 2) -> mod.actions(cla, setup)
            true -> mod.actions(cla)
          end

        Benchee.run(
          actions,
          [
            time: m,
            warmup: m,
            memory_time: m,
            print: %{fast_warning: false}
          ] ++ if(inputs, do: [inputs: inputs], else: [])
        )

        if(:erlang.function_exported(mod, :teardown, 2), do: mod.teardown(cla, setup))
      end)
    end
  rescue
    r -> IO.puts("Unknown exception: #{Exception.format(:error, r, __STACKTRACE__)}")
  catch
    # Catch both exits and throws; the original two clauses only matched
    # thrown terms, so real exits would have escaped.
    kind, reason when kind in [:throw, :exit] ->
      IO.puts("Unknown error: #{Exception.format(kind, reason, __STACKTRACE__)}")
  end
end)

And basically it can run bench benchmark modules (the naming pattern should probably also be changed; as stated, this is my naming convention and is not at all generic). It can also run with a variety of input types, each being its own benchee run, among other features.

But for example, here is my most recent benchmark module. It doesn't use even remotely all of the features, and I could leave out some of the no-op functions, but I like to keep them in for documentation:

defmodule MapLookupOrThrowBench do
  def classifiers(), do: nil

  def time(_), do: 2

  def inputs(_cla),
    do: %{
          "small-2-existing" => {2, 1..2 |> Map.new(&{&1, to_string(&1)})},
          "medium-10-existing" => {10, 1..10 |> Map.new(&{&1, to_string(&1)})},
          "large-10000-existing" => {10000, 1..10000 |> Map.new(&{&1, to_string(&1)})},
          "small-1-not-existing" => {2, 1..1 |> Map.new(&{&1, to_string(&1)})},
          "medium-9-not-existing" => {10, 1..9 |> Map.new(&{&1, to_string(&1)})},
          "large-9999-not-existing" => {10000, 1..9999 |> Map.new(&{&1, to_string(&1)})}
    }

  def setup(_cla), do: nil

  def teardown(_cla, _setup), do: nil

  def actions(_cla, _setup),
    do: %{
          "Map.fetch" => fn {k, m} ->
            Map.fetch(m, k)
          end,
          "catch-Map.fetch!" => fn {k, m} ->
            try do Map.fetch!(m, k)
            rescue _ -> :error
            catch _ -> :error
            end
          end,
          "Access" => fn {k, m} ->
            m[k]
          end,
          "catch-direct-always-fail" => fn {_k, m} ->
            try do m.a
            rescue _ -> :error
            catch _ -> :error
            end
          end,
          "case" => fn {k, m} ->
            case m do
              %{^k => v} -> v
              _ -> :error
            end
          end
    }
end

Honestly, for a benchmark scaffolding thing I'd look more into how Rust's Criterion API does it - it's very clean and nice, especially with a few macros tossed in.

PragTob commented 5 years ago

:wave:

Hi everyone and thanks for your contributions first and foremost :tada: :pray:

I mostly agree with @devonestes - I love that benchee uses simple functions to do its job, and I don't want to use macros where functions can do the job. I can see that there's no default way to run all benchmarks, but for that I'd rather suggest or include a mix task or an easy bash script. I also don't see that the requirement to run all of them nicely is enough to justify introducing a DSL.

However, I see and know that people want a DSL (although I still don't really understand why - I don't believe that taking benchmarks more seriously depends on there being a DSL; if you care you run them, and if not you don't). We build tools, and I believe they should cater to the people who use them. If a DSL makes people happier or somewhat likelier to benchmark, I'm happy to provide one (at a certain cost).

My opinion/solution for this is not to include it in "core" but as a separate package (initial name idea: benchee_dsl) - benchee is designed so that it's easy to add things around it, so why not a DSL? I always wanted to build this myself for funzies after reading the metaprogramming book, but I haven't yet gotten around to it :grin: I'm also happy to have this extension live in bencheeorg as an official one.

In that project I'd have a few remarks though (likely in the README):

The last one is important to me; it's sometimes already not so great to have to upgrade all plugins. It's also another reason why I don't want this in "core": in "core" everything should always be up to date and fully working.

The DSL itself is remarkably similar to what I had in mind, looking at it :ok_hand: (mostly the benchmark <--> scenario relationship and how hooks are incorporated). I have some minor points, but we can discuss those another time.

I haven't looked at the implementation yet, but it'd be important to me that it's easy to add new options etc. (highest goal: if we add a new option to benchee, I don't even need to touch it, it just works).

How does that sound for everyone?

(gotta run now, sorry, here have a bunny)


jrogov commented 5 years ago

@PragTob Then I'm really looking forward to you checking the implementation.

And, as I already said, it's not some all-new custom API that changes the way Benchee works; Benchee as it exists (the single-function API) is barely touched (one option is divided in two - you can barely call this a change).

It is simply a wrapper around Benchee.run/2 with a custom interface for passing options and the benchmarks themselves.

As for the

(highest goal: if we add a new option to benchee I don't even need to touch it, it just works)

It's really easy to implement, though I'd advocate for supporting just a subset of them (for starters, I can imagine NO case where one would want to pass a before_scenario callback as a COMMAND-LINE OPTION).

PragTob commented 5 years ago

@arikai I didn't mean command-line options (I haven't checked the PR yet) but rather the DSL in the example file. Like how you specify `time 5` - if it were able to handle `some_new_option 44` without me telling it about that option, I'd accept a bit more implementation complexity.

I'll see when I'll have time to review it this week. Due to my opinion that macros should only be used when functions absolutely can't do the job, my macro-fu isn't as fluent as it should be :sweat_smile:

jrogov commented 5 years ago

@PragTob If I understand you correctly, it's important to you to be able to pass the very SAME options via the DSL as you would to Benchee.run. As I already mentioned in my previous message:

It is simply a wrapper around Benchee.run/2 with a custom interface for passing options and the benchmarks themselves.

And that's basically it. Sure, it wraps some blocks into functions and so on, but the options are exactly the ones you already support for Benchee.run, both global and per-scenario.

About your assumption

due to my opinion that macros should only be used when functions absolutely don't work my macro isn't as fluent as it should be

That's true for languages with rather simple macro/preprocessing systems (think the C family, or Erlang), but Elixir has more of a Lisp-like macro system. Macros return something like an Elixir intermediate representation (the AST) that the compiler works with during the actual compilation of the files. What really distinguishes a Lisp-like macro system is that you can use any function you would use in actual code, with the difference that (unless quoted) it is executed during compilation. That means we can do whatever complex stuff we want, and the only thing we will be (barely) slowing down is compilation itself: the parsing of our DSL does not happen at runtime. In fact, all we have after complete compilation of the modules in this Elixir DSL (or pretty much any other) is a simple module with some functions and attributes.
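The point above can be shown with a tiny self-contained example: the body of the macro below runs while the caller is compiled, and the only thing left at runtime is an ordinary function with a precomputed literal in it. All names here are illustrative, not from the PR.

```elixir
defmodule CompileTimeTable do
  # The Enum.map and Map.new below execute at compile time, while the
  # caller module is being compiled - not when lookup/1 is later called.
  defmacro squares_up_to(n) do
    pairs = Enum.map(1..n, fn i -> {i, i * i} end)
    table = Macro.escape(Map.new(pairs))

    # What remains at runtime: a plain function over a literal map.
    quote do
      def lookup(i), do: Map.fetch!(unquote(table), i)
    end
  end
end

defmodule Squares do
  require CompileTimeTable
  CompileTimeTable.squares_up_to(10)
end
```

After compilation, `Squares` is just a module with one ordinary function; no macro machinery survives into the runtime.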

Please check the implementation as soon as you can.

PragTob commented 5 years ago

@arikai my main point wasn't that it should be the same options (that's a given), but that when I add a new one to benchee I don't need to add it here manually. So, ideally, it builds the available DSL methods automatically from the available keys in Benchee.Configuration at compilation time, or something like that (maybe this is how it's implemented; as I said, I haven't checked yet).

So that when I introduce a new key in benchee, like reduction_time, benchee_dsl automagically also has a new DSL function reduction_time.
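A hedged sketch of that "automagic" idea: generate one setter per configuration key at compile time. A literal key list stands in for `Map.keys(%Benchee.Configuration{})` here so the example doesn't depend on benchee itself, and the module name is hypothetical:

```elixir
defmodule BencheeDslOptions do
  # Stand-in for Map.keys(%Benchee.Configuration{}); a real package could
  # read the struct's keys instead, so a new benchee option would get its
  # setter without any manual edit here.
  @config_keys [:time, :warmup, :memory_time, :reduction_time]

  # One setter per configuration key, generated at compile time via
  # unquote fragments.
  for key <- @config_keys do
    def unquote(key)(opts, value), do: Keyword.put(opts, unquote(key), value)
  end
end
```

Usage would then read like a small option-building pipeline, e.g. `[] |> BencheeDslOptions.time(5) |> BencheeDslOptions.reduction_time(2)`.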

jrogov commented 5 years ago

Please refer to my previous answer

2. Maintaining a second API seems like another burden.

But please look closely at what this DSL and task do. The DSL simply creates a bunch of functions (setup, teardown, run) and sets attributes on the latter. The task simply compiles the benchmarks, searches the compiled modules for this attribute, and calls the "run" functions (with calls to the optional module-wide setup and teardown).

But what do the run* functions actually do? They simply call Benchee.run with the options defined in the DSL and the options passed to the task (although there are none for now). And the options defined in the DSL are exactly the ones you would usually pass to Benchee.run!

So when we talk about maintaining this "second API", we're really talking about just adding some switches to OptionParser in the task module.

That's it.

The only difference is that the options are a parameter to a macro rather than a function - exactly the same way Benchee.run/2 accepts options as an optional second argument.

PragTob commented 5 years ago

@arikai where do we want to go from here - shall I set up a new benchee_dsl repo for us to put this in and then PR against that one? And PR the printing of system info against this one?

Thanks!

jrogov commented 4 years ago

@PragTob Sorry for the sudden disappearance! I addressed most of your remarks, and the rest are waiting for your opinion!

About the separate repo: I believe these two rather small modules belong in Benchee, since there are no drawbacks to such a decision. I would have considered a separate repo if the project were to be included in e.g. a final mix release, making the size of the final archive important, but this isn't the case.

Please do inform me if there are any drawbacks.

PragTob commented 4 years ago

@arikai right back at you :) Sorry, I was busy with different stuff as well. Thanks for all the work :green_heart:

There are some errors here, but I believe they can be fixed or figured out. Some of them seem to be formatter incompatibilities between versions, which we could fix, along with some dialyzer warnings that also seem questionable.

There might be no observable code/artifact drawbacks to such a decision, but there are others:

Anyhow, so long :wave:


devonestes commented 4 years ago

Just to chime in here - I agree with all of the points @PragTob mentions above, but the primary reason I don’t want to have macros in Benchee is that they can’t be used in Erlang/Gleam/LFE/etc. We’ve tried hard to make sure this library is useful for the entire BEAM community, and unfortunately macros - since they’re specific to Elixir - don’t play nicely with those other languages.

I’m all for folks building language-specific libraries on top of the core we provide in Benchee if there are things (such as this DSL) that are possible in one language but not in another, but I would really like to see Benchee itself remain useful to any language on the BEAM and not to have sections of this library that are specific to one language or another.

If one day Elixir macros become usable in those other languages, I’d be all for revisiting this, actually, but until that happens I would strongly prefer to see this functionality in a separate package.

PragTob commented 4 years ago

@devonestes ah I actually forgot about that, great point thank you :green_heart:

I'm fine having the extra package in the benchee org and maintaining it. I just won't promise that it'll always support all the newest options/things in benchee "core" :)

PragTob commented 2 years ago

I reckon our friends at @hrzndhrn have a DSL package out there for folks. I'll review it and then maybe start a conversation with them about whether they'd want to move it to the main benchee org.

https://github.com/hrzndhrn/benchee_dsl