Is it possible to benchmark a subprocess call like `python -c "8*9"`?

NightMachinery commented 2 years ago

Is it possible to benchmark a subprocess call like `python -c "8*9"`?

I need to benchmark some Python code, and I could not find a good in-Python solution. I am wondering if I can use benchee. (I might want to benchmark some Julia code as well, so a language-agnostic benchmarking regime is beneficial for me in any case.)

PragTob commented 2 years ago

:wave:

Ha! The lack of benchmarking tools in Python is one of the banes of my existence... so much so that one of my long-term plans/ideas is taking the reusable parts of benchee, putting them into a reusable binary, and then just reimplementing the runner for different languages.

Ok, sorry.

Is it possible? Yes! You can totes just call `System.cmd`.

Should you do it? Probably not.

1. That adds whatever overhead `System.cmd` has to your measurements.
2. It also includes the startup time of the Python interpreter (which is non-negligible).
3. It also includes warmup, and benchee's warmup executions won't do you any good here.
4. If you want to benchmark CLI executions, I almost hate to admit it, but there is a better tool: https://github.com/sharkdp/hyperfine

For 2. + 3. I go into more detail here: https://pragtob.wordpress.com/2017/08/29/careful-what-you-measure-2-1-times-slower-to-4-2-times-faster-mjit-versus-truffle-ruby/ (the "What time are we measuring?" section).
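To make the startup cost concrete, here is a minimal Python sketch (not benchee, and not anything from this thread) that times the same snippet once through a fresh interpreter process and once inside the running interpreter. The function names `time_subprocess` and `time_in_process` are made up for illustration, and the absolute numbers will depend on the machine.

```python
import subprocess
import sys
import time
import timeit


def time_subprocess(code: str, runs: int = 5) -> float:
    """Median wall-clock seconds to run `code` in a fresh interpreter process."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Every run pays for process creation and interpreter startup again.
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def time_in_process(code: str, runs: int = 5) -> float:
    """Median seconds per execution of `code` inside this interpreter."""
    number = 1000  # executions per timing sample
    totals = timeit.repeat(code, number=number, repeat=runs)
    per_call = sorted(total / number for total in totals)
    return per_call[len(per_call) // 2]


if __name__ == "__main__":
    snippet = "8*9"
    print(f"fresh process: {time_subprocess(snippet):.6f} s")
    print(f"in process:    {time_in_process(snippet):.9f} s")
```

For a snippet this small, the fresh-process figure is almost entirely startup overhead, which is why the subprocess approach only makes sense when the work itself dominates the runtime.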

Hope this helps!

Ah, one more: a very simple benchmarking implementation isn't too hard. See for instance https://github.com/PragTob/rubykon/tree/main/lib/benchmark, which is less than 200 LOC for what I think of as a passable and minimally viable benchmarking library :)
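As a rough idea of how small such a library can be, here is a hedged Python sketch of the general shape: a warmup phase whose results are discarded, a timed phase, and a few summary statistics. It is not a port of the linked Ruby code; the function name `benchmark` and its `warmup`/`duration` parameters are assumptions chosen for this example.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: float = 0.5,
              duration: float = 1.0) -> Dict[str, float]:
    """Call fn repeatedly: discard `warmup` seconds, then record `duration` seconds."""
    # Warmup phase: run the function but throw the timings away.
    deadline = time.perf_counter() + warmup
    while time.perf_counter() < deadline:
        fn()

    # Measurement phase: one sample per call, at least one call.
    samples = []
    deadline = time.perf_counter() + duration
    while True:
        start = time.perf_counter()
        fn()
        end = time.perf_counter()
        samples.append(end - start)
        if end >= deadline:
            break

    return {
        "runs": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    print(benchmark(lambda: sum(range(1000))))
```

This measures only wall time of an in-process callable; memory measurement, comparison between scenarios, and formatted output are the parts that take the remaining lines in a real library.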

NightMachinery commented 2 years ago

My use cases take relatively long, on the order of a minute, so the overhead is negligible for me.

> benchee's warmup executions won't do you any good here

Why not?

> hyperfine

It doesn’t support peak memory consumption.

PS: Will the memory of these forked processes be benchmarked correctly with benchee?

PragTob commented 2 years ago

Ah.

1. Warmup won't do you any good, as you start the process again with each iteration. This resets the "warmup" of any runtime; it'll JIT again.
2. Weird that hyperfine doesn't do memory measurement. You can get peak memory usage by using the `time -v` command on a Unix system (might be `/usr/bin/time -v` on Mac). Benchee also won't give you memory measurements here, as benchee measures the memory of the running Elixir/Erlang process and not of the operating-system-level process spawned by `System.cmd`.
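For reference, the operating system does report peak memory for a finished child process, and it can be read without parsing `time` output. The following is a minimal Unix-only Python sketch, assuming `os.posix_spawnp` and `os.wait4` are available (Python 3.8+ on Linux or macOS). The helper names `run_once` and `measure` are hypothetical, and `ru_maxrss` should be treated as an approximation: its unit is kilobytes on Linux and bytes on macOS.

```python
import os
import statistics
import sys
import time


def run_once(argv):
    """Run argv in a child process; return (wall_seconds, peak_rss, exit_ok)."""
    start = time.perf_counter()
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # wait4 blocks until this child exits and returns its resource usage.
    _, status, usage = os.wait4(pid, 0)
    elapsed = time.perf_counter() - start
    ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return elapsed, usage.ru_maxrss, ok


def measure(argv, runs=5):
    """Repeat run_once and aggregate wall time and peak RSS over all runs."""
    times, peaks = [], []
    for _ in range(runs):
        elapsed, peak, ok = run_once(argv)
        if not ok:
            raise RuntimeError(f"command failed: {argv}")
        times.append(elapsed)
        peaks.append(peak)
    return {
        "runs": runs,
        "median_seconds": statistics.median(times),
        "max_peak_rss": max(peaks),  # kilobytes on Linux, bytes on macOS
    }


if __name__ == "__main__":
    print(measure([sys.executable, "-c", "8*9"]))
```

This covers only the directly spawned process as the OS accounts for it, and it does no warmup, since every iteration starts a new process anyway.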

NightMachinery commented 2 years ago

I am not using a JITed language here, so that doesn't matter. But the inability to measure the memory of forked processes makes benchee unsuitable. Can't you add an option to measure those as well?

I am currently using `time`, but it is pretty bare-bones. I had to write the logic of running the measurements multiple times and aggregating the results myself.

PragTob commented 2 years ago

Even in a non-JITed language, warmup matters for many scenarios. Erlang was non-JITed for the majority of benchee's lifetime, and I still implemented warmup before even the first release (iirc).

Measuring the memory of an externally spawned process is way different from what benchee does. I don't think I'll ever add this, as benchee just isn't the tool for the job here. I wouldn't even know how to do it properly. Before we go there, there are many things to improve around the memory measurement benchee has (for instance, it only measures the one process you spawned, not other processes that process may spawn or other processes in the system).

bencheeorg / benchee

Is it possible to benchmark a subprocess call like `python -c "8*9"`? #338