TomAugspurger opened 8 years ago
When benchmarking local changes, I also find `asv dev` to be very useful. Not sure it needs to be mentioned in the README, though.
I think we should also have guidelines for benchmarks: `time_xxx` methods should take on the order of 100-300 ms if possible (obviously some workloads will need more), so that asv can repeat the method several times and output a stable minimum.

Another issue: which timer function should be used? asv's default timer may not be adequate: https://asv.readthedocs.io/en/latest/writing_benchmarks.html#timing
Should we measure CPU time or wallclock time? IMHO we should measure wallclock time: if dask or distributed schedules tasks inefficiently and doesn't make full use of the CPU, it's a problem that should appear in the benchmark results.
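For concreteness, a minimal sketch of what a benchmark following both guidelines could look like. The array size is a placeholder that would need tuning so the timed method lands in the 100-300 ms range, and asv's `timer` attribute is set to `timeit.default_timer` to measure wall-clock rather than CPU time:

```python
import timeit

import numpy as np
import dask.array as da


class TimeArraySum:
    # Wall-clock timer, so scheduling inefficiencies show up in the results.
    timer = timeit.default_timer

    def setup(self):
        # Placeholder size: tune so time_sum takes roughly 100-300 ms.
        self.x = da.from_array(np.random.random((4000, 4000)),
                               chunks=(1000, 1000))

    def time_sum(self):
        self.x.sum().compute()
```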
@TomAugspurger I'm interested in helping with this, partly as a way to become more familiar with the dask API. Is there anything in particular you would prefer me to target, to start?
@danielballan great, thanks! I'm guessing that @mrocklin, @jcrist, and Antoine have the most knowledge on which parts of dask would be best to benchmark.
My current thinking is that we'll have two kinds of benchmarks. The first are higher-level benchmarks that hit things like top-level methods on `dask.array`, `dask.bag`, and `dask.dataframe`. The second kind are benchmarks for "internal" methods in places like https://github.com/dask/dask/blob/master/dask/optimize.py.

I think the first kind will be easier to write benchmarks for as you learn the library (that's true for me, anyway; ATM I have no idea how to write a good benchmark for something in `dask.optimize`).
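As a rough illustration of the first kind, a high-level benchmark could exercise a top-level `dask.dataframe` method end to end. The class name, sizes, and method below are placeholders, not agreed-upon benchmarks:

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd


class TimeDataFrameGroupBy:
    def setup(self):
        # Placeholder data; sizes would need tuning for stable timings.
        df = pd.DataFrame({"key": np.random.randint(0, 100, size=1000000),
                           "value": np.random.random(1000000)})
        self.ddf = dd.from_pandas(df, npartitions=10)

    def time_groupby_mean(self):
        # Hits the top-level groupby/aggregation path in dask.dataframe.
        self.ddf.groupby("key").value.mean().compute()
```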
I agree with @TomAugspurger's classification of high-level external benchmarks and internal ones.
I also agree that high-level external benchmarks are probably both the more useful and the more approachable. Actually, I'm curious if, as with all things, we can steal from Pandas a bit here. Are there benchmarks in Pandas that are appropriate to take?
There are some extreme things we can test as well, such as doing groupby-applies with small dask dataframes with 1000 partitions, or calling `delayed(sum)([delayed(inc)(i) for i in range(1000)]).compute(get=...)`. These should be good to stress the administrative side.
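A hedged sketch of how those two stress tests might look as benchmarks. The sizes are illustrative, `inc` is defined inline for self-containment, and the scheduler is left at the default rather than the `get=...` left open above:

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask import delayed


def inc(i):
    return i + 1


class TimeSchedulerOverhead:
    def setup(self):
        # Many tiny partitions so the timings are dominated by dask's
        # administrative overhead rather than by pandas work.
        df = pd.DataFrame({"key": np.random.randint(0, 10, size=10000),
                           "value": np.random.random(10000)})
        self.ddf = dd.from_pandas(df, npartitions=1000)

    def time_groupby_apply_many_partitions(self):
        # Groupby-apply across 1000 small partitions.
        self.ddf.groupby("key").value.apply(
            lambda s: s.sum(), meta=("value", "float64")).compute()

    def time_many_delayed_tasks(self):
        # 1001 tiny delayed tasks; the work is almost entirely scheduling.
        delayed(sum)([delayed(inc)(i) for i in range(1000)]).compute()
```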
Other question: I see a couple of existing benchmarks parameterize on the `get` function (`multiprocessing.get`, `threaded.get`, etc.). Is this useful/desired? What are we trying to achieve here?
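For context, parameterizing over the `get` function in asv looks roughly like the sketch below. This uses the old `get=` keyword those benchmarks were written against (newer dask versions use `scheduler=` instead), and the names and sizes are illustrative:

```python
import numpy as np
import dask.array as da
import dask.threaded
import dask.multiprocessing

# The two schedulers being compared; old-style get functions.
GETTERS = {"threaded": dask.threaded.get,
           "multiprocessing": dask.multiprocessing.get}


class TimeArraySumAcrossSchedulers:
    # asv runs each time_* method once per parameter value.
    params = list(GETTERS)
    param_names = ["scheduler"]

    def setup(self, scheduler):
        self.x = da.from_array(np.random.random((2000, 2000)),
                               chunks=(500, 500))

    def time_sum(self, scheduler):
        self.x.sum().compute(get=GETTERS[scheduler])
```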
@pitrou for a bit, I was thinking these benchmarks could be helpful for users to see the overall performance characteristics of the various backends across different workloads. In hindsight it's probably best to keep this strictly for devs.
I'll send along a PR to remove those when I get a chance. Been swamped lately.
This is a sketch for some sections of documentation that should go in the README.
What to test?
Ideally, benchmarks measure how long our project (dask, distributed) spends doing something, not the underlying libraries they're built on. We want to limit the variance across runs to just code we control.
For example, I suspect `(self.data.a > 0).compute()` is not a great benchmark. My guess (without having profiled) is that the `.compute` part takes the majority of the time, most of which would be in pandas / NumPy. (I need to profile all these. I'm reading through dask now to find places where dask is doing a lot of work.)
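One way to keep the focus on dask's own work is to time graph construction and an internal optimization pass separately from the full compute. A hedged sketch: the workload is a placeholder, and `__dask_graph__` / `dask.optimization.cull` are the current names for interfaces that lived elsewhere when this was written:

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask.optimization import cull  # lived in dask/optimize.py in older versions


class TimeGraphOverhead:
    def setup(self):
        df = pd.DataFrame({"a": np.random.random(100000)})
        self.ddf = dd.from_pandas(df, npartitions=100)
        expr = self.ddf.a > 0
        # Materialize the graph once so time_cull measures only the pass itself.
        self.dsk = dict(expr.__dask_graph__())
        self.keys = expr.__dask_keys__()

    def time_graph_construction(self):
        # Builds and materializes the task graph; no pandas/NumPy work runs here.
        dict((self.ddf.a > 0).__dask_graph__())

    def time_cull(self):
        # One of dask's internal optimization passes.
        cull(self.dsk, self.keys)
```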
Benchmarking new Code

If you're writing an optimization, say, you can benchmark it by:

1. writing your benchmark in `benchmarks/`,
2. setting the `repo` field in `asv.conf.json` to the path of your dask / distributed repository on your local file system, and
3. running `asv continuous -f 1.1 upstream/master HEAD` (optionally with a regex `-b <regex>` to filter to just your benchmark).

Naming Conventions
Directory Structure
This repository contains benchmarks for several dask-related projects. Each project needs its own benchmark directory because `asv` is built around one configuration file (`asv.conf.json`) and benchmark suite per repository.