mrocklin opened 8 years ago
I already have a bunch of benchmarks that I use when profiling performance. The goal of this issue should be first to set up sufficient infrastructure so that it is easy for others to add benchmarks in the future.
In case there's desire for a real-world test case, I'm posting below a very simple Monte Carlo GBM simulation. I've got similar code running on a distributed cluster, so I have a vested interest in making sure it runs efficiently! I'm hopeful that a real-world use case such as this can inform future optimisation efforts (though I wonder if it's too simple to be of any use?)
>>> F = 50.
>>> sigma = 0.2
>>> covariance = np.array([
... [1, 0.7],
... [0.7, 1],
... ])
>>> Te = np.linspace(0, 3, 1 + 365*3)
>>> rvs = multivariate_normal(covariance, Te.size, nsims=1e5, nchunks=60)
>>> sims = simulate_gbm(F, sigma, Te, rvs)
>>> sims
dask.array<mul, shape=(2, 1096, 100000), dtype=float64, chunksize=(2, 19, 100000)>
>>> %%time
... fut = cluster.compute(sims, optimize_graph=True, pure=False)
... wait(fut)
Wall time: 11.8 s
>>> %%time
... sims_ = sims.compute(scheduler='synchronous')
Wall time: 46.8 s
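For readers following along: the session above doesn't define `multivariate_normal` or `simulate_gbm`. A minimal NumPy-only sketch of what they might look like is below (the signatures, the `seed` parameter, and the driftless GBM form are my assumptions, not the code actually running on the cluster; the real versions presumably return dask arrays and accept `nchunks`):

```python
import numpy as np

def multivariate_normal(cov, nsteps, nsims, seed=0):
    # Draw correlated standard-normal increments, shape (nassets, nsteps, nsims),
    # by mixing independent draws through a Cholesky factor of the covariance.
    # NOTE: signature and seed handling are assumptions for illustration.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((cov.shape[0], nsteps, nsims))
    L = np.linalg.cholesky(cov)
    return np.einsum("ij,jts->its", L, z)

def simulate_gbm(F, sigma, t, rvs):
    # Scale increments by sqrt(dt), cumulate into Brownian paths W, then apply
    # the driftless GBM solution S_t = F * exp(sigma * W_t - sigma^2 * t / 2).
    dt = np.diff(t, prepend=t[0])            # first dt is 0, so paths start at F
    W = np.cumsum(rvs * np.sqrt(dt)[None, :, None], axis=1)
    return F * np.exp(sigma * W - 0.5 * sigma**2 * t[None, :, None])
```

Swapping the NumPy calls for their `dask.array` counterparts (and chunking along the simulation axis) would reproduce the chunked structure shown in the repr above.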
In conversation here, @minrk and @ogrisel suggested that we set up something like https://github.com/spacetelescope/asv to monitor the performance of a few important computation patterns over time. This would help us identify performance regressions as we change the scheduler.
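For concreteness, asv discovers benchmarks as plain Python classes whose `time_*` methods it times repeatedly after calling `setup`. A minimal sketch of what a `benchmarks/benchmarks.py` entry could look like (the class name, the `nchunks` parameterisation, and the toy chunked-sum workload are placeholders for real scheduler patterns, not anything dask or asv ship):

```python
import numpy as np

class TimeChunkedReduction:
    # asv runs each time_* method once per value in params,
    # passing the value as an argument.
    params = [10, 60]
    param_names = ["nchunks"]

    def setup(self, nchunks):
        # Pre-build the chunked data so only the reduction is timed.
        self.chunks = np.array_split(np.arange(1_000_000, dtype=float), nchunks)

    def time_chunked_sum(self, nchunks):
        # Stand-in for a scheduler-heavy pattern: reduce many small chunks.
        sum(chunk.sum() for chunk in self.chunks)
```

Real entries would presumably build a dask graph in `setup` and call `.compute()` in the timed method, so that graph construction and execution can be tracked separately.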