mrocklin opened 7 years ago
Which stopping criterion to use is independent of distributing it, right? This sounds like a nice simple example.
If you're looking for something more complex, the parallelization here would be interesting: https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
I think everything except the parallelization is implemented in scikit-optimize. You might need some ML help, maybe @mechcoder is interested ;)
I have begun a dask-patternsearch project that may be useful here. You can think of it as a line-search algorithm generalized to multiple dimensions without using derivatives. Its distinguishing feature is that you can throw as many cores at it as you want: in principle, optimizing a 100-dimensional problem (that fits in memory) on 100,000 cores should be just fine. The algorithm isn't particularly fancy, smart, or optimal, but I would say it's better than grid search and Nelder-Mead simplex.
Read more about it here: http://www.cs.wm.edu/~va/research/sirev.pdf http://www.cs.wm.edu/~va/research/deto.pdf
Feel free to direct questions to the repo. Again, the project is brand-spanking-new and is missing such nice things as documentation and tests.
Example usage:

```python
import numpy as np
import distributed
from dask_patternsearch import search

client = distributed.Client()

def f(x):
    return x.dot(x)

x0 = np.array([12.3, 4.5])
stepsize = np.array([1, 1])
results = search(client, f, x0, stepsize, stopratio=1e-5)
```
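For intuition about what pattern search does, here is a minimal serial sketch of compass search (one member of the pattern-search family). This is an illustration only, not how dask-patternsearch is implemented; the actual project farms trial-point evaluations out in parallel through the distributed client.

```python
import numpy as np

def compass_search(f, x0, stepsize, stopratio=1e-5):
    """Minimal serial compass search: poll +/- stepsize along each axis,
    move to any improving point, and halve the step when none improves.
    Stops once the step has shrunk by a factor of `stopratio`."""
    x = np.asarray(x0, dtype=float)
    step = np.asarray(stepsize, dtype=float)
    initial_max = step.max()
    fx = f(x)
    while step.max() / initial_max > stopratio:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step[i]
                ft = f(trial)
                if ft < fx:          # accept any improvement
                    x, fx = trial, ft
                    improved = True
        if not improved:             # stalled: refine the mesh
            step = step / 2
    return x, fx

x, fx = compass_search(lambda v: v.dot(v),
                       np.array([12.3, 4.5]),
                       np.array([1.0, 1.0]))
```

The parallel version evaluates the polled trial points concurrently instead of one at a time, which is what makes it possible to keep a large cluster busy.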
Hi, I've been working on an online/async version of the code here: https://github.com/thomasgreg/dask-searchcv/tree/online-fit.
It ended up involving a refactoring of much of the code in model_selection.py to allow updates to the graph without generating extra tasks (the existing code assumes that the full parameter space is known beforehand).
If this approach looks useful I can clean up the branch and put in a PR (it's a bit messy right now as I was hesitant to just replace the existing batch implementation fully without feedback).
Most tests are passing, save for a few edge cases I'm still working out (dask delayed objects in fit_params, the graph size without a cache, ...). There's an example script in examples/async_randomsearchcv.py.
Best regards, Greg
Thanks for taking a look at this. I suggest you submit a PR early and we can discuss it more there. At first glance, I'm not sure what your intent is here. What problem are you trying to solve? If the goal is to solve this issue, I think there are simpler solutions that don't involve major changes to the existing code. If the goal is to make the existing classes asynchronous, I also think there are simpler ways to accomplish that. Either way, I still suggest you submit a WIP PR and we can discuss things more there.
Thanks for the quick reply - I've submitted a PR. I understand that a big code change is a headache, and in hindsight I can see how it might be done in a simpler way with minimal change.
@eriknw you should run a benchmark against spearmint, which is an implementation of the paper I linked to. I think they have a distributed implementation, though that might not be open source.
Thanks for the suggestion @amueller, and for letting me know they have a distributed implementation. I plan to perform comparative benchmarks against a couple of notable packages and will definitely include spearmint.
I plan to implement adaptive step sizes in dask-patternsearch before putting much effort into benchmarks. Medium term, I would like to allow other optimizers to be easily used in combination with dask-patternsearch. Pattern search optimization readily allows "oracles" to propose trial points, and the method you linked to should be an excellent candidate for an oracle. Longer term, I'd like to see a multi-armed bandit approach perform real-time resource allocation across optimizers (i.e., give more cores to the methods most likely to yield the most improvement).
I've had a rigorous travel schedule the past month, but I'll continue working on dask-patternsearch soon and should have it reasonably functional and presentable in time for the SciPy conference. I expect we can get dask-searchcv to use dask-patternsearch some time this summer.
It would be interesting to start exploring asynchronous algorithms within this project using the dask.distributed API. Because this API is somewhat different it might be wise to start with something simple.
One simple application would be to build a variant of RandomSearchCV that, instead of taking a number of candidates to try, took a stopping criterion like "have tried 100 options and not improved more than 1%", and then continued submitting computations while that criterion has not been met.
My initial approach would be to check the number of cores I had and try to keep roughly twice that many candidates in flight, consuming those futures as they finished and submitting new ones as necessary.
If we wanted to be cool, we could also check the number of cores periodically and submit more or less accordingly.
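A rough sketch of that loop, written against the stdlib concurrent.futures interface (which dask.distributed's Client.submit deliberately mirrors); `fit_and_score`, `sample_params`, and the absolute tolerance used for "improved" are hypothetical stand-ins, not anything from dask-searchcv:

```python
import random
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def fit_and_score(params):
    """Stand-in for fitting one candidate; returns a score to maximize."""
    return -(params["alpha"] - 0.3) ** 2

def sample_params():
    """Stand-in for drawing one point from the parameter distribution."""
    return {"alpha": random.uniform(0.0, 1.0)}

def async_random_search(executor, n_workers, patience=100, tol=1e-4):
    """Keep ~2x n_workers candidates in flight; stop submitting once
    `patience` consecutive results fail to beat the best score by `tol`."""
    in_flight = {executor.submit(fit_and_score, sample_params())
                 for _ in range(2 * n_workers)}
    best_score = float("-inf")
    stale = 0
    while in_flight:
        # consume whichever futures finish first
        done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
        for fut in done:
            score = fut.result()
            if score > best_score + tol:
                best_score, stale = score, 0
            else:
                stale += 1
        if stale < patience:
            # replenish to keep roughly 2x n_workers candidates in flight
            while len(in_flight) < 2 * n_workers:
                in_flight.add(executor.submit(fit_and_score, sample_params()))
        # otherwise submit nothing more and drain the remaining futures
    return best_score

with ThreadPoolExecutor(max_workers=4) as ex:
    best = async_random_search(ex, n_workers=4)
```

With dask.distributed one would pass a Client in place of the executor and could re-query the cluster's core count inside the loop to grow or shrink the in-flight target, per the "if we wanted to be cool" idea above.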
cc @jcrist @amueller