Open: eriknw opened this issue 7 years ago
It should be possible to run a dask.distributed client/scheduler/worker in a single thread. Additionally, I think Tornado has a concurrent.futures-compatible executor that can be plugged into the worker's executor slot so that tasks run on the same thread as well.
On Mon, Apr 10, 2017 at 1:42 PM, Erik Welch wrote:
> I think we should support executing the algorithm serially. This is well-defined and well-behaved: put trial points on a queue and evaluate them in FIFO order.
> Maybe we can do this by creating a SerialClient that mimics the API of dask.distributed.Client. Perhaps client=None should become an optional keyword argument to search, and we default to running serially.
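The proposed SerialClient does not exist yet; here is a minimal sketch of what it might look like, using only the standard library. The class and method names mirror the dask.distributed.Client API (submit, map, gather), but this is an illustration of the idea, not the eventual implementation:

```python
from concurrent.futures import Future


class SerialClient:
    """Hypothetical drop-in for dask.distributed.Client that evaluates
    tasks immediately in the calling thread, in FIFO submission order."""

    def submit(self, func, *args, **kwargs):
        # Evaluate eagerly; the returned Future is already resolved.
        future = Future()
        try:
            future.set_result(func(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future

    def map(self, func, *iterables):
        return [self.submit(func, *args) for args in zip(*iterables)]

    def gather(self, futures):
        return [f.result() for f in futures]
```

Because submit evaluates in the calling thread, breakpoints and %debug would work exactly as in plain Python code.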
Good to know. Would that run in the user's thread? Specifically, would low-tech debugging such as breakpoints and %debug in IPython work?
When running serially, I would like to avoid unnecessary overhead.
The Tornado event loop does add some overhead. My guess would be tens of microseconds per operation.
Yes, it is probably possible to make everything run in the main user thread. This might require a bit of Tornado know-how though.
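To illustrate the idea of cooperative execution in the main user thread (modern Tornado runs on the stdlib asyncio event loop; the evaluate and search coroutines below are made-up stand-ins, not dask-patternsearch APIs):

```python
import asyncio


async def evaluate(point):
    # Stand-in for submitting a trial point to a single-threaded worker.
    return point * point


async def search(points):
    # Cooperative FIFO evaluation: every task runs on the main thread,
    # so breakpoints and %debug behave as they would in plain code.
    results = []
    for p in points:
        results.append(await evaluate(p))
    return results


print(asyncio.run(search([1, 2, 3])))  # [1, 4, 9]
```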
Great. Integrating with a Tornado loop would allow for cooperative multitasking. Is there any other advantage versus writing a simple Client class that executes things in FIFO order?
Part of the appeal of serial execution is that it removes the burden of having a cluster, a dask.distributed.Client, or any sort of multiprocess executor. If a client isn't specified, I would like this algorithm to feel familiar to users of scipy.optimize. IMHO, this includes having an efficient core loop.
If you want single-process execution but are comfortable with threads, then you can have this now with

    client = Client(processes=False)  # in master

Some things like functions do get serialized, but everything moves around in in-memory queues rather than over sockets. This uses a thread pool, so %debug won't work.
For sequential execution I can see advantages both ways. If you use the full system with the Tornado executor then you'll have diagnostics and so on. Your users will also be running on the same logic that the full system uses, so they'll experience the same set of errors they would experience on larger systems (this is both an advantage and a disadvantage). There will be a performance hit due to all of the scheduling (hundreds of microseconds per task).
Thanks for the discussion. I think there's enough benefit to support both ways.
dask-patternsearch now also implements the simple serial client. If I understand correctly, though, one can get the behavior you want by creating dask.distributed.Client in different ways. This probably belongs in the (non-existent) docs.