eriknw / dask-patternsearch

Scalable pattern search optimization with dask
BSD 3-Clause "New" or "Revised" License

Serial execution #12

Open eriknw opened 7 years ago

eriknw commented 7 years ago

I think we should support executing the algorithm serially. This is well-defined and well-behaved: put trial points on a queue, and evaluate them in FIFO order.

Maybe we can do this by creating a SerialClient that mimics the API of dask.distributed.Client. Perhaps client should become an optional keyword argument to search (defaulting to client=None), in which case we run serially.
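A minimal sketch of the SerialClient idea, assuming only submit() and result() need to be mimicked. The real dask.distributed.Client API is much larger, and its submit also accepts options such as key and pure, which this sketch would simply pass through to the function. SerialFuture and _run_next are hypothetical names. Work is queued on submit and evaluated in FIFO order, in the caller's thread, only when a result is requested.

```python
from collections import deque


class SerialFuture:
    """Hypothetical stand-in for a dask Future; holds a value once evaluated."""

    def __init__(self, client, func, args, kwargs):
        self._client = client
        self._task = (func, args, kwargs)
        self._done = False
        self._value = None
        self._error = None

    def done(self):
        return self._done

    def _run(self):
        func, args, kwargs = self._task
        try:
            self._value = func(*args, **kwargs)
        except Exception as exc:  # keep the error; re-raise it from result()
            self._error = exc
        self._done = True

    def result(self):
        # Drain the queue in FIFO order until this future has been evaluated.
        while not self._done:
            self._client._run_next()
        if self._error is not None:
            raise self._error
        return self._value


class SerialClient:
    """Hypothetical serial client: queues work and runs it FIFO in the caller's thread."""

    def __init__(self):
        self._queue = deque()

    def submit(self, func, *args, **kwargs):
        future = SerialFuture(self, func, args, kwargs)
        self._queue.append(future)
        return future

    def _run_next(self):
        self._queue.popleft()._run()


client = SerialClient()
futures = [client.submit(pow, x, 2) for x in range(4)]
print([f.result() for f in futures])  # [0, 1, 4, 9]
```

Evaluating lazily inside result() keeps the FIFO guarantee: earlier submissions always run before later ones, no matter which future is asked for first.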

mrocklin commented 7 years ago

It should be possible to run a dask.distributed client/scheduler/worker in a single thread. Additionally, I think Tornado has a concurrent.futures-compatible executor that can be plugged into the worker's executor slot so that tasks also run on the same thread.

(Email reply to Erik Welch's comment of Mon, Apr 10, 2017, 1:42 PM.)
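A concurrent.futures-compatible executor that runs work on the calling thread is small. This is a sketch of the kind of executor mentioned above, not Tornado's own class (Tornado ships something similar, I believe tornado.concurrent.DummyExecutor, but check your version). SynchronousExecutor is a hypothetical name, and how to install such an executor in a dask worker is version-dependent and not shown here.

```python
import concurrent.futures
import threading


class SynchronousExecutor(concurrent.futures.Executor):
    """Hypothetical executor: runs each submitted callable immediately on the calling thread."""

    def submit(self, fn, /, *args, **kwargs):
        future = concurrent.futures.Future()
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            future.set_exception(exc)
        else:
            future.set_result(result)
        return future


caller = threading.get_ident()
with SynchronousExecutor() as executor:
    future = executor.submit(threading.get_ident)
print(future.result() == caller)  # True: the task ran on the caller's thread
```

Because the callable runs inline, a breakpoint inside it stops the user's own thread, which is the property asked about below.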

eriknw commented 7 years ago

Good to know. Would that run in the user's thread? Specifically, would low-tech debugging such as breakpoints and %debug in IPython work?

When running serially, I would like to avoid unnecessary overhead.

mrocklin commented 7 years ago

The tornado event loop does add some overhead. My guess would be tens of microseconds per operation.

Yes, it is probably possible to make everything run in the main user thread. This might require a bit of Tornado know-how though.

(Email reply to Erik Welch's comment of Mon, Apr 10, 2017, 3:55 PM.)
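To check per-task overhead figures like these on a given machine, a rough timing harness can compare direct calls with submit()+result() round trips. This sketch uses the standard library's ThreadPoolExecutor as a stand-in; it does not measure Tornado or dask.distributed, whose overhead will differ. per_call_overhead is a hypothetical helper, and the printed number is machine-dependent.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def per_call_overhead(executor, n=2000):
    """Rough extra seconds per task for submit()+result() versus a direct call."""

    def task(x):
        return x + 1

    start = time.perf_counter()
    for i in range(n):
        task(i)
    direct = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(n):
        executor.submit(task, i).result()  # one full round trip per task
    via_executor = time.perf_counter() - start

    return (via_executor - direct) / n


with ThreadPoolExecutor(max_workers=1) as pool:
    overhead = per_call_overhead(pool)
print(f"~{overhead * 1e6:.1f} microseconds per task")
```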

eriknw commented 7 years ago

Great. Integrating with a Tornado loop would allow for cooperative multitasking. Is there any other advantage over writing a simple Client class that executes things in FIFO order?

Part of the appeal of serial execution is that it removes the burden of having a cluster, a dask.distributed.Client, or any sort of multiprocess executor. If a client isn't specified, I would like this algorithm to feel familiar to users of scipy.optimize. IMHO, this includes having an efficient core loop.
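To illustrate what a client-free, scipy.optimize-like core loop can look like, here is a generic compass (coordinate) search in which trial points are put on a FIFO queue and evaluated by plain function calls. It is not the algorithm implemented in dask-patternsearch; compass_search and its parameters (step, tol, max_evals) are hypothetical, and the signature only loosely mirrors scipy.optimize.minimize(fun, x0).

```python
from collections import deque


def compass_search(func, x0, step=1.0, tol=1e-6, max_evals=10_000):
    """Hypothetical serial compass search; trial points are evaluated in FIFO order.

    The evaluation budget is checked between sweeps, so it may be exceeded by
    at most 2 * len(x0) evaluations.
    """
    x = [float(v) for v in x0]
    fx = func(x)
    evals = 1
    while step > tol and evals < max_evals:
        # Queue the 2n trial points x +/- step * e_i around the current center.
        queue = deque()
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step
                queue.append(trial)
        improved = False
        while queue:
            trial = queue.popleft()  # FIFO order
            ftrial = func(trial)
            evals += 1
            if ftrial < fx:
                x, fx = trial, ftrial  # accept the first improving point
                improved = True
                break
        if not improved:
            step /= 2.0  # no improvement anywhere: contract the pattern
    return x, fx


print(compass_search(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0, 0]))  # ([1.0, -2.0], 0.0)
```

The whole loop is ordinary Python in the user's thread, so the only per-evaluation cost beyond the objective itself is a deque append and pop.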

mrocklin commented 7 years ago

If you want single-process execution but are comfortable with threads, then you can have this now with

from dask.distributed import Client
client = Client(processes=False)  # in master

Some things, like functions, do get serialized, but everything moves around through in-memory queues rather than sockets. This uses a thread pool, so %debug won't work.
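The thread-pool point can be illustrated with the standard library alone (this is concurrent.futures, not dask): a task submitted to a ThreadPoolExecutor executes on a pool thread rather than on the thread that called submit(), so the task does not run in the user's thread. where_am_i is a hypothetical helper.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def where_am_i():
    # Report the identity of the thread that actually executes the task.
    return threading.get_ident()


caller = threading.get_ident()
with ThreadPoolExecutor(max_workers=1) as pool:
    worker = pool.submit(where_am_i).result()

print(worker != caller)  # True: the task ran on a pool thread
```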

For sequential execution, I can see advantages both ways. If you use the full system with the Tornado executor, then you'll have diagnostics and such. Your users will also be running on the same logic that the full system uses, so they'll experience the same set of errors that they would experience running on larger systems (this is both an advantage and a disadvantage). There will be a performance hit due to all of the scheduling (hundreds of microseconds per task).

eriknw commented 7 years ago

Thanks for the discussion. I think there's enough benefit to support both ways.

eriknw commented 7 years ago

#13 gives us a client API to use within dask-patternsearch (it also implements the simple serial client). If I understand correctly, though, one can get the behavior you want by creating a dask.distributed.Client in different ways. This probably belongs in the (non-existent) docs.
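Along those lines, code written only against submit() and future.result() does not care which kind of client it receives. In this sketch, evaluate_all is a hypothetical helper and a standard-library ThreadPoolExecutor is passed in. A dask.distributed.Client, including one created with Client(processes=False), also provides submit() returning futures with result(), so it could be passed the same way, as could a serial client that implements those two calls.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(func, points, client):
    """Submit every point, then collect results in submission (FIFO) order.

    Only client.submit(func, arg) and future.result() are used, so any object
    offering those two calls can serve as the client.
    """
    futures = [client.submit(func, p) for p in points]
    return [f.result() for f in futures]


with ThreadPoolExecutor(max_workers=2) as pool:
    print(evaluate_all(abs, [-3, 1, -2], pool))  # [3, 1, 2]
```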