dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License
1.58k stars 718 forks source link

Submitting tasks from tasks #1424

Open remram44 opened 7 years ago

remram44 commented 7 years ago

I have seen the effort in #432, and the possibility for tasks to use get_client() to use dask themselves. However this is awkward as it requires the task-submitting-task to stay alive just to wait for results and pass them downstream.

Wouldn't it make sense for a task to be able to return Future or Delayed object to signify to dask that the result of those computations is the result of the task? Currently the Delayed object is simply passed through to the original caller.

Example weirdness (and this is in the simple case where each task only creates a single task): ```python def fibo(n): if n <= 1: return 1 delay = delayed(fibo)(n - 1) + delayed(fibo)(n - 2) return delay result = client.compute(delayed(fibo)(5)).result() while isinstance(result, Delayed): # Got to keep executing until we get the actual result result = client.compute(result).result() result ```

Apologies if this has been discussed in the past, I couldn't find this exact proposal anywhere. It is a pretty common pattern in asynchronous programming for deferred computations to return futures themselves, so I was surprise not to find it in dask.

mrocklin commented 7 years ago

In general I agree that other mechanisms for task-spawning behavior could be developed to make complex workflows simpler. get_client was more-or-less the simplest way we could do this, so we did it first. Hopefully more use cases and attempts in this direction develop in the future.