Open robin-cls opened 1 month ago
Hello,
Your analysis is correct.
The problem can be reproduced with the following code:
import distributed as dist
import numpy

if __name__ == '__main__':
    cluster = dist.LocalCluster(processes=False)
    client = dist.Client(cluster)
    scattered = client.scatter(numpy.ones(2), broadcast=True)

    def callback(arg_future):
        print(arg_future)

    def wrap_update_func(func, *args, **kwargs):
        # args (including the scattered Future) are captured in the closure
        def _wrapped_function() -> None:
            func(*args, **kwargs)
        return _wrapped_function

    def do_something(func, *args):
        client = dist.get_client()
        local_func = wrap_update_func(func, *args)
        # The Future is not passed to submit, only the wrapper is
        futures = client.submit(local_func)
        client.compute(futures, sync=True)

    do_something(callback, scattered)
The wrapped function captures the parameters in its closure, so dask loses track of the Future they contain. This problem appeared with the 2024.2 release, and I think it is similar to what they discussed in this issue.
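To illustrate where the parameters end up, here is a minimal sketch in plain Python, without dask. The helper names (`wrap_capturing`, `wrap_forwarding`, `closure_contents`, `captures`) and the `marker` object standing in for the scattered Future are hypothetical and used only for illustration. It only shows that the first pattern buries the argument inside the closure while the second leaves it to be passed explicitly; it does not reproduce dask's serialization.

```python
def wrap_capturing(func, *args, **kwargs):
    """Pattern from the reproducer: arguments are frozen into the closure."""
    def _wrapped():
        return func(*args, **kwargs)
    return _wrapped


def wrap_forwarding(func):
    """Alternative pattern: arguments are supplied when the wrapper is called."""
    def _wrapped(*args, **kwargs):
        return func(*args, **kwargs)
    return _wrapped


def closure_contents(function):
    """Return the values a function has captured from its enclosing scope."""
    cells = function.__closure__ or ()
    return [cell.cell_contents for cell in cells]


def captures(function, value):
    """Tell whether value sits in the closure, directly or inside args/kwargs."""
    for captured in closure_contents(function):
        if captured is value:
            return True
        if isinstance(captured, (tuple, list)):
            if any(item is value for item in captured):
                return True
        if isinstance(captured, dict):
            if any(item is value for item in captured.values()):
                return True
    return False


marker = object()  # hypothetical stand-in for the scattered Future

old_style = wrap_capturing(print, marker)
new_style = wrap_forwarding(print)

hidden_in_old = captures(old_style, marker)  # argument lives in the closure
hidden_in_new = captures(new_style, marker)  # argument must be passed at call time
```

With the forwarding pattern, the argument has to appear in the call itself, which is where `client.submit` can see it.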
I'm pushing something that should fix it with almost no side effects.
The approach looks something like this:
import distributed as dist
import numpy

if __name__ == '__main__':
    cluster = dist.LocalCluster(processes=False)
    client = dist.Client(cluster)
    scattered = client.scatter(numpy.ones(2), broadcast=True, hash=False)

    def callback(arg_not_a_future):
        print(arg_not_a_future)

    def wrap_update_func(func):
        def _wrapped_function(*args, **kwargs) -> None:
            func(*args, **kwargs)
        return _wrapped_function

    def do_something(func, *args):
        client = dist.get_client()
        local_func = wrap_update_func(func)
        futures = client.submit(local_func, *args)
        client.compute(futures, sync=True)

    do_something(callback, scattered)
Regarding your usage, you'll have to adapt your callback function. It will no longer receive a Future, so you do not have to call .result() on its parameters.
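As a rough sketch of what adapting a callback could look like, the helper below accepts either a future-like object or a plain value. The names `resolve` and `callback` are hypothetical, and `concurrent.futures.Future` is used only as a stand-in for a dask Future so that the example runs without a cluster; this is an assumption for illustration, not the zcollection API.

```python
from concurrent.futures import Future  # stand-in for a dask Future (assumption)


def resolve(arg):
    """Return arg.result() for future-like objects, otherwise arg unchanged."""
    result = getattr(arg, "result", None)
    return result() if callable(result) else arg


def callback(values):
    """Hypothetical callback that works whether or not it is handed a Future."""
    values = resolve(values)
    return sum(values)


# Callback receiving a future-like object
stand_in = Future()
stand_in.set_result([1.0, 1.0])
total_from_future = callback(stand_in)

# Callback receiving the concrete value directly
total_from_value = callback([1.0, 1.0])
```

Once every caller passes concrete values, the `resolve` step can be dropped and the callback can use its parameters directly.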
Hi,
I recently bumped dask in my conda environment, and zcollection.Collection.update now raises a RuntimeError when trying to access a Future object in the callback. This error appears with dask=2024.9.0, but not with older versions.
Below is the code to reproduce the problem. It works using a local cluster and an in-memory zcollection.
My preliminary analysis is that the underlying wrapper for the update() function stores the *args and **kwargs arguments. This might cause dask trouble with serialization/deserialization, because the Future contained in args is not directly submitted by the client.