dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

spark-style accumulators #1582

Open martindurant opened 6 years ago

martindurant commented 6 years ago

It would be easy to modify or derive from the existing Variable class to get race-safe counters or sum objects like Spark's accumulators. The value would be set only on initialization; thereafter only get and add-like methods would be allowed. The latter could be made syntactically nice with in-place add (+=).
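To make the idea concrete, here is a minimal local sketch of the interface described above. It is not distributed's API: the Accumulator class and the count_records helper are hypothetical names, and a threading.Lock stands in for whatever coordination the scheduler would provide.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Accumulator:
    """Hypothetical add-only counter (illustration, not distributed's API)."""

    def __init__(self, initial=0):
        # The only "set" happens here, at initialization.
        self._value = initial
        self._lock = threading.Lock()

    def add(self, amount):
        # The lock makes the read-modify-write atomic, so updates are race-safe.
        with self._lock:
            self._value += amount

    def get(self):
        with self._lock:
            return self._value

    def __iadd__(self, amount):
        # Enables "acc += 1"; returns self so the name stays bound to the accumulator.
        self.add(amount)
        return self


def count_records(acc, n):
    # Hypothetical task body: increments the shared accumulator n times.
    for _ in range(n):
        acc += 1


acc = Accumulator(0)
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(8):
        pool.submit(count_records, acc, 1000)

total = acc.get()  # all 8 tasks have finished once the "with" block exits
```

There is deliberately no set method after construction, which is what distinguishes this from a plain shared variable.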

I'm not sure how much of a demand there is for this, but it seems useful.

Note that we would probably not be able to protect against tasks that run more than once for some reason, i.e., the add method should be considered a side-effect.
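A small sketch of that caveat, using hypothetical names (Counter, process) and plain Python rather than a real cluster: if a task is executed twice, its return value is unchanged but the side-effecting add is applied twice.

```python
class Counter:
    """Hypothetical stand-in for an accumulator; not distributed's API."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


def process(chunk, counter):
    # Side effect: record how many items were seen.
    counter.add(len(chunk))
    # Pure result: the same on every run.
    return sum(chunk)


counter = Counter()
chunk = [1, 2, 3]

first = process(chunk, counter)
# Simulate the task being run again, e.g. recomputed after a worker is lost.
second = process(chunk, counter)

# The results agree, but the counter has now counted the chunk twice.
double_counted = counter.value
```

Here the chunk has three items, yet the counter ends at six, so any total read from such an object is only reliable if each task runs exactly once.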

mrocklin commented 6 years ago

Yes, one could do this. One could also fairly easily build new objects like Variable or Queue that are tailored to particular needs, or more general versions of them that are easier to modify. I am not certain that Variable and Queue are fundamental building blocks on which other such objects should be based.

I personally have no intention of pursuing this without concrete use cases and active users.

martindurant commented 6 years ago

I think it's as simple as adding an add method/route, which changes the value in place, to the existing Variable (later, perhaps, allowing the form of the addition/update function to be supplied). Then the same class can play either of the two Spark roles, broadcast variable or accumulator, depending on how it is used.
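Roughly what I have in mind, as a local sketch only. LocalVariable and its update parameter are hypothetical and do not exist in distributed; a lock stands in for the scheduler handling the add route.

```python
import operator
import threading


class LocalVariable:
    """Hypothetical Variable-like object with an extra in-place add."""

    def __init__(self, value=None, update=operator.add):
        self._value = value
        self._update = update  # assumed hook: how a new contribution is combined
        self._lock = threading.Lock()

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value

    def add(self, delta):
        # In-place update under the lock, so concurrent adds do not race.
        with self._lock:
            self._value = self._update(self._value, delta)
            return self._value


# Broadcast-variable role: set once, then only read.
lookup = LocalVariable({"a": 1, "b": 2})
broadcast_value = lookup.get()["b"]

# Accumulator role: start from an initial value, then only add.
total = LocalVariable(0)
for n in (5, 10, 20):
    total.add(n)

# Custom update function: track a running maximum instead of a sum.
largest = LocalVariable(0, update=max)
for n in (5, 20, 10):
    largest.add(n)
```

Which role the object plays depends only on which methods the tasks call, not on the class.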