Improved layout for task graph in dashboard

MichaelSchreier commented 4 years ago

Over the weekend I played around with improving the visuals of the task graph in the online dashboard and was wondering if you were interested in me refining this for a PR.

Some examples:

The custom graph from the documentation becomes

The code

from distributed import Client, LocalCluster
import dask.array as da

cluster = LocalCluster()
client = Client(cluster)
x = da.ones((1500, 1500), chunks=(300, 300))

y = x + x.T

client.compute(y)

becomes dask_graph_1

A complex custom graph from my project at work transforms from this to this

To get these results I did the following

slightly tuned the colors to more eye friendly versions (I just noted that #2711 exists) - loosely following this wikipedia visualization
replaced the straight edges with bezier curves
added arrow heads to the edges
removed the axes as they don't really add anything to the visualization
replaced the update_graph() method in graph_layout.py with a version leveraging on the Grandalf package to compute a nicer layout
- my initial impression is that the overhead caused by proper layouting is not an issue for graphs not larger than few hundred nodes
- for a final implementation one could use the old method as an automatic fallback in case of larger graphs or allow selecting the method via a config entry
- dask already has graphviz as an optional dependency and it might make sense to use this instead to compute the node coordinates (though it's a heavier dependency, whereas Grandalf is self contained, pure Python)
- it might also make sense to switch to a pure client side solution (e.g. based on d3.js or better yet something canvas or webgl #2976 based)

Any thoughts?

mrocklin commented 4 years ago

Thanks @MichaelSchreier ! This looks really good.

for a final implementation one could use the old method as an automatic fallback in case of larger graphs or allow selecting the method via a config entry

Performance here is my main concern. Some workloads have thousands of nodes and they update their structure incrementally several times per second. This is hard both on the server (with things like layout) and on the client (browsers end up being a bottleneck when you have thousands of elements on the screen). Fun fact: we actually chose to use squares rather than circles because we found that it was easier for browsers to draw.

So we could do something like this, but we would need to make sure that we transitioned smoothly to the less-pretty and more efficient form when necessary.

I haven't yet taken a look at the implementation, but wanted to give you a quick response with high level feedback.

MichaelSchreier commented 4 years ago

@mrocklin I haven't attached any code yet - so far it's a hacky sunday afternoon implementation...

If performance is the key factor here then a mostly browser based solution (layout & drawing) sounds like the best approach in the long run - moving all the heavy lifting to outside the critical parts of dask.

D3-dag for instance does everything needed, though svg based solutions typically perform poorly with many individual objects
cytoscape.js-dagre is canvas based and should scale much better

(both solutions provide the "Sugiyama-style" layouts best suited to dask graphs)

Still, layouting thousands of nodes does not come for free and I highly doubt that any solution could deliver the performance needed in that case.

I'm also not sure if mix-matching properly layouted graphs with the existing solution is really better than just letting the user decide to use one or the other method as desired. In my use case above for instance the graph does not change for two hours or so. That means time to layout the nodes is almost irrelevant.

If you consider custom graphs and small scale problems (where I see this bring the most benefit) to be too much of a niche application of dask then I'll skip the PR and just describe the changes required for folks to implement it themselves if needed - it's just a couple dozen lines of code after all.

I probably don't have the time to work on a full on browser based replacement as discussed above but could refine my approach as-is as an optional setting and submit a PR. Tough that could leave you with a code fragment that you do not want to support in the long run.

mrocklin commented 4 years ago

Still, layouting thousands of nodes does not come for free and I highly doubt that any solution could deliver the performance needed in that case.

Yeah, the layout system we have in Dask, while not pretty, is actually decently performant. I would be a little surprised to see other systems beat it significantly.

I'm also not sure if mix-matching properly layouted graphs with the existing solution is really better than just letting the user decide to use one or the other method as desired

Unfortunately I think that most of our users will never be aware of this option. I agree that offering the option of control is good, but we also need to have sensible defaults that keep them out of trouble.

In my use case above for instance the graph does not change for two hours or so. That means time to layout the nodes is almost irrelevant

Understood. A lot of the design decisions we make are suboptimal, but robust. The only way I see this having an impact is if Dask chooses to use it when we have relatively few tasks and things change relatively infrequently. Otherwise I think that it's likely to break a non-trivial amount of user workloads.

However, there is another graph that we keep track of in the TaskGroups that is generally much smaller, and could probably use a layout engine closer to this. (cc @jrbourbeau ). This might be useful there.

If you consider custom graphs and small scale problems (where I see this bring the most benefit) to be too much of a niche application of dask then I'll skip the PR and just describe the changes required for folks to implement it themselves if needed - it's just a couple dozen lines of code after all.

It's not a niche use case. It's a very common case. But we probably can't optimize for it if it means breaking some other common cases.

Personally, I think that the way forward here is to find some way to have things automatically transition smoothly, or else to drop this for now.

dask / distributed

Improved layout for task graph in dashboard #3467