dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Improved layout for task graph in dashboard #3467

Open MichaelSchreier opened 4 years ago

MichaelSchreier commented 4 years ago

Over the weekend I played around with improving the visuals of the task graph in the online dashboard and was wondering if you were interested in me refining this for a PR.

Some examples:


To get these results I did the following

Any thoughts?

mrocklin commented 4 years ago

Thanks @MichaelSchreier! This looks really good.

> for a final implementation one could use the old method as an automatic fallback in case of larger graphs or allow selecting the method via a config entry

Performance here is my main concern. Some workloads have thousands of nodes and they update their structure incrementally several times per second. This is hard both on the server (with things like layout) and on the client (browsers end up being a bottleneck when you have thousands of elements on the screen). Fun fact: we actually chose to use squares rather than circles because we found that it was easier for browsers to draw.

So we could do something like this, but we would need to make sure that we transitioned smoothly to the less-pretty and more efficient form when necessary.

I haven't yet taken a look at the implementation, but wanted to give you a quick response with high level feedback.
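The fallback idea discussed above (a config entry plus an automatic switch for large graphs) could be sketched roughly as follows. This is only an illustration: the function name `choose_layout`, the method names `"layered"` and `"simple"`, and the threshold of 500 tasks are assumptions made for this example, not existing Dask settings.

```python
def choose_layout(n_tasks, method="auto", max_pretty_tasks=500):
    """Pick a layout method for the task graph.

    All names and the default threshold are hypothetical:
    - "simple" stands for the existing, cheaper layout
    - "layered" stands for the prettier, more expensive layout
    - "auto" falls back to "simple" once the graph gets large
    """
    if method not in {"auto", "simple", "layered"}:
        raise ValueError(f"unknown layout method: {method!r}")
    if method != "auto":
        # An explicit user choice always wins.
        return method
    return "layered" if n_tasks <= max_pretty_tasks else "simple"
```

With `method="auto"` as the default, users who never touch the setting would still get the cheaper layout on large graphs, while an explicit choice remains possible.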

MichaelSchreier commented 4 years ago

@mrocklin I haven't attached any code yet - so far it's a hacky Sunday-afternoon implementation...

If performance is the key factor here, then a mostly browser-based solution (layout & drawing) sounds like the best approach in the long run - moving all the heavy lifting outside the critical parts of Dask.

(both solutions provide the "Sugiyama-style" layouts best suited to dask graphs)
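For readers unfamiliar with the term, a Sugiyama-style layout typically proceeds in stages: assign nodes to layers, reorder nodes within layers to reduce edge crossings, then assign coordinates. The sketch below shows only the first stage, using longest-path layering on a plain dependency dict. It is not code from this issue or from any of the libraries mentioned; the function name `assign_layers` and the input format are assumptions made for illustration.

```python
from collections import defaultdict, deque


def assign_layers(dependencies):
    """Longest-path layering for a DAG.

    `dependencies` maps each task to the tasks it depends on.
    Returns a dict mapping each task to its layer index, where tasks
    without dependencies sit in layer 0.
    """
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    dependents = defaultdict(list)  # task -> tasks that depend on it
    remaining = {}                  # task -> number of unprocessed dependencies
    for node in nodes:
        deps = set(dependencies.get(node, ()))
        remaining[node] = len(deps)
        for dep in deps:
            dependents[dep].append(node)

    layer = {node: 0 for node in nodes}
    ready = deque(node for node in nodes if remaining[node] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for child in dependents[node]:
            # A task is placed one layer past its deepest dependency.
            layer[child] = max(layer[child], layer[node] + 1)
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if visited != len(nodes):
        raise ValueError("graph contains a cycle")
    return layer
```

This stage is linear in the number of nodes and edges; the crossing-reduction stage that follows is usually the expensive part of such layouts.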

Still, laying out thousands of nodes does not come for free and I highly doubt that any solution could deliver the performance needed in that case.

I'm also not sure whether mixing properly laid-out graphs with the existing solution is really better than just letting the user decide which method to use. In my use case above, for instance, the graph does not change for two hours or so. That means the time to lay out the nodes is almost irrelevant.

If you consider custom graphs and small scale problems (where I see this bringing the most benefit) to be too much of a niche application of Dask then I'll skip the PR and just describe the changes required for folks to implement it themselves if needed - it's just a couple dozen lines of code after all.

I probably don't have the time to work on a full-on browser-based replacement as discussed above, but I could refine my approach as-is as an optional setting and submit a PR. Though that could leave you with a code fragment that you do not want to support in the long run.

mrocklin commented 4 years ago

> Still, laying out thousands of nodes does not come for free and I highly doubt that any solution could deliver the performance needed in that case.

Yeah, the layout system we have in Dask, while not pretty, is actually decently performant. I would be a little surprised to see other systems beat it significantly.

> I'm also not sure whether mixing properly laid-out graphs with the existing solution is really better than just letting the user decide which method to use

Unfortunately I think that most of our users will never be aware of this option. I agree that offering the option of control is good, but we also need to have sensible defaults that keep them out of trouble.

> In my use case above, for instance, the graph does not change for two hours or so. That means the time to lay out the nodes is almost irrelevant

Understood. A lot of the design decisions we make are suboptimal, but robust. The only way I see this having an impact is if Dask chooses to use it when we have relatively few tasks and things change relatively infrequently. Otherwise I think that it's likely to break a non-trivial number of user workloads.
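The condition described here (few tasks and infrequent changes) could be checked with something like the sketch below. The class name `LayoutGate`, its parameters, and the default values are all hypothetical; nothing like this is confirmed to exist in Dask. Timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import deque


class LayoutGate:
    """Decide whether an expensive layout is currently affordable.

    Hypothetical helper: allows the expensive layout only while the graph
    is small and has changed at most `max_updates` times within the last
    `window` seconds.
    """

    def __init__(self, max_tasks=500, max_updates=2, window=10.0):
        self.max_tasks = max_tasks
        self.max_updates = max_updates
        self.window = window
        self._updates = deque()  # timestamps of recent graph changes

    def record_update(self, now):
        """Note that the graph structure changed at time `now` (seconds)."""
        self._updates.append(now)

    def allow_expensive_layout(self, n_tasks, now):
        # Forget changes that fell out of the sliding window.
        cutoff = now - self.window
        while self._updates and self._updates[0] < cutoff:
            self._updates.popleft()
        quiet = len(self._updates) <= self.max_updates
        return n_tasks <= self.max_tasks and quiet
```

Because the check is re-evaluated on each call, a graph that grows or starts changing quickly would drop back to the cheaper layout on its own, and a graph that settles down would become eligible again after one window.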

However, there is another graph that we keep track of in the TaskGroups that is generally much smaller, and could probably use a layout engine closer to this (cc @jrbourbeau). This might be useful there.

> If you consider custom graphs and small scale problems (where I see this bringing the most benefit) to be too much of a niche application of Dask then I'll skip the PR and just describe the changes required for folks to implement it themselves if needed - it's just a couple dozen lines of code after all.

It's not a niche use case. It's a very common case. But we probably can't optimize for it if it means breaking some other common cases.

Personally, I think that the way forward here is to find some way to have things automatically transition smoothly, or else to drop this for now.