dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License
1.58k stars 718 forks source link

Remove duplication of `GraphLayout` scheduler plugin #5294

Open douglasdavis opened 3 years ago

douglasdavis commented 3 years ago

Following up on a discovery in https://github.com/dask/distributed/pull/5267: the GraphLayout plugin is duplicated in the scheduler plugins. The latest release uses uuid to generate a unique name for each instance to avoid the duplication warning; we should be able to remove this work around.

jrbourbeau commented 3 years ago

Thanks for tracking this @douglasdavis. Just checking in, do you have the bandwidth to look into this? No worries if not

douglasdavis commented 3 years ago

James Bourbeau @.***> writes:

Thanks for tracking this @douglasdavis. Just checking in, do you have the bandwidth to look into this? No worries if not

Yes! Will be digging in today

douglasdavis commented 3 years ago

Still digging but jotting down what I've tracked down so far.

TaskGraph.__init__ creates a GraphLayout instance and adds it to the scheduler plugins.

In distributed/dashboard/scheduler.py, the applications dictionary has two places where a TaskGraph is created:

https://github.com/dask/distributed/blob/baca13b795fd0ddef7823f3712aff25debb8b04b/distributed/dashboard/scheduler.py#L53 https://github.com/dask/distributed/blob/baca13b795fd0ddef7823f3712aff25debb8b04b/distributed/dashboard/scheduler.py#L60

graph_doc creates a TaskGraph instance and individual_doc creates an instance of the class passed as the first argument.

test_simple loops over that dictionary triggering the creation of two TaskGraph instances:

https://github.com/dask/distributed/blob/baca13b795fd0ddef7823f3712aff25debb8b04b/distributed/dashboard/tests/test_scheduler_bokeh.py#L62-L66

What you tried in #5267, to only create a GraphLayout plugin in TaskGraph.__init__ if scheduler.plugins didn't already contain a plugin with name graph-layout, is the first solution I thought of. But it looks like that wasn't a good fix. Will keep trying some alternatives.

douglasdavis commented 3 years ago

Since these are two different dashboard endpoints would it make sense to have two GraphLayout plugin instances? In that case we can just pass a kwarg to TaskGraph.__init__ to give its GraphLayout a name. @jrbourbeau what do you think?