dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

graph_metrics() raises and generates a dashboard internal error #7511

Open epizut opened 1 year ago

epizut commented 1 year ago

Here is a small MCVE that breaks the Dashboard Groups page. The graph_metrics() function used by the Groups Dashboard page raises because a key is missing from total_dependencies.

Minimal Complete Verifiable Example:

import dask.dataframe
import dask.datasets
import dask.distributed

# Start a local cluster and connect a client to it
cluster = dask.distributed.LocalCluster()
client = dask.distributed.Client(cluster)
client

Open the Dask diagnostic Dashboard and select the Groups view, then run:

dd1 = dask.datasets.timeseries()
# 'head' needs to be passed as a non-aligned dd via a keyword parameter in order to reproduce this specific bug
dd2 = dask.dataframe.map_partitions(lambda part, head: part, dd1, head=dd1.head(1, compute=False))
dd3 = dask.dataframe.map_partitions(lambda part, head: part, dd2, head=dd2.head(1, compute=False))
dd3.compute()

Here is the console exception: gh2

Here is the dashboard exception: gh3

Anything else we need to know?: Removing .head(1, compute=False) still raises; it is only there to illustrate my need for a non-aligned keyword parameter.

Environment:

jrbourbeau commented 1 year ago

Thanks for the issue @epizut, I'm able to reproduce. As you mentioned, I think a workaround for the time being would be to not pass it through as a keyword.

cc @eriknw for the dask.order.order connection (though it's possible that the underlying cause is elsewhere)
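
For readers unfamiliar with the failure mode described in the report, here is a toy sketch of how a per-key lookup raises when one graph key is absent from a counts mapping. It does not use dask's actual graph_metrics() or its data structures. The names toy_metrics and find_missing_keys, and the three-key graph, are hypothetical and exist only for illustration.

```python
# Toy stand-ins, NOT dask's real structures: a three-task chain a <- b <- c.
dependencies = {"a": set(), "b": {"a"}, "c": {"b"}}

# Hypothetical counts mapping in which key "c" is deliberately missing.
total_dependencies = {"a": 1, "b": 2}


def toy_metrics(dependencies, total_dependencies):
    """Look up a count for every key in the graph (raises KeyError if one is absent)."""
    return {key: total_dependencies[key] for key in dependencies}


def find_missing_keys(dependencies, total_dependencies):
    """Return the graph keys that have no entry in total_dependencies."""
    return sorted(set(dependencies) - set(total_dependencies))


if __name__ == "__main__":
    print("missing keys:", find_missing_keys(dependencies, total_dependencies))
    try:
        toy_metrics(dependencies, total_dependencies)
    except KeyError as exc:
        print("lookup failed for key:", exc.args[0])
```

A check like find_missing_keys only helps locate which key is inconsistent between the two mappings; it says nothing about where in the real code path the key is dropped.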