dask / dask-expr

Add unify-chunks draft to arrays #1101

Closed mrocklin closed 1 day ago

mrocklin commented 2 days ago

This makes elementwise operations on arrays with mismatched chunks less broken (though not necessarily fully working):

import dask_expr.array as da

x = da.random.random((10, 10), chunks=(8, 8))
y = da.random.random((10, 10), chunks=(5, 5))

z = x + y
z.pprint()
Elemwise: op=<built-in function add>
  Random: rng=<dask_expr.array.random.RandomState object at 0x120599a90> distribution='random_sample' size=(10, 10) chunks=(8, 8) args=() kwargs={}
  Random: rng=<dask_expr.array.random.RandomState object at 0x120599a90> distribution='random_sample' size=(10, 10) chunks=(5, 5) args=() kwargs={}
z.simplify().pprint()
Elemwise: op=<built-in function add>
  Random: rng=<dask_expr.array.random.RandomState object at 0x120599a90> distribution='random_sample' size=(10, 10) chunks=((5, 3, 2), (5, 3, 2)) args=() kwargs={}
  Random: rng=<dask_expr.array.random.RandomState object at 0x120599a90> distribution='random_sample' size=(10, 10) chunks=((5, 3, 2), (5, 3, 2)) args=() kwargs={}
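For readers wondering where `(5, 3, 2)` comes from: along each axis the unified chunking is the coarsest one whose boundaries include every boundary of the inputs. Here `(8, 2)` has boundaries at 8 and 10, and `(5, 5)` at 5 and 10, so the merged boundaries are 5, 8, 10. The sketch below illustrates that idea in plain Python. It is not the code in this PR, and `refine_chunks` is a hypothetical helper name used only for illustration.

```python
from itertools import accumulate


def refine_chunks(*chunkings):
    """Return the coarsest chunking along one axis that respects every
    chunk boundary of the inputs.

    Hypothetical helper for illustration only; not part of dask or dask-expr.
    Each argument is a tuple of chunk sizes along the same axis.
    """
    # All chunkings must cover the same total length.
    if len({sum(c) for c in chunkings}) != 1:
        raise ValueError("chunkings must describe the same axis length")

    # Collect every chunk boundary (cumulative sum) from every input.
    boundaries = sorted(set().union(*(accumulate(c) for c in chunkings)))

    # Differences between consecutive boundaries are the new chunk sizes.
    starts = [0] + boundaries[:-1]
    return tuple(stop - start for start, stop in zip(starts, boundaries))


# chunks=(8, 8) on a length-10 axis normalizes to (8, 2); chunks=(5, 5) stays (5, 5)
print(refine_chunks((8, 2), (5, 5)))  # (5, 3, 2)
```

The same result applies independently to each axis, which gives the `((5, 3, 2), (5, 3, 2))` shown in the simplified expression above.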

This required a small change to the helpers behind dask.array.utils.assert_eq:

diff --git a/dask/array/utils.py b/dask/array/utils.py
index 9fe33f91c..ee4ed6f95 100644
--- a/dask/array/utils.py
+++ b/dask/array/utils.py
@@ -202,8 +202,13 @@ def _not_empty(x):
     return x.shape and 0 not in x.shape

-def _check_dsk(dsk):
+def _check_dsk(x):
     """Check that graph is well named and non-overlapping"""
+    if hasattr(x, "simplify"):
+        x = x.simplify()
+
+    dsk = x.dask
+
     if not isinstance(dsk, HighLevelGraph):
         return

@@ -268,7 +273,7 @@ def _get_dt_meta_computed(
         assert x.dtype is not None
         adt = x.dtype
         if check_graph:
-            _check_dsk(x.dask)
+            _check_dsk(x)
         x_meta = getattr(x, "_meta", None)
         if check_chunks:
             # Replace x with persisted version to avoid computing it twice.

mrocklin commented 2 days ago

@phofl doesn't like this approach because Blockwise doesn't really mean blockwise anymore. This could be fixed (not necessarily by me, but not necessarily not by me either).

mrocklin commented 2 days ago

Other cases where this is called in current dask.array

phofl commented 1 day ago

thx