Closed jrueb closed 1 year ago
This code ensures that the blockwise layer has the same metadata as the collection it represents. Unfortunately, the order of the layer names in the dask high-level graph is not necessarily descending in order of compute! We were making an incorrect assumption, so the last layer in the collection's task graph had metadata that was different from the collection's metadata. We need the collection's metadata to be the same as the blockwise layer's metadata for the optimization. Should be fixed in #285.
Continuing from #272. There is one remaining issue with the dak.mask optimization.
Optimization of `dak.mask` when the mask is made using `from_awkward` will fail. The error is `ValueError: mask must have boolean type, not dtype('float64')` or some similar ValueError (the exact error depends on the type of the input array) and comes from `awkward.mask`.
To reproduce:
The cause is the following code https://github.com/dask-contrib/dask-awkward/blob/b765938afc963a4e5b8a50f85e59a39c55dfcf21/src/dask_awkward/lib/core.py#L488-L493
This sets the type of the layer for `from_awkward` of the mask (which is a bool array with a specific, compatible structure) to the output of `mask` (which could be much more). When awkward sees this incorrect mask type, it complains. Removing the above code makes it work.
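
The ordering problem behind this can be illustrated without dask. The following is only a minimal sketch under stated assumptions, not the dask-awkward code: a plain dict stands in for the layers of a high-level graph, and the layer names as well as `last_by_position` and `output_layers` are hypothetical. It shows that picking the last layer by position can return an input layer (here the one for the mask), while the layer that nothing else depends on is the actual output.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for a high-level graph: layer name -> description.
# The insertion order deliberately differs from the compute order.
layers = {
    "mask-output": "blockwise mask layer",
    "from-awkward-array": "input layer for the array",
    "from-awkward-mask": "input layer for the boolean mask",
}

# Layer name -> names of the layers it depends on (also hypothetical).
dependencies = {
    "mask-output": {"from-awkward-array", "from-awkward-mask"},
    "from-awkward-array": set(),
    "from-awkward-mask": set(),
}


def last_by_position(layers):
    """Incorrect assumption: the last key is the layer computed last."""
    return list(layers)[-1]


def output_layers(dependencies):
    """Return the layers that no other layer depends on."""
    depended_on = set().union(*dependencies.values())
    return sorted(set(dependencies) - depended_on)


# Dependencies come first in a topological order, so the output is last.
compute_order = list(TopologicalSorter(dependencies).static_order())

print(last_by_position(layers))     # an input layer, not the output
print(output_layers(dependencies))  # the real output layer
print(compute_order[-1])            # agrees with output_layers
```

Under these assumptions, attaching the collection's metadata to the layer returned by `last_by_position` would overwrite the type of the mask's input layer, which matches the failure described above. Selecting the layer by dependency structure, or by the collection's own name, would avoid relying on key order.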