dask / dask

Parallel computing with task scheduling
https://dask.org
BSD 3-Clause "New" or "Revised" License

construction of and interactions between different types of nested duck arrays #6635

Status: open. keewis opened this issue 4 years ago.

keewis commented 4 years ago

Part of the discussion in #5329. I'm opening this issue to make the discussion on this topic a bit more focused.

As mentioned in https://github.com/dask/dask/issues/5329#issuecomment-691992501, we need to find a way to

Parts of this have already been discussed in #6393.


To solve both of these issues, the type hierarchy from hgrecco/pint#845 could be used, but we'd still need to figure out how to compare within that hierarchy, and we'd probably have to maintain a package that collects the relationships between different packages.
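Here is a minimal sketch of how a shared package could compare two types within such a hierarchy. It assumes the hierarchy is stored as a directed graph of "can wrap" edges, and that one type wraps another when the second is reachable from the first. The type names and edges are hypothetical placeholders, not the actual hierarchy from hgrecco/pint#845.

```python
from collections import deque

# Hypothetical "can wrap" edges: each key may directly wrap the listed types.
CAN_WRAP = {
    "LabeledArray": {"UnitArray"},
    "UnitArray": {"ChunkedArray"},
    "ChunkedArray": {"ndarray"},
    "SparseArray": {"ndarray"},
    "ndarray": set(),
}


def wraps(outer, inner, graph=CAN_WRAP):
    """Return True if `inner` is reachable from `outer` along can-wrap edges."""
    seen = {outer}
    queue = deque([outer])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child == inner:
                return True
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return False


def outermost(a, b, graph=CAN_WRAP):
    """Return whichever of `a` and `b` should be the outer (wrapping) type."""
    a_wraps_b = wraps(a, b, graph)
    b_wraps_a = wraps(b, a, graph)
    if a_wraps_b and not b_wraps_a:
        return a
    if b_wraps_a and not a_wraps_b:
        return b
    raise TypeError(f"no unambiguous ordering between {a!r} and {b!r}")


print(outermost("ndarray", "UnitArray"))  # UnitArray
```

Reachability makes the ordering transitive, so only direct edges need to be recorded. A pair with no path in either direction, like `SparseArray` and `ChunkedArray` here, stays undecided until someone adds an edge to the shared registry.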

Similarly, we could have duck arrays maintain a list of the duck arrays they can wrap. This is still fairly static and might grow too large for packages that sit high in the type hierarchy, but it would allow granular control over the interaction with other duck arrays.
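Below is a sketch of such a per-package allow-list. It assumes NumPy's `__array_ufunc__` protocol is the dispatch mechanism. `_HANDLED_TYPES` follows the convention used in the example for NumPy's `NDArrayOperatorsMixin`. `InnerArray` and `OuterArray` are hypothetical duck arrays and are not part of dask or any other library.

```python
import numbers

import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class InnerArray(NDArrayOperatorsMixin):
    """Hypothetical low-level duck array that wraps a NumPy array."""

    # Types (besides InnerArray itself) this duck array agrees to handle.
    _HANDLED_TYPES = (np.ndarray, numbers.Number)

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Defer to the other operand if it is not in our allow-list.
        if not all(isinstance(x, self._HANDLED_TYPES + (InnerArray,)) for x in inputs):
            return NotImplemented
        args = [x.data if isinstance(x, InnerArray) else x for x in inputs]
        return InnerArray(getattr(ufunc, method)(*args, **kwargs))


class OuterArray(NDArrayOperatorsMixin):
    """Hypothetical higher-level duck array whose allow-list includes InnerArray."""

    _HANDLED_TYPES = (np.ndarray, numbers.Number, InnerArray)

    def __init__(self, data):
        self.data = data

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if not all(isinstance(x, self._HANDLED_TYPES + (OuterArray,)) for x in inputs):
            return NotImplemented
        args = [x.data if isinstance(x, OuterArray) else x for x in inputs]
        # The unwrapped data dispatches to InnerArray.__array_ufunc__ in turn.
        return OuterArray(getattr(ufunc, method)(*args, **kwargs))


result = InnerArray([1, 2]) + OuterArray(InnerArray([1, 2]))
print(type(result).__name__, type(result.data).__name__, result.data.data)
# OuterArray InnerArray [2 4]
```

`InnerArray` does not list `OuterArray`, so it returns `NotImplemented`. NumPy then tries `OuterArray.__array_ufunc__`, and the result is an `OuterArray` wrapping an `InnerArray` regardless of operand order. The downside shows up here too: every new type `OuterArray` should accept has to be added to its tuple.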

In #6393, it was suggested to divide duck arrays into numbered categories and have duck arrays in a higher-numbered category wrap those in a lower-numbered category. However, this breaks down as soon as a duck array belongs to multiple categories (or two duck arrays in the same category wrap each other). Adding new categories is also difficult, at least when categories are integers, because there is no free number between two adjacent categories.
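Here is a sketch of the category-number scheme. It assumes each duck array type is assigned a single integer and that the higher number always becomes the outer array. The names and numbers are hypothetical.

```python
# Hypothetical category numbers: a higher number wraps lower numbers.
CATEGORY = {
    "LabeledArray": 3,
    "UnitArray": 2,
    "ChunkedArray": 1,
    "SparseArray": 1,
    "ndarray": 0,
}


def outermost(a, b, category=CATEGORY):
    """Return the type with the higher category; ties cannot be resolved."""
    ca, cb = category[a], category[b]
    if ca == cb:
        raise TypeError(f"{a!r} and {b!r} share category {ca}; ordering is undefined")
    return a if ca > cb else b
```

The tie between `ChunkedArray` and `SparseArray` is the failure mode described above. Inserting a new category between 1 and 2 would also require renumbering every type above it.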

Building on that idea, we could have duck arrays declare a tuple of categories they belong to and a tuple of categories they can wrap. A set operation would then decide which one is wrapped. However, this still breaks down for coarse categories (duck arrays in the same category wrapping each other) and for circular graphs (e.g. A is in categories x and z and can wrap y, while B is in category y and can wrap x). I'm not sure whether that is actually an issue, though.
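Below is a sketch of the set-based variant. It assumes each type declares `categories` and `can_wrap` as sets of category names. `DuckSpec` and the category labels are hypothetical. `A` and `B` reproduce the circular example from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuckSpec:
    """Hypothetical declaration of a duck array's categories."""

    name: str
    categories: frozenset  # categories this type belongs to
    can_wrap: frozenset  # categories this type is able to wrap


def can_wrap(outer, inner):
    """`outer` can wrap `inner` if outer.can_wrap intersects inner.categories."""
    return bool(outer.can_wrap & inner.categories)


def outermost(a, b):
    """Return the wrapping spec, or raise if the relation is not one-directional."""
    a_wraps_b, b_wraps_a = can_wrap(a, b), can_wrap(b, a)
    if a_wraps_b and not b_wraps_a:
        return a
    if b_wraps_a and not a_wraps_b:
        return b
    raise TypeError(f"cannot decide between {a.name} and {b.name}")


A = DuckSpec("A", frozenset({"x", "z"}), frozenset({"y"}))
B = DuckSpec("B", frozenset({"y"}), frozenset({"x"}))
C = DuckSpec("C", frozenset({"w"}), frozenset({"y"}))

print(can_wrap(A, B), can_wrap(B, A))  # True True (circular)
print(outermost(C, B).name)  # C
```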

keewis commented 3 years ago

Another way to get this to work is to explicitly register the types that can be wrapped by a given duck array (see e.g. dask's implementation from #6393).
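Here is a minimal sketch of explicit registration. It assumes a single mutable registry that maps an outer duck array type to the types it may wrap. `register_wrappable`, `can_wrap`, and the classes are hypothetical names used for illustration. This is not dask's actual implementation from #6393.

```python
# Hypothetical central registry: outer duck array type -> types it may wrap.
_WRAPPABLE = {}


def register_wrappable(outer, inner):
    """Record that instances of `inner` may be wrapped by `outer`."""
    _WRAPPABLE.setdefault(outer, set()).add(inner)


def can_wrap(outer, inner):
    """True if `inner` or one of its base classes is registered for `outer`."""
    return any(issubclass(inner, t) for t in _WRAPPABLE.get(outer, ()))


class ChunkedArray:
    """Hypothetical chunked duck array."""


class UnitArray:
    """Hypothetical unit-aware duck array."""


class SparseArray:
    """Hypothetical sparse duck array."""


# Each call could live in the outer package, the inner package, or a third one.
register_wrappable(UnitArray, ChunkedArray)
register_wrappable(ChunkedArray, SparseArray)
```

Registration here is a plain function call, so nothing in the sketch decides where it should live. It also does not chain registrations: `can_wrap(UnitArray, SparseArray)` is `False` until that pair is registered explicitly.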

This is probably the simplest and most explicit way, but we still need to figure out where to put the actual registration. Since there aren't many duck arrays yet, it might be fine to ignore this for now.

cc @jthielen, @TomNicholas, @shoyer, @amcnicho (if I remember correctly you were interested in this?)

jthielen commented 3 years ago

Discussion on this topic (and hopefully converging efforts towards a resolution) is welcome at https://github.com/pydata/duck-array-discussion/issues/3!