Open hameerabbasi opened 4 months ago
I think this topic will have to be addressed in v2024, as it's too big to be squeezed in v2023 which we're trying very hard to wrap up.
A few quick comments:
See the `topic: lazy/graph` label, and https://data-apis.org/array-api/draft/design_topics/lazy_eager.html
No pressure.
Materialization via some function/method in the API that triggers compute would be the one thing that is possibly actionable. However, that is quite tricky. The page I linked above has a few things to say about it.
Thanks Ralf -- That'd be a big help indeed. Materializing an entire array as opposed to one element is something that should be a common API across libraries, IMHO. I changed the title to reflect that.
Cross linking https://github.com/data-apis/array-api/issues/728 as it may be relevant to this discussion.
> Materializing an entire array as opposed to one element is something that should be a common API across libraries, IMHO,
Just wanted to point out that it may be common but not universal. For instance, ndonnx arrays may not have any data that can be materialized. Such arrays do have data types and shapes and enable instant ONNX export of Array API compatible code. ONNX models are serializable computation graphs that you can load later, and so these "data-less" arrays denote model inputs that can be supplied at an entirely different point in time (in a completely different environment).
There are some inherently eager functions like `__bool__` where we just raise an exception if there is no materializable data, in line with the standard. Any proposals around "lazy" arrays collecting values should have some kind of escape hatch like this.
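The escape hatch described above can be sketched roughly as follows. This is only an illustration: `LazyArray`, `materialize`, and `NotMaterializableError` are hypothetical names, not part of ndonnx or the Array API standard, and a plain tuple stands in for real array data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


class NotMaterializableError(ValueError):
    """Hypothetical error for arrays that carry no concrete data."""


@dataclass(frozen=True)
class LazyArray:
    """Hypothetical array that always has a shape and dtype, but maybe no data."""

    shape: Tuple[int, ...]
    dtype: str
    data: Optional[tuple] = None  # None models a "data-less" model input

    def materialize(self) -> tuple:
        # Escape hatch: fail loudly when there is nothing to compute.
        if self.data is None:
            raise NotMaterializableError("array has no materializable data")
        return self.data

    def __bool__(self) -> bool:
        # __bool__ is inherently eager, so it has to go through materialize().
        values = self.materialize()
        if len(values) != 1:
            raise ValueError("only single-element arrays convert to bool")
        return bool(values[0])
```

With this shape, `bool(LazyArray((1,), "bool", (True,)))` works, while the same call on an array constructed without `data` raises instead of silently producing a value.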
Background
Some colleagues and I were doing some work on `sparse` when we stumbled onto a limitation of the current Array API Standard, and @kgryte was kind enough to point out that it might have wider implications than just `sparse`, so it would be prudent to discuss it with other relevant parties within the community before settling on an API design to avoid fragmentation.
Problem Statement
There are two notable things missing from the Array API standard today, which `sparse`, and potentially Dask, JAX and other relevant libraries might also need.
- Storage formats: for `sparse`, this would be the format of the sparse array (`CRS`, `CCS`, `COO`, ...).
- Lazy arrays and materialization: `sparse`/JAX might use this to build up kernels before running a computation.
Potential solutions
Overload the `Array.device` attribute and the `Array.to_device` method.
One option is to overload the objects returned/accepted by these to contain a device + storage object. Something like the following:
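A minimal sketch of such a combined object, under stated assumptions: `Storage`, its `format` and `lazy` fields, and the stand-in `default_device()` are hypothetical names chosen for illustration, not anything defined by the Array API standard or by `sparse`.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class Storage:
    """Hypothetical object bundling an execution device with a storage format."""

    device: str = "cpu"
    format: str = "dense"
    lazy: bool = False


def default_device() -> Storage:
    # Stand-in for a default-device query: eager, dense, on the CPU.
    return Storage()


@dataclass(frozen=True)
class Array:
    payload: Any
    # Overloaded attribute: a device *and* storage description in one object.
    device: Storage = field(default_factory=Storage)

    def to_device(self, device: Storage) -> "Array":
        # A real library would convert or compute the data here;
        # this sketch only retags the array with the requested storage.
        return replace(self, device=device)


lazy_coo = Array(payload=[1, 0, 2], device=Storage(format="coo", lazy=True))
materialized = lazy_coo.to_device(default_device())
```

In this sketch, moving an array to the default storage doubles as the materialization step, which is also why the device and format concepts end up entangled.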
To materialize an array, one could use `to_device(default_device())` (possible after #689 is merged).
Advantages
As far as I can see, it's compatible with how the Array API standard works today.
Disadvantages
We're mixing the concepts of an execution context and storage format, and in particular overloading operators in a rather weird way.
Introduce an `Array.format` attribute and `Array.to_format` method.
Advantages
We can get the API right, maybe even introduce `xp.can_mix_formats(...)`.
Disadvantages
Would need to wait till the 2024 revision of the standard at least.
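As a rough illustration of this second option, the sketch below keeps the device and the format as separate attributes. Everything beyond the names `format`, `to_format`, and `can_mix_formats` is an assumption: the set of format strings, the signature of `can_mix_formats`, and in particular the mixing rule are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any

# Hypothetical format names: the sparse formats mentioned above plus a dense default.
KNOWN_FORMATS = frozenset({"dense", "coo", "crs", "ccs"})


@dataclass(frozen=True)
class Array:
    payload: Any
    device: str = "cpu"    # execution context stays a separate concept
    format: str = "dense"  # storage format gets its own attribute

    def to_format(self, format: str) -> "Array":
        if format not in KNOWN_FORMATS:
            raise ValueError(f"unknown format: {format!r}")
        # A real library would convert the underlying data; this sketch retags it.
        return replace(self, format=format)


def can_mix_formats(*formats: str) -> bool:
    """Assumed rule for this sketch: dense mixes with anything,
    and sparse formats only mix with themselves."""
    distinct_sparse = set(formats) - {"dense"}
    return len(distinct_sparse) <= 1
```

Under the assumed rule, `can_mix_formats("dense", "crs")` is true while `can_mix_formats("coo", "crs")` is false; a library could consult such a query before dispatching a binary operation.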
Tagging potentially interested parties: