data-apis / array-api-compat

Compatibility layer for common array libraries to support the Array API
https://data-apis.org/array-api-compat/
MIT License

Dask support #17

Closed: asmeurer closed this issue 8 months ago

asmeurer commented 1 year ago

Should we add Dask as a supported library that we wrap? See https://github.com/dask/dask/pull/8750#issuecomment-1438365157. CC @tomwhite @jakirkham

rgommers commented 1 year ago

That probably makes sense to do. And in principle I think that this package can add support for any widely used array library.

tomwhite commented 1 year ago

I think this is a good idea. The code in https://github.com/dask/dask/pull/8750 could be used as a guide - happy to help.

asmeurer commented 1 year ago

I probably won't prioritize work on this myself for now unless some downstream consumer library expresses a desire for Dask support here. But if you want to submit a pull request adding it I will be happy to review and merge it.

The primary purpose of this library is to wrap existing libraries so that they more closely match the array API. This is preferable to using a separate namespace in the library itself because end-users can continue to use the existing array objects, and only the corresponding consumer library (like scipy or scikit-learn) would need to use this compat layer to make use of the uniform array API on it.

ogrisel commented 1 year ago

For the record, as discussed in other PRs, I have started experimenting with what supporting dask in scikit-learn via the Array API spec would involve.

array-api-compat support for dask would indeed simplify those experiments and make it possible for early adopters to try it before numpy's and dask's main APIs are fully aligned with the Array API spec by default.

See the discussion in: https://github.com/scikit-learn/scikit-learn/issues/26724

asmeurer commented 1 year ago

BTW, I'll be running a sprint on the array API next week at the SciPy conference sprints. If any of you will be there, this could be something to sprint on.

ogrisel commented 1 year ago

@asmeurer let me know if you give it a shot. It should be quite easy to adapt the scikit-learn tests to also add dask.array to the list of namespaces to test against:

https://github.com/scikit-learn/scikit-learn/blob/83a8015702213b39510a0f4898bc6879bcdf3ac2/sklearn/utils/_array_api.py#L13-L50

Then it's a matter of running:

pytest -k array_api sklearn/

asmeurer commented 1 year ago

I'm not sure when I'll be able to work on this issue, but if someone else wants to work on implementing it I'll be happy to review/help with questions.

jakirkham commented 1 year ago

cc @jrbourbeau (for awareness)

rgommers commented 10 months ago

This is moving now, xref gh-76.

asmeurer commented 8 months ago

Dask support has been merged. Quite a lot of functionality is skipped in the tests. Some of it could perhaps be covered by further wrappers, but most of it will either need to be fixed upstream or can never be supported in Dask. People should open new issues if they run into any missing or broken behavior with Dask.