Easy CPU/GPU Arrays and Dataframes

quasiben commented 1 year ago

Cross-posting from the RAPIDS blog on Medium:

https://medium.com/rapids-ai/easy-cpu-gpu-arrays-and-dataframes-run-your-dask-code-where-youd-like-e349d92351d

cc @jacobtomlinson @rjzamora

GenevieveBuckley commented 1 year ago

Nice blogpost, and I appreciate the signal boost since I probably would have missed it on Medium 😄

I'd like it if it also included an example of how to use a GPU backend for input IO of arrays.

I've tried the equivalent of the read_csv and read_parquet examples in the blogpost (e.g. using from_zarr and from_tiledb instead), but the resulting arrays clearly have NumPy-backed chunks.

Am I doing something wrong here? Or is IO not yet supported for arrays? If so, the phrase "users can optionally select the backend engine for input IO and data creation" should probably be modified to reflect that.
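For anyone who wants to reproduce the check, here is a minimal sketch of one way to see which library backs the chunks of a Dask array. The helper names `chunk_backend` and `creation_backend` are hypothetical, and the sketch assumes that the private `_meta` attribute reflects the chunk type and that the `array.backend` config key is available in the installed Dask version.

```python
def chunk_backend(arr):
    """Return the top-level module name of the array type behind `arr`.

    Assumption: Dask arrays expose a small placeholder array on the
    private `_meta` attribute whose type matches the chunk type.
    Objects without `_meta` are inspected directly.
    """
    meta = getattr(arr, "_meta", arr)
    return type(meta).__module__.split(".")[0]


def creation_backend(backend="numpy"):
    """Create a small array under the given backend and report its chunk type."""
    # Imported lazily so chunk_backend stays usable without Dask installed.
    import dask
    import dask.array as da

    # "array.backend" is the config key for the array backend;
    # "cupy" additionally requires CuPy and a GPU.
    with dask.config.set({"array.backend": backend}):
        x = da.ones((4, 4), chunks=(2, 2))
    return chunk_backend(x)
```

Calling `chunk_backend` on the result of `da.from_zarr(...)` or `da.from_tiledb(...)` while the `cupy` backend is configured is the comparison described above: a result of `"numpy"` would indicate that the IO function did not pick up the configured backend.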

quasiben commented 1 year ago

@GenevieveBuckley thanks for reading it over. You're right, the from_* array support was not included. I just submitted a WIP PR, https://github.com/dask/dask/pull/9914, though I imagine it could be done in a significantly better way.

quasiben commented 1 year ago

In 0ca9b0c I added a note on the missing functionality for tiledb, zarr, and from_array.

Easy CPU/GPU Arrays and Dataframes #157