Open jakirkham opened 2 months ago
@jakirkham, slightly unrelated: what would it take to bring KvikIO to conda-forge? More generally, I remember there was a plan to move packages from the rapidsai channel to conda-forge back in the day (or maybe I am misremembering?). Searching for "KvikIO" on anaconda.org: https://anaconda.org/search?q=KvikIO
FWICS it's in 2.5.1 already. I guess I'll try tackling that next.
Assuming a user's workflow can run entirely on the GPU, the remaining piece that can be slow is file IO. Of course file IO can be slow for its own reasons. However, the thing of interest here is that it is slow to read into host memory and then transfer to device memory (especially if this is a big chunk of data).
To address this, NVIDIA rolled out GPUDirect Storage (publicized in this blogpost and covered in these docs). Basically, the idea is to go directly from file IO to GPU memory (or back), bypassing host memory (and thus the associated transfer cost). There is a bit of setup to get this working, but it can be valuable for larger data workflows.
To see this in action, I would recommend reading this blogpost about using Xarray and KvikIO (a RAPIDS library leveraging GPUDirect Storage) to load Zarr-based climate data into Xarray (using CuPy on the backend).
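For a feel of what KvikIO usage looks like, here is a minimal sketch (assuming `kvikio` and `cupy` are installed, a GPU is present, and a scratch file path of our choosing; when GDS is not configured, KvikIO transparently falls back to a POSIX read/write path, so the same code still runs, just without the host-memory bypass):

```python
import cupy as cp
import kvikio

# Write a device array straight to a file. With GDS configured, this goes
# device -> storage without staging through host memory.
a = cp.arange(100, dtype=cp.float32)
f = kvikio.CuFile("/tmp/example.bin", "w")
f.write(a)
f.close()

# Read it back directly into a device buffer.
b = cp.empty_like(a)
f = kvikio.CuFile("/tmp/example.bin", "r")
f.read(b)
f.close()

assert (a == b).all()
```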
Recently, PyTorch started adding support for GPUDirect Storage with PR ( https://github.com/pytorch/pytorch/pull/133489 ). It is already merged, though not yet in any release.
Once it is in a release, we could enable this here by adding `libcufile-dev` to `requirements/host` and setting the CMake option `USE_CUFILE` to `1`.
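Concretely, the recipe change might look something like this (hypothetical excerpt; the exact file layout depends on the feedstock):

```yaml
# meta.yaml (hypothetical excerpt)
requirements:
  host:
    - libcufile-dev   # cuFile headers/library for GPUDirect Storage
```

with the build script passing `-DUSE_CUFILE=1` through to CMake.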
So for now, simply raising awareness of this upcoming feature.