NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

npz support #3346

Open RaananHadar opened 2 years ago

RaananHadar commented 2 years ago

The npz format is basically a zip archive of npy files. I've seen other users who are interested in this as well, so I hope you will consider this as a long-term enhancement request.

Why would this be useful? Compressed NumPy files can save a lot of network bandwidth, and the decompression could be done on the GPU. Also, NumPy is pretty standard.

PS. It's also worth noting another NVIDIA project, nvcomp, which might help in enabling this, similar to how you are using nvJPEG.
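For readers unfamiliar with the container layout, here is a minimal sketch of the "zipped npy" claim using only the Python standard library. It hand-builds two version 1.0 `.npy` payloads, stores them in a deflate-compressed zip, and reads the headers back. The helper names (`make_npy`, `read_npy_header`) and the member names `left.npy` / `right.npy` are hypothetical and chosen for illustration; real files would normally be written with NumPy itself.

```python
import ast
import io
import struct
import zipfile

MAGIC = b"\x93NUMPY"  # every .npy file starts with these 6 bytes


def make_npy(values):
    """Serialize a flat list of floats as a version 1.0 .npy payload (illustrative helper)."""
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    # Pad with spaces so the array data starts on a 64-byte boundary;
    # the header must end with a newline.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    return (
        MAGIC
        + bytes([1, 0])                      # format version 1.0
        + struct.pack("<H", len(header))     # little-endian header length
        + header.encode("latin1")
        + struct.pack("<%dd" % len(values), *values)
    )


def read_npy_header(raw):
    """Parse the dict stored in the header of a version 1.0 .npy payload."""
    assert raw[:6] == MAGIC
    (header_len,) = struct.unpack("<H", raw[8:10])
    return ast.literal_eval(raw[10:10 + header_len].decode("latin1"))


# An .npz is a zip archive whose members are .npy files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("left.npy", make_npy([0.0] * 1000))
    zf.writestr("right.npy", make_npy([1.0] * 1000))

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    members = {info.filename: info for info in zf.infolist()}
    headers = {name: read_npy_header(zf.read(name)) for name in members}

for name, info in members.items():
    print(name, headers[name]["shape"], info.file_size, "->", info.compress_size)
```

The point for this request is that any npz reader needs two stages: inflating each zip member, then parsing the npy header to recover dtype and shape.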

mnicely commented 2 years ago

Hi @RaananHadar, what exactly is your ask? Do you already have npz files that you want decompressed on the GPU? Or do you want npz compression/decompression on the GPU? What is your workflow?

JanuszL commented 2 years ago

Hi @RaananHadar,

We have thought about this use case, but we haven't gotten to it yet. There are two aspects of your ask:

Still, this is a valid ask worth checking.

RaananHadar commented 2 years ago

Thank you for the consideration. It is obvious to me that this is a long-term feature.

Re: @mnicely's request: the workflow I envision covers both deep learning training and inference. This can be combined with GPUDirect Storage to load existing npz files from the network that were pre-processed as much as possible, decompress them, and then feed them to a neural network in PyTorch or, in the far future, to Triton.

Using npz has really useful applications for image processing pipelines that have non-standard data structures, i.e. involving more than one image per sample. For example, I have an application where I want to feed pairs of images, or multiple modalities, to a neural network. Bundling all the arrays in one file makes sense, as the network needs them ALL to run, and bundling at runtime would cost precious CPU time. Since npy does not have any compression, it slows down I/O, and copying the data through the CPU would rule out GPUDirect Storage.
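As a rough illustration of the bundling described above, the sketch below stores a pair of images under named keys in one compressed npz and reads them back on the CPU with NumPy. The array names `left` and `right`, the 64x64 shape, and the synthetic pixel values are assumptions made up for the example; this shows only the file-side workflow, not a DALI or GPU decoding path.

```python
import io

import numpy as np

# Hypothetical sample: two single-channel images that are mostly background.
left = np.zeros((64, 64), dtype=np.uint8)
right = np.zeros((64, 64), dtype=np.uint8)
left[16:48, 16:48] = 200
right[20:52, 20:52] = 180

# Bundle both arrays into one deflate-compressed .npz (in memory here;
# a filename works the same way).
buf = io.BytesIO()
np.savez_compressed(buf, left=left, right=right)
packed = buf.getvalue()

# Load the sample back; members are decompressed when they are indexed.
buf.seek(0)
with np.load(buf) as sample:
    pair = {name: sample[name] for name in sample.files}

raw_bytes = left.nbytes + right.nbytes
print(f"raw arrays: {raw_bytes} bytes, compressed npz: {len(packed)} bytes")
```

How much the file shrinks depends entirely on the data; this synthetic pair compresses well because it is mostly constant, which will not hold for every image type.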