Add support for CUDA streams.

LLNL / zfp

Compressed numerical arrays that support high-speed random access

http://zfp.llnl.gov

BSD 3-Clause "New" or "Revised" License


Add support for CUDA streams. #133

Open corbett5 opened 3 years ago

corbett5 commented 3 years ago

I have a project where we compute a time step on the GPU and then asynchronously copy some data back to the host for later use. This copy overlaps with the subsequent time step, which saves a ton of time. Now I need to compress the data that we save, which I plan to do on the device before copying it back to the host. It would be nice if this compression could also be asynchronous so that I could overlap it with other computation.
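The overlap pattern described above can be sketched on the host alone. This is only an analogy under stated assumptions, not zfp or CUDA code: `advance`, `save_snapshot`, and `run` are hypothetical names, `std::async` stands in for work enqueued on a separate CUDA stream, and keeping every `stride`-th value stands in for "compress on device, then copy to host".

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for one simulation time step
// (a GPU kernel in the real application).
void advance(std::vector<double>& state) {
    for (double& x : state) x = 0.5 * x + 1.0;
}

// Hypothetical stand-in for "compress, then copy back to the host".
// It simply keeps every `stride`-th value of the snapshot.
std::vector<double> save_snapshot(const std::vector<double>& snapshot,
                                  std::size_t stride) {
    assert(stride > 0);
    std::vector<double> out;
    for (std::size_t i = 0; i < snapshot.size(); i += stride)
        out.push_back(snapshot[i]);
    return out;
}

// Runs `steps` time steps. Saving the result of step n runs in the
// background while step n+1 is being computed.
std::vector<std::vector<double>> run(std::vector<double> state, int steps,
                                     std::size_t stride) {
    std::vector<std::vector<double>> saved;
    std::future<std::vector<double>> pending;
    for (int n = 0; n < steps; ++n) {
        advance(state);
        // Allow only one save in flight: collect the previous one first.
        if (pending.valid()) saved.push_back(pending.get());
        // std::async copies `state`, so the next advance() can safely
        // overwrite it while the background save is still running.
        pending = std::async(std::launch::async, save_snapshot, state, stride);
    }
    if (pending.valid()) saved.push_back(pending.get());
    return saved;
}
```

Calling `run(std::vector<double>(4, 0.0), 3, 2)` yields three saved snapshots of two values each. The copy handed to the background task plays the role of a second device buffer: without it, the next time step would overwrite data that is still being saved.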

lindstro commented 3 years ago

I think in principle what you're asking for would be possible via CUDA streams (I must confess to not knowing much about it), but I'm unsure how we would expose such functionality through the zfp API. Currently the only entry point we provide is through zfp_compress(), which does a fair amount of setup work on the CPU and handles any data motion between CPU and GPU. The actual CUDA compression kernel is launched some six levels deep.

Let me discuss this with our CUDA experts to see what can be done.

lindstro commented 3 years ago

I ran across this paper that seems to have tackled this problem. Not sure if their code is available.

data-panda commented 2 years ago

@lindstro Did this make it into this release (1.0.0)? The release notes do not mention it. If not, is it in the works for the release later this year?

lindstro commented 2 years ago

@data-panda No, this release does not include the latest CUDA and HIP work we have been doing. That will end up in the next release. Regarding CUDA streams specifically, that is not something our team has looked at yet. We've had discussions with others who have looked at this (see this paper, for instance) and would welcome a contribution.

S-o-T commented 9 months ago

@lindstro Could you please share the current plans regarding CUDA support in zfp? Specifically, I am interested in:

- user control over the CUDA stream used to enqueue the encode/decode kernels
- fixed-precision, fixed-accuracy, and lossless compression modes

lindstro commented 9 months ago

We've yet to do any work on CUDA streams and lossless compression on the GPU. It is unlikely that either would make it into the next release. The next release will, however, have CUDA and HIP support for fixed-precision and -accuracy modes.

S-o-T commented 9 months ago

Thanks! Can you share an ETA for next release?

lindstro commented 9 months ago

I've been horrible at predicting release dates in the past and am reluctant to give false hope. That said, we're on the hook to do a release no later than end of September. I expect and hope it will happen well before then.