DTolm / VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
MIT License

User-managed staging memory buffer in transferDataFromCPU() and transferDataToCPU() #129

Closed: anarkiwi closed this issue 8 months ago

anarkiwi commented 10 months ago

Thanks so much for VkFFT!

I observe overhead in these two functions: on every call, each one allocates staging memory with allocateBuffer() and then frees/destroys it.

Would it be possible to have a variant where the caller supplies their own staging buffer (which is then remapped as necessary within transferData())? I have experimented with this on a Raspberry Pi 4 specifically, and allocating the staging memory buffer once at application startup makes a significant difference.

Thanks,
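To make the request concrete, here is a minimal sketch contrasting the two patterns: a fresh staging buffer allocated and freed on every transfer versus a caller-owned staging buffer allocated once and reused. It assumes plain host `bytearray`s stand in for Vulkan staging and device memory, and every name in it (`allocate_buffer`, `UserStaging`, `transfer_per_call`, `transfer_with_staging`) is hypothetical, not part of VkFFT or its benchmark utilities.

```python
# Hypothetical sketch: host bytearrays stand in for Vulkan staging/device memory.

class Stats:
    """Counts how many (simulated) staging allocations were made."""
    def __init__(self):
        self.allocations = 0


def allocate_buffer(size, stats):
    """Stand-in for a real staging allocation; records each call."""
    stats.allocations += 1
    return bytearray(size)


def transfer_per_call(data, device, stats):
    """Original pattern: allocate, copy, and free a staging buffer on every call."""
    staging = allocate_buffer(len(data), stats)  # allocate staging memory
    staging[:] = data                            # CPU -> staging (map + memcpy in Vulkan)
    device[:len(data)] = staging                 # staging -> device (a copy command in Vulkan)
    del staging                                  # free/destroy the staging buffer


class UserStaging:
    """Caller-owned staging buffer: allocated once, reallocated only if too small."""
    def __init__(self, size, stats):
        self.stats = stats
        self.buf = allocate_buffer(size, stats)

    def view(self, size):
        if len(self.buf) < size:                 # grow only when a larger transfer arrives
            self.buf = allocate_buffer(size, self.stats)
        return memoryview(self.buf)[:size]


def transfer_with_staging(data, device, staging):
    """Requested pattern: reuse the caller's staging buffer, no per-call allocation."""
    view = staging.view(len(data))
    view[:] = data                               # CPU -> staging
    device[:len(data)] = view                    # staging -> device


per_call, reused = Stats(), Stats()
device = bytearray(1024)
staging = UserStaging(256, reused)
for i in range(100):
    payload = bytes([i]) * 256
    transfer_per_call(payload, device, per_call)
    transfer_with_staging(payload, device, staging)
print(per_call.allocations, reused.allocations)
```

After 100 transfers the per-call version has made 100 allocations and the reused buffer only one; in real Vulkan code each of those allocations also involves driver calls, which is presumably where the overhead observed on the Pi comes from.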

DTolm commented 10 months ago

Hello, thanks for using VkFFT!

Sure, I will add an option to provide a user-defined staging buffer in VkGPU and in VkFFTApplication. However, transferDataFromCPU and transferDataToCPU are not part of VkFFT itself: they are simple tools in the benchmark suite that were not meant to be used in production code, only as a reference for how to transfer data simply.

Best regards, Dmitrii

DTolm commented 9 months ago

Hello,

I have added an option to provide the staging buffer. I haven't tested it thoroughly yet and haven't updated the documentation, but it should be pretty straightforward: just pass the staging buffer and memory pointers in the configuration.

Best regards, Dmitrii
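The comment above describes the option only at the level of "pass the pointers in the configuration", so the following is a hedged sketch of how such an optional field could behave, not VkFFT's actual configuration. `TransferConfig`, `staging_buffer`, `temp_allocations`, and `transfer_from_cpu` are hypothetical names; a single `bytearray` stands in for the Vulkan buffer/memory pair, and falling back to a temporary allocation when no (or too small a) staging buffer is given is an assumed design choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferConfig:
    # Hypothetical field: caller-owned staging buffer; None means "allocate per call".
    staging_buffer: Optional[bytearray] = None
    # Counts fallback allocations, for illustration only.
    temp_allocations: int = 0


def transfer_from_cpu(config, data, device):
    """Copy data to 'device' through the user staging buffer when it is usable."""
    n = len(data)
    if config.staging_buffer is not None and len(config.staging_buffer) >= n:
        staging = memoryview(config.staging_buffer)[:n]  # reuse: no allocation
    else:
        config.temp_allocations += 1
        staging = memoryview(bytearray(n))               # fallback: temporary buffer
    staging[:] = data                                    # CPU -> staging
    device[:n] = staging                                 # staging -> device


device = bytearray(8192)
default_cfg = TransferConfig()
user_cfg = TransferConfig(staging_buffer=bytearray(4096))
for _ in range(3):
    transfer_from_cpu(default_cfg, b"\x07" * 1024, device)
    transfer_from_cpu(user_cfg, b"\x07" * 1024, device)
print(default_cfg.temp_allocations, user_cfg.temp_allocations)
```

Keeping the per-call path as a fallback means existing callers that never set the field behave exactly as before, while callers that allocate the staging buffer once at startup skip the allocation entirely. Raising an error on an undersized buffer would be an equally reasonable alternative.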

anarkiwi commented 9 months ago

That's really nice - thanks!

DTolm commented 8 months ago

It should be working now. If you have any other ideas for improving the staging buffer, feel free to reopen the issue!