DTolm / VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
MIT License

Creating Multiple VkFFTApplication plans #174

Closed ShatrovOA closed 4 months ago

ShatrovOA commented 4 months ago

I should start by saying that I am not that good at C, so forgive me if the answer is obvious.

I am trying to run a 3D DCT on a cluster with GPUs. VkFFT is used to perform batched 1D FFTs. Then the data is redistributed across GPUs and the FFT along another dimension is launched.

I tested VkFFT on a single direction. Everything was fine and I was very impressed by its performance, so I started implementing it in multiple directions. I iterate over the dimensions and create a separate VkFFTApplication plan for each one, with its own transform length and number of batches.

Below is how I create the VkFFTApplication object:

void vkfft_create(int size, int how_many, int double_precision, VkFFTApplication *app_handle) {
  VkFFTConfiguration config = {};
  VkFFTApplication app = {};
  // Populating config values
  // ...
  VKFFT_CALL(initializeVkFFT(&app, config));
  *app_handle = app;
}

I store the app_handle pointer and use it later to execute the plan.

Everything works on the first iteration over the dimensions, but the second iteration fails on a calloc call: https://github.com/DTolm/VkFFT/blob/master/vkFFT/vkFFT/vkFFT_AppManagement/vkFFT_InitializeApp.h#L1485

Program received SIGSEGV

It turns out that when I create an empty app structure with VkFFTApplication app = {};, app has the same address it had on the previous iteration. So, basically, it is trying to calloc data that is already allocated.

My question is: how should I initialize the VkFFTApplication structure so that it points to a different memory address on every iteration?

Thanks!

DTolm commented 4 months ago

Hello,

creating the app structure with VkFFTApplication app = {}; inside vkfft_create makes it local to that function's scope, meaning its storage is released once the vkfft_create call finishes. What you likely want is to declare VkFFTApplication app = {}; outside vkfft_create and pass the pointer &app into it. Alternatively, you can allocate memory for app manually inside vkfft_create with VkFFTApplication* app = (VkFFTApplication*)calloc(1, sizeof(VkFFTApplication)); and free it later, for example in a vkfft_free call.

Best regards, Dmitrii
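
A minimal sketch of the first option described above, with the storage owned by the caller. CallerPlan and plan_create_in_place are hypothetical stand-ins so that the example compiles without VkFFT; in real code the struct would be VkFFTApplication and the marked line would call initializeVkFFT(plan, config).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for VkFFTApplication; not the real VkFFT type. */
typedef struct {
    int size;        /* transform length */
    int how_many;    /* number of batches */
    int initialized; /* set once the plan has been built */
} CallerPlan;

/* Builds a plan inside storage that the caller owns.
 * Returns 0 on success and 1 if no storage was supplied. */
int plan_create_in_place(int size, int how_many, CallerPlan *plan) {
    assert(size > 0 && how_many > 0); /* programmer error otherwise */
    if (plan == NULL) {
        return 1;
    }
    memset(plan, 0, sizeof *plan);    /* same effect as "= {}" on a fresh struct */
    plan->size = size;
    plan->how_many = how_many;
    plan->initialized = 1;            /* real code: initializeVkFFT(plan, config) */
    return 0;
}
```

With this layout the caller might declare one element per dimension, for example CallerPlan plans[3];, and pass &plans[d] on each iteration, so every plan keeps its own address for as long as the array lives.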

ShatrovOA commented 4 months ago

Hello Dmitrii,

Thank you for your quick response. The second option made it work.

I changed the subroutine signature a bit:

void vkfft_create(int size, int how_many, int double_precision, VkFFTApplication **app_handle) {
  VkFFTConfiguration config = {};
  VkFFTApplication* app = (VkFFTApplication*)calloc(1, sizeof(VkFFTApplication));
  // Populating config values
  // ...
  VKFFT_CALL(initializeVkFFT(app, config));
  *app_handle = app;
}

and I can now see that app points to a different location on each call. Thank you.
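
A heap-allocated plan also needs a matching release step, as suggested earlier in the thread. The following is a sketch only: HeapPlan, heap_plan_create, and heap_plan_destroy are hypothetical names, and the struct is a stand-in so the example compiles without VkFFT. In real code the destroy function would presumably release VkFFT's internal resources (for example with deleteVkFFT) before calling free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for VkFFTApplication; not the real VkFFT type. */
typedef struct {
    int size;     /* transform length */
    int how_many; /* number of batches */
} HeapPlan;

/* Allocates a zeroed plan on the heap and stores its address in *handle.
 * Returns 0 on success and 1 on failure. */
int heap_plan_create(int size, int how_many, HeapPlan **handle) {
    assert(size > 0 && how_many > 0); /* programmer error otherwise */
    if (handle == NULL) {
        return 1;
    }
    HeapPlan *plan = calloc(1, sizeof *plan);
    if (plan == NULL) {               /* allocation can fail; report it */
        *handle = NULL;
        return 1;
    }
    plan->size = size;
    plan->how_many = how_many;        /* real code: initializeVkFFT(plan, config) */
    *handle = plan;
    return 0;
}

/* Frees a plan created by heap_plan_create and clears the caller's pointer,
 * so that calling it twice on the same handle is harmless. */
void heap_plan_destroy(HeapPlan **handle) {
    if (handle == NULL || *handle == NULL) {
        return;
    }
    /* Real code would release library resources first, e.g. deleteVkFFT(*handle). */
    free(*handle);
    *handle = NULL;
}
```

Resetting the handle to NULL in the destroy function is a design choice that makes an accidental second call a no-op instead of a double free.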

I have another, unrelated question. Is there an estimate of how much memory VkFFT allocates internally for M batches of DCT2 transforms of size N?

DTolm commented 4 months ago

For DCT2 the additional memory usage depends on the system size. If the system fits in the shared memory of a GPU (sequence length below approximately 4096), it will not use additional memory, regardless of M. For bigger sequences it depends on whether the system can be decomposed into small primes or handled with Rader's algorithm; in that case the additional size will be 2x the system size (M*N). If the system uses Bluestein's algorithm, the additional size will be 4x. A small amount of additional memory is used for twiddle factors (at least M times smaller).
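
The rule of thumb above can be written as a small estimator. This is a rough sketch, not something VkFFT provides: the names are hypothetical, the 4096 threshold is the approximate figure quoted above and in practice depends on the GPU, the caller has to say which algorithm path applies, the element size (float or double) is an assumption about how "system size" converts to bytes, and the small twiddle-factor storage is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Which path a large sequence takes; the caller must supply this,
 * because the sketch does not try to reproduce VkFFT's own decision. */
typedef enum {
    DCT2_PATH_SMALL_PRIMES_OR_RADER,
    DCT2_PATH_BLUESTEIN
} Dct2Path;

/* Approximate shared-memory threshold from the thread; GPU dependent. */
#define DCT2_SHARED_MEMORY_LIMIT_APPROX 4096u

/* Estimated additional buffer size, in elements, for M batches of length N. */
size_t dct2_extra_elements(size_t n, size_t m, Dct2Path path) {
    assert(n > 0 && m > 0);
    if (n < DCT2_SHARED_MEMORY_LIMIT_APPROX) {
        return 0;                     /* fits in shared memory, regardless of M */
    }
    size_t system_size = m * n;       /* no overflow check in this sketch */
    size_t factor = (path == DCT2_PATH_BLUESTEIN) ? 4u : 2u;
    return factor * system_size;
}

/* Same estimate in bytes, assuming one float or double per element. */
size_t dct2_extra_bytes(size_t n, size_t m, Dct2Path path, int double_precision) {
    size_t element = double_precision ? sizeof(double) : sizeof(float);
    return dct2_extra_elements(n, m, path) * element;
}
```

For example, under these assumptions M = 10 batches of length N = 8192 would need about 2 * 10 * 8192 = 163840 extra elements on the small-primes or Rader path, and twice that with Bluestein's algorithm.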

ShatrovOA commented 4 months ago

Thanks for the clarification. I'm closing this issue.