DTolm / VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
MIT License

initializeVkFFT throws an exception for small-size convolutions #156

Closed CoffeeExterminator closed 4 months ago

CoffeeExterminator commented 4 months ago

Hello, I'm trying to perform a 1D convolution, but initializeVkFFT throws an exception for relatively small kernel sizes when I call it for the convolution application; the exact threshold depends on the physical device. The exception appears to be caused by a division by zero in appendKernelConvolution (line 285, to be precise), but I can't work out what leads to it.

The easiest way I've found to reproduce the issue is to take sample 50 (sample_50_convolution_VkFFT_single_1d_matrix) and change the kernel size to a smaller one. For example, on Intel(R) Iris(R) Xe Graphics, configuration.size[0] = 1024 * 5 results in a successful convolution, while configuration.size[0] = 1024 * 4 or smaller leads to the exception described above. On an NVIDIA GeForce RTX 3050 Laptop GPU, 1024 * 7 works, while smaller sizes lead to the same exception.

It's also possible that I misunderstand some of the configuration parameters. If that's the case, could you please point me in the right direction? Here is the configuration, just in case, even though it's pretty much the same as in sample 50.

configuration.FFTdim = 1; 
configuration.size[0] = 1024 * 4; // changed here
configuration.size[1] = 1;
configuration.size[2] = 1;

configuration.kernelConvolution = true;
configuration.coordinateFeatures = 9; 
configuration.normalize = 1;
uint64_t kernelSize = ((uint64_t)configuration.coordinateFeatures) * sizeof(float) * 2 * (configuration.size[0]) * configuration.size[1] * configuration.size[2];
resFFT = allocateBuffer(vkGPU, &kernel, &kernelDeviceMemory, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, VK_MEMORY_HEAP_DEVICE_LOCAL_BIT, kernelSize);
configuration.buffer = &kernel;
configuration.bufferSize = &kernelSize;
resFFT = initializeVkFFT(&app_kernel, configuration);
convolution_configuration = configuration;
convolution_configuration.kernelConvolution = false;
convolution_configuration.performConvolution = true;
convolution_configuration.symmetricKernel = false;
convolution_configuration.matrixConvolution = 3;
convolution_configuration.coordinateFeatures = 3;
convolution_configuration.kernel = &kernel;
uint64_t bufferSize = ((uint64_t)convolution_configuration.coordinateFeatures) * sizeof(float) * 2 * (convolution_configuration.size[0]) * convolution_configuration.size[1] * convolution_configuration.size[2];
resFFT = allocateBuffer(vkGPU, &buffer, &bufferDeviceMemory, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, VK_MEMORY_HEAP_DEVICE_LOCAL_BIT, bufferSize);
convolution_configuration.buffer = &buffer;
convolution_configuration.bufferSize = &bufferSize;
convolution_configuration.kernelSize = &kernelSize;
resFFT = initializeVkFFT(&app_convolution, convolution_configuration); // exception here
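
For reference, the two buffer sizes in the configuration above follow the same formula: number of coordinate features times two floats per complex value times the number of elements. The sketch below only redoes that arithmetic for the failing size of 1024 * 4. The helper name complexBufferBytes is hypothetical and not part of VkFFT, and the byte counts assume a 4-byte float.

```cpp
#include <cstdint>

// Hypothetical helper (not part of VkFFT): size in bytes of a buffer holding
// `coordinateFeatures` complex single-precision arrays of sizeX * sizeY * sizeZ
// elements, mirroring the kernelSize / bufferSize expressions in the post.
constexpr std::uint64_t complexBufferBytes(std::uint64_t coordinateFeatures,
                                           std::uint64_t sizeX,
                                           std::uint64_t sizeY,
                                           std::uint64_t sizeZ) {
    // sizeof(float) * 2: one complex value is a (real, imaginary) float pair.
    return coordinateFeatures * sizeof(float) * 2 * sizeX * sizeY * sizeZ;
}

// Assumption: 4-byte float, which the expected byte counts below rely on.
static_assert(sizeof(float) == 4, "byte counts below assume a 4-byte float");

// Kernel buffer: 9 coordinate features (3x3 matrix kernel), size[0] = 1024 * 4.
constexpr std::uint64_t kernelBytes = complexBufferBytes(9, 1024 * 4, 1, 1);
// Data buffer: 3 coordinate features, same dimensions.
constexpr std::uint64_t bufferBytes = complexBufferBytes(3, 1024 * 4, 1, 1);

static_assert(kernelBytes == 294912, "9 * 8 bytes * 4096 elements");
static_assert(bufferBytes == 98304, "3 * 8 bytes * 4096 elements");
```

Both values come out as expected for this size, which is consistent with the buffer allocation not being the source of the exception.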

titovmaxim commented 4 months ago

Got the same problem. Any ideas?

DTolm commented 4 months ago

Hello,

Sorry, that was my mistake: at some point during the rewrite of the read/write module, I didn't copy all the lines over to the convolution module counterpart. The missing line has now been restored on the develop branch, which should fix this issue. It only affected small 1D convolutions. Thank you for pointing it out!

Best regards, Dmitrii

CoffeeExterminator commented 4 months ago

Thanks a lot.