DTolm / VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
MIT License

How to reduce the initialization time? #137

Open zhaohaifei opened 9 months ago

zhaohaifei commented 9 months ago

The initialization time of a plan is too long. How can I reduce it? For example, can the compiled kernels be saved? I saw the saveApplicationToString option, which can save the entire plan. Is there any other way?

DTolm commented 9 months ago

Hello,

No, there is no other way. saveApplicationToString only saves the binaries, not the plan, and this is precisely what you are looking for.

Best regards, Dmitrii

zhaohaifei commented 9 months ago

Is it possible to save the internally compiled kernel functions in a directory so that they can be used directly next time without compiling them again? Is such a solution feasible in VkFFT? Is it difficult?

DTolm commented 9 months ago

Hello,

saveApplicationToString saves the binaries, which you can later load with the loadApplicationFromString configuration option. See pages 64-65 of the documentation.

Best regards, Dmitrii
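
A minimal sketch of the file round trip around the two options described above. This is not VkFFT code: `write_blob` and `read_blob` are hypothetical helper names, and the VkFFT field names mentioned in the comments (`saveApplicationString`, `applicationStringSize`, `loadApplicationString`) are my reading of the documentation and should be checked against the pages cited above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Write an opaque binary blob to disk.
// Assumption: with VkFFT the bytes would come from app.saveApplicationString
// and the length from app.applicationStringSize, after initializeVkFFT has
// succeeded with configuration.saveApplicationToString = 1.
bool write_blob(const std::string& path, const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    return static_cast<bool>(out);
}

// Read the blob back. Returns an empty vector if the file is missing or empty.
// Assumption: on a later run the buffer would be handed to VkFFT through
// configuration.loadApplicationFromString = 1 and
// configuration.loadApplicationString = blob.data(), before initializeVkFFT.
std::vector<char> read_blob(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

The saved blob is presumably only usable with the same configuration and device that produced it, so one file per configuration is the natural granularity.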

zhaohaifei commented 9 months ago

Saving the application cannot cover all situations. I want to save the internal kernels, not the application. I would compile all the kernels in advance, so that any later application that needs the same kernel can use it directly without compiling again.

DTolm commented 9 months ago

I am sorry, I don't understand. Which situations can't it adapt to? Please provide an example configuration. From what I read, this functionality is exactly what you want to do.

zhaohaifei commented 9 months ago

I want to implement a function such that any size and any stride can be initialized quickly. My approach is to prepare all the kernels in advance and put them in a directory, so that they do not need to be compiled again when they are used later. It is impossible to save all applications because there are too many of them. But the internal kernels are universal and limited in number, so they can be saved in advance.
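
One way to picture the directory approach with per-configuration binaries rather than shared kernels is a load-or-build cache keyed by size and stride. The sketch below is an illustration only: `PlanKey`, `cache_file_name`, `load_or_build`, the `device_tag` string and the `build` callback are all hypothetical names, and nothing here is part of VkFFT.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical subset of the parameters that identify one FFT configuration.
struct PlanKey {
    std::uint64_t size[3];
    std::uint64_t stride[3];
    std::uint32_t precision_bits;
};

// Build a readable, collision-free file name from the key and a device tag.
std::string cache_file_name(const PlanKey& k, const std::string& device_tag) {
    std::string name = "fft";
    for (std::uint64_t s : k.size)   name += "_n" + std::to_string(s);
    for (std::uint64_t s : k.stride) name += "_s" + std::to_string(s);
    name += "_p" + std::to_string(k.precision_bits);
    name += "_" + device_tag + ".bin";
    return name;
}

// Return the cached blob for this key; on a miss, call `build` and store the result.
// `build` stands in for whatever produces the binary (for example, initializing
// a plan once and copying out its saved binaries); that part is an assumption.
std::vector<char> load_or_build(
    const std::string& dir, const PlanKey& key, const std::string& device_tag,
    const std::function<std::vector<char>(const PlanKey&)>& build) {
    assert(!dir.empty());
    const std::string path = dir + "/" + cache_file_name(key, device_tag);
    std::ifstream in(path, std::ios::binary);
    if (in) {  // cache hit: reuse the stored bytes
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    }
    std::vector<char> blob = build(key);  // cache miss: pay the compile cost once
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    return blob;
}
```

With this layout the first use of a given size and stride still pays the full initialization cost; only repeated configurations are served from disk.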

DTolm commented 9 months ago

The internal kernel is not universal and can't be saved in advance. The thing that you call a kernel is a sequence of CPU calls that generates the code for a particular FFT, which is compiled afterwards.

zhaohaifei commented 8 months ago

I extracted the generic kernel and compiled it in advance, so that subsequent calls can use it directly without compiling at runtime. Can such a feature be achieved by modifying the code? If so, I'll give it a try.

DTolm commented 8 months ago

No, it is not possible to create an uberkernel that will work for all system configurations - it will require a big redesign of the library and won't work with all the algorithms.

zhaohaifei commented 8 months ago

Just for HIP. And it’s ok if it can cover most of the algorithms. Can such a feature be implemented?

DTolm commented 8 months ago

This feature is not on my development radar, as it would require too much time to implement for no particular benefit. If you want to experiment with it, you are free to do so.

zhaohaifei commented 8 months ago

As long as it can be achieved and it takes no more than one month, I will give it a try.