DTolm / VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
MIT License

how to dump the generated source kernel? #161

Open tingxingdong opened 4 months ago

tingxingdong commented 4 months ago

I can see a cubin/AMD binary dumped after running the testSuite Portal, but I do not see the source kernel dumped. Where can I see the source kernel?

DTolm commented 4 months ago

Enable the keepShaderCode parameter in configuration and the test suite will print out all executed kernels.

tingxingdong commented 4 months ago

I edited vkFFT/vkFFT/vkFFT_Structs/vkFFT_Structs.h with vim and changed the declaration to `pfUINT keepShaderCode = 1;//will keep shader code and print all executed shaders during the plan execution in order (0 - off, 1 - on)`

This causes the testSuite portal to stop working and return quickly.

tingxingdong commented 4 months ago

I edited benchmark_scripts/vkFFT_scripts/src/user_benchmark_VkFFT.cpp with vim and added `configuration.keepShaderCode = 1;`

I do not see the CUDA source kernel dumped under the folder.

Where is it?

tingxingdong commented 4 months ago

I mean the CUDA/HIP source kernel, not the VkFFT_binary generated under the folder.

tingxingdong commented 4 months ago

```
grep -r -i "keepShaderCode" *
benchmark_scripts/vkFFT_scripts/src/user_benchmark_VkFFT.cpp: configuration.keepShaderCode = 1;
benchmark_scripts/vkFFT_scripts/src/sample_14_precision_VkFFT_single_nonPow2.cpp: configuration.keepShaderCode = 1;
benchmark_scripts/vkFFT_scripts/src/sample_51_convolution_VkFFT_single_3d_matrix_zeropadding_r2c.cpp: convolution_configuration.keepShaderCode = 1;
benchmark_scripts/vkFFT_scripts/src/sample_15_precision_VkFFT_single_r2c.cpp: configuration.keepShaderCode = 1;
benchmark_scripts/vkFFT_scripts/src/sample_16_precision_VkFFT_single_dct.cpp: configuration.keepShaderCode = 1;
```

Yet I still do not see any kernel printed out.

DTolm commented 4 months ago

You need to modify the configuration struct in the example you are trying to execute, not in the struct definition. I suggest opening sample_11_precision_VkFFT_single.cpp for the power-of-2 cases and doing it there.
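
To illustrate the difference between the struct definition and the configuration instance, here is a minimal, self-contained sketch. It does not use the real library: `DemoConfiguration` and `buildPlan` are hypothetical stand-ins, and only the field name `keepShaderCode` and its 0/1 meaning come from this thread. It assumes the flag is read from the instance handed to plan creation, and that the generated source is written to an output stream rather than to a file.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the library's configuration struct.
// NOT the real VkFFTConfiguration; only the keepShaderCode name is from the thread.
struct DemoConfiguration {
    std::uint64_t size = 0;            // hypothetical: transform length
    std::uint64_t keepShaderCode = 0;  // 0 - off, 1 - on (left at its default here)
};

// Hypothetical plan builder: generates kernel text and writes it to `out`
// only when the caller's configuration instance has keepShaderCode set.
std::string buildPlan(const DemoConfiguration& cfg, std::ostream& out) {
    std::string kernel =
        "// generated kernel for N=" + std::to_string(cfg.size) + "\n";
    if (cfg.keepShaderCode) {
        out << kernel;  // printed, not saved to disk, in this sketch
    }
    return kernel;
}

// Usage: set the flag on the instance that is passed to the plan builder,
// then capture whatever was printed.
std::string runExample() {
    DemoConfiguration configuration = {};
    configuration.size = 1024;
    configuration.keepShaderCode = 1;  // enabled per configuration instance
    std::ostringstream printed;
    buildPlan(configuration, printed);
    return printed.str();
}
```

In the actual samples, the equivalent step would be the single line `configuration.keepShaderCode = 1;` placed in the sample that is really being run, before the plan is created. If the output goes to the console as in this sketch, nothing new would appear in the working folder.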