lospampa closed this issue 2 weeks ago.
As a quick check, are you able to run the following example, https://github.com/LLNL/RAJA/blob/develop/examples/tut_matrix-multiply.cpp, and do you see the CUDA example running? If not, RAJA may not have been configured correctly.
I found that the problem was in the compilation step. I don't know exactly why it compiles now, but I added the following line to the Makefile. It may be due to the `-forward-unknown-to-host-compiler` flag or to `-x cu`:
`usr/local/cuda-12.4/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/scratch/aflorenzon/llvm/build/bin/clang++ -x cu`
Thank you for your time.
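A plausible reason `-x cu` matters: nvcc treats a `.cpp` file as plain C++ and passes it to the host compiler unless told otherwise, and in that case `__CUDACC__` is not defined. The sketch below only checks that macro; it assumes, without this thread confirming it, that RAJA defines its CUDA statements such as `RAJA::statement::CudaKernel` only when the file is compiled as CUDA. The helper names `compiled_as_cuda` and `report_cuda_mode` are hypothetical.

```cpp
#include <cstdio>

// Hypothetical helper: true when this translation unit is compiled as CUDA.
// nvcc defines __CUDACC__ for .cu files (or for any file given -x cu);
// a .cpp file handed to nvcc without -x cu goes to the host compiler,
// where __CUDACC__ is not defined.
constexpr bool compiled_as_cuda() {
#if defined(__CUDACC__)
  return true;
#else
  return false;
#endif
}

// Hypothetical diagnostic: call it from the file that defines the kernel
// policy to see which mode that file was actually built in.
inline void report_cuda_mode() {
  std::printf("compiled as CUDA: %s\n", compiled_as_cuda() ? "yes" : "no");
}
```

Built with `-x cu` (or after renaming the file to `.cu`), `report_cuda_mode()` should print `yes`; built as a plain `.cpp` through the host compiler, it prints `no`.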
Gotcha, good find. I'll close this issue then if things are resolved. Please feel free to reach out with any other questions.
@lospampa I have one additional suggestion. Although I'm not familiar with the structure of your kernel, it may perform better if you skip tiling and instead use global thread IDs, such as cuda/hip_global_x_direct; see the documentation: https://raja.readthedocs.io/en/develop/sphinx/user_guide/feature/policies.html#raja-loop-kernel-execution-policies
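The global-thread-ID suggestion comes down to index arithmetic. With a tiled policy, a thread's index is the tile number times the tile size plus its position inside the tile; with a global-ID policy it is the block index times the threads per block plus the thread index. The two coincide when one tile maps to one block and the tile size equals the block size. The host-side sketch below models only that arithmetic in plain C++ (no RAJA or CUDA calls); all function names are hypothetical, and it ignores the block-stride looping that a `*_loop` policy adds.

```cpp
#include <cstddef>

// Tiled mapping (as in a Tile + thread_x_direct policy):
// tile index * tile size + thread position within the tile.
constexpr std::size_t tiled_index(std::size_t tile, std::size_t tile_size,
                                  std::size_t thread) {
  return tile * tile_size + thread;
}

// Global-ID mapping: block index * threads per block + thread index.
constexpr std::size_t global_index(std::size_t block, std::size_t block_dim,
                                   std::size_t thread) {
  return block * block_dim + thread;
}

// Blocks needed so that every index in [0, n) gets at least one thread.
constexpr std::size_t num_blocks(std::size_t n, std::size_t block_dim) {
  return (n + block_dim - 1) / block_dim;
}

// True if a 1-D launch of num_blocks(n, block_dim) blocks hits every index
// in [0, n) exactly once. Threads whose index is >= n are simply never
// counted, which is what an `if (i < n)` bounds guard would do on the GPU.
constexpr bool covers_exactly_once(std::size_t n, std::size_t block_dim) {
  for (std::size_t i = 0; i < n; ++i) {
    int hits = 0;
    for (std::size_t b = 0; b < num_blocks(n, block_dim); ++b) {
      for (std::size_t t = 0; t < block_dim; ++t) {
        if (global_index(b, block_dim, t) == i) ++hits;
      }
    }
    if (hits != 1) return false;
  }
  return true;
}
```

Whether the global-ID form is actually faster still depends on the kernel's access pattern, which the sketch does not model.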
Also, it may be that we are missing an example for this.
@artv3, thank you for your help. I will try it and let you know when I have the results.
Hi there, I am trying to compile the following code to run on an NVIDIA RTX 4090 GPU.
```cpp
// TILE_SIZE: placeholder for the tile width (actual value not given)
using KERNEL_POL = RAJA::KernelPolicy<
  RAJA::statement::CudaKernel<
    RAJA::statement::Tile<1, RAJA::tile_fixed<TILE_SIZE>, RAJA::cuda_block_y_loop,
      RAJA::statement::Tile<0, RAJA::tile_fixed<TILE_SIZE>, RAJA::cuda_block_x_loop,
        RAJA::statement::For<1, RAJA::cuda_thread_y_direct,
          RAJA::statement::For<0, RAJA::cuda_thread_x_direct,
            RAJA::statement::Lambda<0>
          >
        >
      >
    >
  >
>;
```
But I am receiving the following error:

```
error: ‘CudaKernel’ is not a member of ‘RAJA::statement’
18 | RAJA::statement::CudaKernel<
   | ^~~~~
```

Do you know what is happening here? The application is parallelized with `RAJA::kernel`, and it works when using HIP on the AMD platform.
Thank you for your attention.