Open ichinii opened 4 weeks ago
@ichinii Could you isolate the problematic functions in a simple hello world kernel?
If not, please post the kernel so that we can see how you call the math functions.
Maybe a specialization of the alpaka::math functions for complex numbers could be using functionality from cuComplex.h instead?
I do not remember exactly why we do not use cuComplex.h,
but most likely it was not compatible with C++ std::complex,
or we decided against it for other reasons.
Here is a little test program and the associated stack trace from cuda-gdb:
#include <alpaka/alpaka.hpp>

using Dim = alpaka::DimInt<1>;
using Idx = int32_t;
using Vec = alpaka::Vec<Dim, Idx>;
using Acc = alpaka::AccGpuCudaRt<Dim, Idx>;
using Queue = alpaka::Queue<Acc, alpaka::NonBlocking>;

struct Kernel {
    template <typename TAcc>
    ALPAKA_FN_ACC
    void operator() (const TAcc& acc) const {
        auto c = alpaka::Complex<float>(0, 0);
        c = alpaka::sin(c);
    }
};

int main([[maybe_unused]] int argc, [[maybe_unused]] char** argv) {
    const auto N = static_cast<Idx>(1 << 10);
    const auto platform = alpaka::Platform<Acc>();
    const auto acc = alpaka::getDevByIdx(platform, 0);
    auto d_q = alpaka::Queue<Acc, alpaka::Blocking>(acc);
    alpaka::exec<Acc>(
        d_q,
        alpaka::getValidWorkDiv<Acc>(acc, N),
        Kernel{}
    );
    alpaka::wait(d_q);
    return 0;
}
[2/2] Linking CUDA executable alpaka_playground
terminate called after throwing an instance of 'std::runtime_error'
what(): /home/ich/playground/alpaka_playground/alpaka/include/alpaka/queue/cuda_hip/QueueUniformCudaHipRt.hpp(175) 'TApi::streamSynchronize(queue.getNativeHandle())' A previous API call (not this one) set the error : 'cudaErrorIllegalAddress': 'an illegal memory access was encountered'!
(cuda-gdb) where
#0 0x0000000000000010 in ?? ()
#1 0x00007fffbd2598f0 in alpaka::sin<float> (x=...) at /home/ich/playground/alpaka_playground/alpaka/include/alpaka/math/Complex.hpp:536
#2 0x00007fffbd2589b0 in Kernel::operator()<alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1ul>, int> > (this=0x7fffdbfffdb4, acc=...) at /home/ich/playground/alpaka_playground/src/main.cpp:15
#3 0x00007fffbd257bf0 in alpaka::detail::gpuKernel<Kernel, alpaka::ApiCudaRt, alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1ul>, int>, std::integral_constant<unsigned long, 1ul>, int><<<(1,1,1),(1024,1,1)>>> (threadElemExtent=...,
kernelFnObj=...) at /home/ich/playground/alpaka_playground/alpaka/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:79
cmake \
-DCMAKE_BUILD_TYPE=Debug \
-Dalpaka_ACC_GPU_CUDA_ENABLE=ON \
-DCMAKE_CXX_COMPILER=g++-12 \
-DCMAKE_CUDA_ARCHITECTURES=52 \
-G "Ninja" ..
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_
The problem is
c = alpaka::sin(c);
because it must be
c = alpaka::math::sin(acc, c);
All device-side math functions live in alpaka::math and require the accelerator as the first parameter.
The reason this error occurred is that we have implemented sin() for Complex
in the alpaka namespace. I assume this is required for ADL.
The same file states that these functions are for host use only.
IMO the problem is that we defined our complex class within the alpaka namespace instead of
alpaka::internal. We could move Complex into a namespace whose name makes clear that it should not be used directly, put an alias into the
alpaka namespace, and move all host-side complex math function implementations into this
internal namespace, which will still allow ADL.
This will still allow the user to call the implementations directly but should avoid calling these functions by accident.
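The namespace proposal can be illustrated with a minimal, self-contained sketch. It is not alpaka's actual layout: `lib`, `lib::internal` and `generic_sin` are hypothetical names standing in for the alpaka namespace, the proposed internal namespace and a generic math trait. It only shows that an alias in the outer namespace keeps ADL working while no `lib::sin` exists to be called by accident.

```cpp
#include <cmath>
#include <complex>

namespace lib {              // hypothetical stand-in for the public namespace
namespace internal {         // hypothetical "do not use directly" namespace

    // The complex type is defined here, so this is its associated namespace.
    template <typename T>
    struct Complex {
        T re;
        T im;
    };

    // Host-only implementation placed next to the type so ADL can find it.
    template <typename T>
    Complex<T> sin(Complex<T> const& z) {
        auto const r = std::sin(std::complex<T>(z.re, z.im));
        return {r.real(), r.imag()};
    }

} // namespace internal

// Public alias: users write lib::Complex<T>, but there is no lib::sin.
template <typename T>
using Complex = internal::Complex<T>;

} // namespace lib

// Generic host code: the unqualified call finds lib::internal::sin via ADL
// for lib::Complex and falls back to std::sin for built-in types.
template <typename T>
T generic_sin(T const& x) {
    using std::sin;
    return sin(x);
}
```

With this layout `generic_sin(lib::Complex<float>{0.0f, 0.0f})` compiles through ADL, `lib::internal::sin(z)` can still be called explicitly, and `lib::sin(z)` is a compile error rather than a silent host-only call.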
Using alpaka trigonometric functions with an alpaka::Complex argument inside a CUDA-accelerated kernel produces a weird crash. It happens on both 1.1.0 and develop.
Call stack of cuda-gdb:
From another project we can see that alpaka::cos seems to call a function that is completely out of scope. Weird.
nvcc version
cmake command:
I see that alpaka uses the std implementation of those functions. When I try to use them in a CUDA-only project, I get an error because these functions are not annotated with __device__.
Am I doing something wrong here, or do you maybe have a hint for us? We would really appreciate an implementation of complex numbers within our project. Maybe a specialization of the alpaka::math functions for complex numbers could use functionality from cuComplex.h instead?
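As a rough illustration of what a device-capable complex sine needs, here is a sketch that builds it only from real-valued functions, using the identity sin(a + ib) = sin(a) cosh(b) + i cos(a) sinh(b). It is plain host C++ with a hypothetical `ComplexF` type and `complex_sin` function, not alpaka's or cuComplex.h's implementation; in a CUDA build the function would additionally need a `__device__` annotation.

```cpp
#include <cmath>

// Hypothetical minimal complex type, used only for this sketch.
struct ComplexF {
    float re;
    float im;
};

// sin(a + ib) = sin(a) * cosh(b) + i * cos(a) * sinh(b)
// Only real-valued math functions are used, which also exist for device code.
inline ComplexF complex_sin(ComplexF z) {
    return {std::sin(z.re) * std::cosh(z.im),
            std::cos(z.re) * std::sinh(z.im)};
}
```

For a purely real argument the imaginary part vanishes and the result reduces to the ordinary real sine.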