alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0
337 stars 69 forks source link

alpaka::sin(alpaka::Complex) produces 'cudaErrorIllegalAddress': 'an illegal memory access was encountered'! #2285

Open ichinii opened 4 weeks ago

ichinii commented 4 weeks ago

Using alpaka trigonometric functions with alpaka::Complex argument inside a cuda accelerated kernel, produces a weird crash. Happens on both 1.1.0 and develop.

call stack of cuda-gdb

#0  0x0000000000000010 in ?? ()
#1  0x00007fffbd25aaf0 in alpaka::sin<float> (x=...) at /home/ich/playground/alpaka_playground/alpaka/include/alpaka/math/Complex.hpp:536
#2  0x00007fffbd258f30 in Kernel::operator()<alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1ul>, int> > (this=0x7fffdbfffc6c, acc=..., f=0x7fffb3600000) at /home/ich/playground/alpaka_playground/src/main.cpp:19
#3  0x00007fffbd257db0 in alpaka::detail::gpuKernel<Kernel, alpaka::ApiCudaRt, alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1ul>, int>, std::integral_constant<unsigned long, 1ul>, int, float*><<<(1,1,1),(1024,1,1)>>> (threadElemExtent=..., 
    kernelFnObj=..., args=0x7fffb3600000)
    at /home/ich/playground/alpaka_playground/alpaka/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:79

from another project we can see that alpaka::cos seems to call a function that is completely out of scope. weird.

#0  0x0000000000000010 in RAYX::findCollisionInElementCoords (r=..., surface=..., cutout=..., isTriangul=<unavailable>)
    at /home/ich/projects/rayx/Intern/rayx-core/src/Shader/Collision.cpp:394
#1  0x00007fffbd36a110 in alpaka::cos<double> (x=...) at /home/ich/projects/rayx/Extern/alpaka/include/alpaka/math/Complex.hpp:445   -G "Ninja" $HOME/projects/rayx

nvcc version

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

cmake command:

cmake \
    -DCMAKE_BUILD_TYPE=Debug \
    -Dalpaka_ACC_GPU_CUDA_ENABLE=ON \
    -DCMAKE_CXX_COMPILER=g++-12 \
    -DCMAKE_CUDA_ARCHITECTURES=52 \
    -G "Ninja" ..

I see that alpaka uses the std implementation of those functions. When i try to use them in a cuda-only project, then i get an error because these function are not annotated with __device__.

main.cu(15): error: calling a __host__ function("std::complex<T1>   ::std::sin<float> (const     ::std::complex<T1>  &)") from a __global__ function("kernel") is not allowed
main.cu(15): error: identifier "std::sin<float> " is undefined in device code

Am I doing something wrong here or do you maybe have a hint for us? We really would appreciate an implementation of complex numbers within our project. Maybe a specialization of the alpaka::math functions for complex numbers could be using funcitonallity from cuComplex.h instead?

psychocoderHPC commented 4 weeks ago

@ichinii Could you isolate the problematic functions in a simple hallo world kernel?

If not please post the kernel that we see how you call the math functions.

psychocoderHPC commented 4 weeks ago

Maybe a specialization of the alpaka::math functions for complex numbers could be using funcitonallity from cuComplex.h instead?

I do not remember exactly why we not use cuComplex.h but most likely it was not compatible to C++ std::Complex or we decided against it because of other reasons.

ichinii commented 4 weeks ago

here is a little test program and the associated stack trace from cuda-gdb:

#include <alpaka/alpaka.hpp>

using Dim = alpaka::DimInt<1>;
using Idx = int32_t;
using Vec = alpaka::Vec<Dim, Idx>;

using Acc = alpaka::AccGpuCudaRt<Dim, Idx>;
using Queue = alpaka::Queue<Acc, alpaka::NonBlocking>;

struct Kernel {
    template <typename TAcc>
    ALPAKA_FN_ACC
    void operator() (const TAcc& acc) const {
        auto c = alpaka::Complex<float>(0, 0);
        c = alpaka::sin(c);
    }
};

int main([[maybe_unused]] int argc, [[maybe_unused]] char** argv) {
    const auto N = static_cast<Idx>(1 << 10);
    const auto platform = alpaka::Platform<Acc>();
    const auto acc = alpaka::getDevByIdx(platform, 0);
    auto d_q = alpaka::Queue<Acc, alpaka::Blocking>(acc);

    alpaka::exec<Acc>(
        d_q,
        alpaka::getValidWorkDiv<Acc>(acc, N),
        Kernel{}
    );

    alpaka::wait(d_q);

    return 0;
}
[2/2] Linking CUDA executable alpaka_playground
terminate called after throwing an instance of 'std::runtime_error'
  what():  /home/ich/playground/alpaka_playground/alpaka/include/alpaka/queue/cuda_hip/QueueUniformCudaHipRt.hpp(175) 'TApi::streamSynchronize(queue.getNativeHandle())' A previous API call (not this one) set the error  : 'cudaErrorIllegalAddress': 'an illegal memory access was encountered'!
(cuda-gdb) where
#0  0x0000000000000010 in ?? ()
#1  0x00007fffbd2598f0 in alpaka::sin<float> (x=...) at /home/ich/playground/alpaka_playground/alpaka/include/alpaka/math/Complex.hpp:536
#2  0x00007fffbd2589b0 in Kernel::operator()<alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1ul>, int> > (this=0x7fffdbfffdb4, acc=...) at /home/ich/playground/alpaka_playground/src/main.cpp:15
#3  0x00007fffbd257bf0 in alpaka::detail::gpuKernel<Kernel, alpaka::ApiCudaRt, alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1ul>, int>, std::integral_constant<unsigned long, 1ul>, int><<<(1,1,1),(1024,1,1)>>> (threadElemExtent=..., 
    kernelFnObj=...) at /home/ich/playground/alpaka_playground/alpaka/include/alpaka/kernel/TaskKernelGpuUniformCudaHipRt.hpp:79
cmake \
    -DCMAKE_BUILD_TYPE=Debug \
    -Dalpaka_ACC_GPU_CUDA_ENABLE=ON \
    -DCMAKE_CXX_COMPILER=g++-12 \
    -DCMAKE_CUDA_ARCHITECTURES=52 \
    -G "Ninja" ..
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_
psychocoderHPC commented 4 weeks ago

The problem is

c = alpaka::sin(c);

because it must be

c = alpaka::sin(acc, c);

All math functions require the accelerator as first parameter.

psychocoderHPC commented 4 weeks ago

The reason why this error happened is that we have implemented sin()

https://github.com/alpaka-group/alpaka/blob/c2a37cc8c9dd82b45a203318bf6000febc460f45/include/alpaka/math/Complex.hpp#L532-L537

in the alpaka namepsace for Complex. I assume this is required for ADL.

psychocoderHPC commented 4 weeks ago

The same file is showing that it is for host only

https://github.com/alpaka-group/alpaka/blob/c2a37cc8c9dd82b45a203318bf6000febc460f45/include/alpaka/math/Complex.hpp#L352-L366

psychocoderHPC commented 4 weeks ago

IMo the problem is that we defined our complex class within the alpaka`` namespace instead ofalpaka::internal. We could move Complex into a namespace which is saying from the name that it should not be used. Put an alias into thealpaka namespace and move all complex host math function implementations into thisinternal` namepspace which will allow ADL. This will still allow the user to call the implementations directly but should avoid calling these functions by accident.