acts-project / algebra-plugins

Mozilla Public License 2.0

bug: Eigen CUDA test seems to fail in Debug build #122

Open niermann999 opened 4 months ago

niermann999 commented 4 months ago

Error message:

[ RUN      ] algebra_plugins/test_cuda_basics/cuda_eigen_eigen<float>.transform3
unknown file: Failure
C++ exception with description "/mnt/ssd1/jonierma/algebra-plugins/tests/accelerator/cuda/common/execute_cuda_test.cuh:55 Failed to execute: cudaDeviceSynchronize() (an illegal memory access was encountered)" thrown in the test body.

terminate called after throwing an instance of 'std::runtime_error'
  what():  /mnt/ssd1/jonierma/algebra-plugins/build/_deps/vecmem-src/cuda/src/memory/managed_memory_resource.cpp:45 Failed to execute: cudaFree(p) (an illegal memory access was encountered)
Aborted (core dumped)

The Release build is fine

beomki-yeo commented 4 months ago

Which GCC and CUDA versions are you using?

niermann999 commented 4 months ago

gcc/13.2 cuda/12.4

krasznaa commented 4 months ago

Curious. With GCC 11.4 + CUDA 12.4 it does work happily on my laptop. :thinking: Will try with GCC 13 in a little bit...

krasznaa commented 4 months ago

Never mind. Once I actually build in Debug mode, I get the same failure, with both GCC 11.4 and 13.1.

krasznaa commented 4 months ago

What I see is:

[ RUN      ] algebra_plugins/test_cuda_basics/cuda_eigen_eigen<float>.transform3

CUDA Exception: Warp Illegal Instruction
The exception was triggered at PC 0x0 (Transform.h:1405)

Thread 1 "algebra_test_ei" received signal CUDA_EXCEPTION_4, Warp Illegal Instruction.
[Switching focus to CUDA kernel 0, grid 15, block (0,0,0), thread (128,0,0), device 0, sm 0, warp 6, lane 0]
0x0000000000000010 in Eigen::internal::check_static_allocation_size<double, 9> ()
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/out/build/default-x86-64/_deps/eigen3-src/Eigen/src/Geometry/Transform.h:1405
1405      static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
(cuda-gdb) bt
#0  0x0000000000000010 in Eigen::internal::check_static_allocation_size<double, 9> ()
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/out/build/default-x86-64/_deps/eigen3-src/Eigen/src/Geometry/Transform.h:1405
#1  0x00007fffa7a1c950 in Eigen::Transform<float, 3, 2, 0>::Transform<Eigen::CwiseNullaryOp<Eigen::internal::scalar_identity_op<float>, Eigen::Matrix<float, 4, 4, 0, 4, 4> > > (this=0x7fffe3fff850, other=...)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/out/build/default-x86-64/_deps/eigen3-src/Eigen/src/Geometry/Transform.h:292
#2  0x00007fffa7a1bbb0 in Eigen::Transform<float, 3, 2, 0>::Identity ()
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/out/build/default-x86-64/_deps/eigen3-src/Eigen/src/Geometry/Transform.h:535
#3  0x00007fffa7a2ca90 in algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >::transform3 (
    this=0x7fffe3fff850, t=..., x=..., y=..., z=..., get_inverse=true)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/math/eigen/include/algebra/math/impl/eigen_transform3.hpp:80
#4  0x00007fffa7a2bab0 in algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >::transform3 (
    this=0x7fffe3fff740, t=..., z=..., x=..., get_inverse=255)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/math/eigen/include/algebra/math/impl/eigen_transform3.hpp:118
#5  0x00007fffa77d6760 in test_device_basics<test_types<float, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >, int, algebra::eigen::matrix_type, algebra::eigen::matrix::actor<float> > >::transform3_ops (
    this=0x7fffe3fffd40, t1=0x7fff00000000, t2=0x7fffe3fffad0, t3=0x7fffe3fffae8, 
    a=0x7fffa77d6760 <test_device_basics<test_types<float, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >, int, algebra::eigen::matrix_type, algebra::eigen::matrix::actor<float> > >::transform3_ops(algebra::eigen::array<float, 3>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 3>) const+1632>, b=0x7fffe3fffb18)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/tests/common/test_device_basics.hpp:207
#6  0x00007fffa77d5530 in transform3_ops_functor<test_types<float, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >, int, algebra::eigen::matrix_type, algebra::eigen::matrix::actor<float> > >::operator() (
    this=0x7fffe3fffa40, i=140735743645696, t1=..., t2=..., t3=..., a=..., b=..., output=...)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/tests/accelerator/common/test_basics_functors.hpp:129
#7  0x00007fffa77d3070 in (anonymous namespace)::cudaTestKernel<transform3_ops_functor<test_types<float, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::array<float, 2>, algebra::eigen::array<float, 3>, algebra::eigen::math::transform3<float, algebra::eigen::matrix::actor<float> >, int, algebra::eigen::matrix_type, algebra::eigen::matrix::actor<float> > >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<algebra::eigen::array<float, 3> >, vecmem::data::vector_view<float> ><<<(20,1,1),(256,1,1)>>> (
    arraySizes=5000, args=..., args=..., args=..., args=..., args=..., args=...)
    at /home/krasznaa/ATLAS/projects/algebra/algebra-plugins/tests/accelerator/cuda/common/execute_cuda_test.cuh:28
(cuda-gdb)

In case somebody manages to debug it before me. :wink:

stephenswat commented 4 months ago

The fact that the PC is 0x0 is rather worrying. :sweat_smile:

krasznaa commented 4 months ago

As the backtrace says, the crash is triggered by this line:

https://github.com/acts-project/algebra-plugins/blob/main/math/eigen/include/algebra/math/impl/eigen_transform3.hpp#L80

At which point it's hard to argue that this isn't coming from some internal Eigen issue. :thinking: Having quickly looked at the code, I don't really understand what the issue is, or why that final call by itself would cause an error.

Unfortunately I won't be able to debug this any further at the moment. So somebody could possibly look into using a newer/different version of Eigen, and see what happens with that. Otherwise, maybe we just don't use Eigen on GPUs in Debug mode for now... :thinking:
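
For anyone picking this up, a standalone reproducer might help to separate Eigen from the rest of the test setup. The following is only a sketch: it calls `Eigen::Transform<float, 3, Eigen::Affine>::Identity()` (the call in frame #2 of the backtrace) from a trivial kernel. The file name, the kernel name `identityKernel` and the build line are hypothetical, and whether this actually triggers the same crash is untested. It also assumes that the Debug build compiles device code with device-debug flags such as `-G`, which is a guess.

```cuda
// Hypothetical standalone reproducer; not part of algebra-plugins.
// Assumed build line (adjust the Eigen include path to your setup):
//   nvcc -G -g -I<eigen-include-dir> eigen_identity_repro.cu -o eigen_identity_repro
#include <cstdio>

#include <cuda_runtime.h>

#include <Eigen/Geometry>

// Same type as in frame #2 of the backtrace: Eigen::Transform<float, 3, 2, 0>.
using transform_t = Eigen::Transform<float, 3, Eigen::Affine>;

__global__ void identityKernel(float* out) {
    // The call that frame #3 (eigen_transform3.hpp:80) makes.
    const transform_t t = transform_t::Identity();
    // Write a value back so that the call cannot be optimised away.
    out[threadIdx.x] = t.matrix()(0, 0);
}

int main() {
    constexpr int nThreads = 32;
    float* out = nullptr;
    if (cudaMallocManaged(&out, nThreads * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }

    identityKernel<<<1, nThreads>>>(out);

    // Kernel faults are reported asynchronously, at the next synchronisation.
    const cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("t(0,0) = %f\n", out[0]);
    cudaFree(out);
    return 0;
}
```

If this fails with `-G` but runs without it, that would point at Eigen's device code under debug compilation rather than at anything in algebra-plugins. If it runs fine either way, the problem is more likely in how `transform3` or the test harness uses Eigen.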

stephenswat commented 4 months ago

[image]

Sus.