ICLDisco / dplasma

DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.

testing_dgeqrf fails in CUDA runs #70

Closed: therault closed this issue 8 months ago

therault commented 1 year ago

Describe the bug

It seems DGEQRF (at least the PTG version) is broken for CUDA runs.

To Reproduce

Steps to reproduce the behavior:

  1. Check out the current master
  2. Compile with CUDA enabled (e.g. use modules hwloc cuda gcc openmpi gdb ninja cmake intel-mkl python on leconte) and let cmake detect everything
  3. Run ./tests/testing_dgeqrf -N 4096 -t 1024 -x -g 1
  4. See error
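
The run in step 3 can be wrapped in a small script that also checks the output for the failure signature. This is only a sketch: the variable names (`N`, `TILE`, `GPUS`, `TEST_BIN`) and the `grep` pattern are assumptions made for illustration, not part of DPLASMA. The command line itself is the one from the report, and the script is meant to be run from the build directory.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproduction command from this report.
# All variable names here are illustrative; only the command line is from the issue.
N=${N:-4096}                              # matrix size passed with -N
TILE=${TILE:-1024}                        # tile size passed with -t
GPUS=${GPUS:-1}                           # value passed with -g
TEST_BIN=${TEST_BIN:-./tests/testing_dgeqrf}

repro_cmd="$TEST_BIN -N $N -t $TILE -x -g $GPUS"
echo "command: $repro_cmd"

if [ -x "$TEST_BIN" ]; then
    log=$(mktemp)
    # Keep the full output on screen and in a temporary log file.
    $repro_cmd 2>&1 | tee "$log"
    # Assumed failure signature: the CUDA error text quoted in the report.
    if grep -q "illegal memory access" "$log"; then
        echo "result: failure reproduced"
    else
        echo "result: failure signature not found"
    fi
    rm -f "$log"
else
    echo "skipped: $TEST_BIN not found (run this from the build directory)"
fi
```

Overriding `GPUS=0` or `TILE` on the command line gives a quick way to compare runs, assuming the test accepts those values.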

Expected behavior

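The test should run to completion. Instead, the CUDA driver complains of misaligned memory accesses (reported as "an illegal memory access was encountered" in the log below) and the run bails out: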

~/dplasma/out/build/Debug $ ./tests/testing_dgeqrf -N 4096 -t 1024 -x -g 1
W@00000 /!\ PERFORMANCE MIGHT BE REDUCED /!\: The binding defined by --parsec_bind has been ignored!
    This option requires a build with HWLOC with bitmap support.
#+++++ cores detected       : 80
#+++++ nodes x cores + gpu  : 1 x 80 + 1 (80+1)
#+++++ thread mode          : THREAD_SERIALIZED
#+++++ P x Q                : 1 x 1 (1/1)
#+++++ M x N x K|NRHS       : 4096 x 4096 x 1
#+++++ MB x NB , IB         : 1024 x 1024 , 32
#+++++ KP x KQ              : 4 x 1
W@00000 /home/herault/dplasma/parsec/parsec/mca/device/cuda/device_cuda_module.c:2012 (progress_stream) cudaEventQuery an illegal memory access was encountered
W@00000 Critical issue related to the GPU discovered. Giving up

abouteiller commented 8 months ago

More information in #110.