GEOS-DEV / GEOS

GEOS Simulation Framework
GNU Lesser General Public License v2.1

Error raised by the `checkIndices` function for the `testWavePropagation.*` unit tests on GPUs and in Debug mode #3212

Closed — Algiane closed this issue 3 months ago

Algiane commented 3 months ago

Describe the bug The unit tests of the wave propagation kernels of order 1 in space fail when compiled in Debug mode on Pangea3 with GPUs.

It first produces the following error message:

Moved    32.0 B to the DEVICE: LvArray::Array<float, 1, camp::int_seq<long, 0l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh/meshLevels/Level0/nodeManager/stiffnessVector
/appli_RD/FROEHLY/GEOS/src/coreComponents/LvArray/src/indexing.hpp:209: void LvArray::indexing::checkIndices(const INDEX_TYPE *, INDICES...) [with INDEX_TYPE = int; INDICES = int; INDICES = int]: block: [0,0,0], thread: [0,0,0] Assertion `false && "EXP = " STRINGIZE( invalidIndices( dims, indices ... ) ) "MSG = " STRINGIZE( "Invalid indices. " << printDimsAndIndices( dims, indices ... ) )` failed.
unknown file: Failure
C++ exception with description "campCudaErrchk(cudaStreamSynchronize(stream)) device-side assert triggered /workrd/users/l1116294/TPLS-GPU/install-pangea3-gcc8.4.1-openmpi-4.1.2-debug/chai/include/camp/resource/cuda.hpp:172" thrown in the test body.

This message is followed by further errors caused by invalid memory releases in destructors:

terminate called after throwing an instance of 'umpire::runtime_error'
  what():  ! Umpire runtime_error [/work206/workrd/users/l1116294/TPLS-GPU/build-pangea3-gcc8.4.1-openmpi-4.1.2-debug/chai/src/chai/src/tpl/umpire/src/umpire/alloc/CudaMallocAllocator.hpp:62]: cudaFree( ptr = 0x20025a6a2c00 ) failed with error: device-side assert triggered

To Reproduce Steps to reproduce the behavior:

  1. Build the TPLs and the testWavePropagation unit test (for example) with GPU support using Cuda-11.5, in Debug mode (for example on Pangea-3 with the TOTAL/pangea3-gcc8.4.1-openmpi-4.1.2.cmake host-config file):
python3 $GEOS_DIR/scripts/config-build.py -bt Debug -hc $GEOS_DIR/host-configs/TOTAL/pangea3-gcc8.4.1-openmpi-4.1.2.cmake
make -C build-pangea3-gcc8.4.1-openmpi-4.1.2-debug testWavePropagation
  2. Run the testWavePropagation unit test:
    OMP_NUM_THREADS=1 build-pangea3-gcc8.4.1-openmpi-4.1.2-debug/tests/testWavePropagation "testWavePropagation"
  3. See the error:
Moved    32.0 B to the DEVICE: LvArray::Array<float, 1, camp::int_seq<long, 0l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh/meshLevels/Level0/nodeManager/stiffnessVector
/appli_RD/FROEHLY/GEOS/src/coreComponents/LvArray/src/indexing.hpp:209: void LvArray::indexing::checkIndices(const INDEX_TYPE *, INDICES...) [with INDEX_TYPE = int; INDICES = int; INDICES = int]: block: [0,0,0], thread: [0,0,0] Assertion `false && "EXP = " STRINGIZE( invalidIndices( dims, indices ... ) ) "MSG = " STRINGIZE( "Invalid indices. " << printDimsAndIndices( dims, indices ... ) )` failed.
unknown file: Failure
C++ exception with description "campCudaErrchk(cudaStreamSynchronize(stream)) device-side assert triggered /workrd/users/l1116294/TPLS-GPU/install-pangea3-gcc8.4.1-openmpi-4.1.2-debug/chai/include/camp/resource/cuda.hpp:172" thrown in the test body.

Platform (please complete the following information):

Algiane commented 3 months ago

After investigation, the failure on GPU can be reduced to a minimal reproducer.

Minimal reproducer

Enter the following code in https://godbolt.org/:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cuda.h>

__host__ __device__ void cuda_hello(void){

  int q = 2;

  printf("%d\n", (q & 2) );       // prints 2

  printf("%d\n", (q ) >> 1 );     // prints 1

  printf("%d\n", (q & 2) >> 1 );  // should print 1, but prints -1 with nvcc < 11.6 and -G
}

__global__ void hello(void){
   cuda_hello();
}

int main(void) {

  hello<<<1,1>>>();

  cudaError_t err  = cudaDeviceSynchronize();

  if (err != cudaSuccess)
              printf("kernel launch failed with error \"%s\".\n",
                             cudaGetErrorString(err));

  return EXIT_SUCCESS;
}

The last printed value should be 1, but it is -1 when the code is built with an nvcc version strictly lower than 11.6 and with the -G command-line argument.

Remark According to the CUDA toolkit release notes for version 12, the -G option had issues that could alter computation results up to version 12: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id2