Closed by Algiane 3 months ago
After investigation, on GPU:
the kernel for the acoustic wave equation attempts to access a negative position (position -2) in the m_elemsToNodes array when computing the stiffness term (AcousticWaveEquationSEMKernel:193);
the negative position is due to qb being wrongly evaluated to -1 by the multiIndex method for the P1 Lagrange basis functions inside Qk_Hexahedron_Lagrange_GaussLobatto;
the error in the multiIndex computation is due to a bug in the CUDA compiler that affects the bitwise shift operator when CUDA code is compiled with the -G option. This bug has been resolved in version 11.6 of CUDA.
Enter the following code in https://godbolt.org/:
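#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cuda.h>

// Callable from host and device; prints three bitwise expressions of q.
__host__ __device__ void cuda_hello(void) {
    int q = 2;
    printf("%d\n", (q & 2));       // expected: 2
    printf("%d\n", (q) >> 1);      // expected: 1
    printf("%d\n", (q & 2) >> 1);  // expected: 1
}

// Kernel wrapper so that cuda_hello runs on the device.
__global__ void hello(void) {
    cuda_hello();
}

int main(void) {
    hello<<<1, 1>>>();
    // Wait for the kernel to finish and report any launch or runtime error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel launch failed with error \"%s\".\n",
               cudaGetErrorString(err));
    return EXIT_SUCCESS;
}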
The last printed value should be 1, but it is -1 if the code is built with an nvcc version strictly lower than 11.6 and the -G command line argument.
Remark
According to the CUDA 12 release notes, there are issues with the -G option that alter computation results up to version 12: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id2
Describe the bug
The unit tests of the wave propagation kernel of order 1 in space fail when compiled in Debug mode on Pangea3 with GPUs.
It produces a first error message
This message is followed by other error messages due to invalid memory release in destructors:
To Reproduce
Steps to reproduce the behavior:
Build the TPLs and the testWavePropagation unit test (for example) with GPU using Cuda-11.5 (for example on Pangea-3 with the TOTAL/pangea3-gcc8.4.1-openmpi-4.1.2.cmake host config file) and in Debug mode:
Run the testWavePropagation unit test:
Platform (please complete the following information):