cwpearson / stencil

A prototype MPI/CUDA stencil communication library
Boost Software License 1.0

USE_CUDA_GRAPH causes test_cuda failures on cuda/10.1.243 #26

Closed: cwpearson closed this issue 4 years ago.

cwpearson commented 4 years ago

The current default version loaded by module load cuda is cuda/10.1.243. With that module loaded, running test/test_cuda "pack*" fails, sometimes with error 700 (illegal memory access) and sometimes with error 715 (illegal instruction).

Loading cuda/10.2.89 instead (module load cuda/10.2.89) appears to fix the problem. After switching CUDA modules, re-run cmake so that it picks up the new CUDA compiler.

cwpearson commented 4 years ago

This turned out to be a bug in the kernel implementation. CUDA seems to have been reporting strange or misleading errors, and debugging under CUDA 10.2 made them somewhat clearer. The failures have not been seen since the bug was fixed.
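Errors like 700 from a pack kernel usually come from an out-of-range index, and the GPU may report them late or as an unrelated error. One way to check a pack kernel's indexing is to compare its output against a host-side reference that bounds-checks every access. The sketch below is hypothetical: pack_region and Dim3 are illustrative names, not this library's API, and it assumes a row-major 3D array with x varying fastest.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical 3D extent (illustration only, not the stencil library's type).
struct Dim3 {
  std::size_t x, y, z;
};

// Hypothetical host-side reference "pack": copy the box [pos, pos + ext) of a
// row-major array with extent sz (x fastest) into a contiguous buffer.
// Indexing mistakes throw here instead of surfacing as an illegal memory
// access on the GPU.
std::vector<double> pack_region(const std::vector<double> &src, Dim3 sz,
                                Dim3 pos, Dim3 ext) {
  if (src.size() != sz.x * sz.y * sz.z)
    throw std::invalid_argument("src size does not match extent");
  if (pos.x + ext.x > sz.x || pos.y + ext.y > sz.y || pos.z + ext.z > sz.z)
    throw std::out_of_range("packed region exceeds array bounds");

  std::vector<double> dst;
  dst.reserve(ext.x * ext.y * ext.z);
  for (std::size_t z = 0; z < ext.z; ++z)
    for (std::size_t y = 0; y < ext.y; ++y)
      for (std::size_t x = 0; x < ext.x; ++x) {
        // Linear index of (pos + (x, y, z)) in the source array.
        std::size_t i =
            ((pos.z + z) * sz.y + (pos.y + y)) * sz.x + (pos.x + x);
        dst.push_back(src.at(i)); // .at() re-checks the computed index
      }
  return dst;
}
```

In a test, the buffer produced by the GPU pack kernel (copied back to the host) can be compared element by element with pack_region called on the same array, offset, and extent.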