Open marchdf opened 2 hours ago
@djglaze does this match the pattern for the cuda/compiler-bug that we hit a few months ago?
Marc, we hit a bug where a function containing a lambda like this would segfault if it was included in multiple compilation units, but ran fine if it was included by only one .C file. Our solution was to use a functor instead of a lambda. A functor is a class object with an operator() method.
Interesting... FWIW I get the same on 2 different GPUs/CUDA versions: H100 with cuda@12.4.1 and A100 with cuda@12.5.1
Do you have an example of the functor conversion that you made that fixed the issue?
I think you would just do something like this:

    struct MyFunctor {
      KOKKOS_FUNCTION
      void operator()(const MeshIndex& mi)
      {
        // the code above that loops over numComponents and sets yField
      }
      // data members: xField, yField, alpha, beta, numComponents
    };

    MyFunctor f;
    f.xField = ...; // etc.
    nalu_ngp::run_entity_algorithm(..., f);
It's ugly but might be a worthy experiment...
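For anyone who wants to try the lambda-to-functor conversion outside the code base first, here is a minimal host-only sketch of the pattern. Everything in it is an assumption made for illustration: `MeshIndex`, `Field`, and `run_entity_algorithm` are simplified stand-ins and not the real Nalu-Wind or Kokkos types, and the update `y = alpha * x + beta * y` is a guess at what the loop over `numComponents` does. The call operator is marked `const` because Kokkos generally expects functors to be callable through a const reference.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the mesh index handed to the functor.
struct MeshIndex {
  std::size_t entity;
};

// Hypothetical host-only field: numComponents values stored per entity.
struct Field {
  std::size_t numComponents;
  std::vector<double> data;

  double& operator()(const MeshIndex& mi, std::size_t c)
  {
    return data[mi.entity * numComponents + c];
  }
  double operator()(const MeshIndex& mi, std::size_t c) const
  {
    return data[mi.entity * numComponents + c];
  }
};

// The functor replaces the lambda: what the lambda captured becomes
// explicit data members, and the lambda body becomes operator().
struct AxpbyFunctor {
  const Field* xField;
  Field* yField;
  double alpha;
  double beta;
  std::size_t numComponents;

  // Assumed update: y = alpha * x + beta * y for every component.
  void operator()(const MeshIndex& mi) const
  {
    for (std::size_t c = 0; c < numComponents; ++c) {
      (*yField)(mi, c) = alpha * (*xField)(mi, c) + beta * (*yField)(mi, c);
    }
  }
};

// Hypothetical serial stand-in for nalu_ngp::run_entity_algorithm:
// calls the functor once per entity.
template <typename Functor>
void run_entity_algorithm(std::size_t numEntities, const Functor& f)
{
  for (std::size_t e = 0; e < numEntities; ++e) {
    f(MeshIndex{e});
  }
}

// Small driver: two entities with two components each.
std::vector<double> run_axpby_example()
{
  Field x{2, {1.0, 2.0, 3.0, 4.0}};
  Field y{2, {10.0, 20.0, 30.0, 40.0}};
  AxpbyFunctor f{&x, &y, 2.0, 0.5, 2};
  run_entity_algorithm(2, f);
  return y.data;
}
```

Calling `run_axpby_example()` returns the updated `y` values. In the real code the functor would carry `KOKKOS_FUNCTION` on `operator()` and hold device-accessible fields instead of raw pointers.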
Works fine with Debug build but this is what I get with a RelWithDebInfo build:
Valgrind output:
LLDB output:
In `NgpFieldBLAS.h`, this makes the segfault go away:
This makes the segfault come back:
So just calling `run_entity_algorithm` causes the segfault. Is `sel` bad? Does anyone have ideas for the next steps? Tagging @alanw0 and @psakievich.