Open vbaconnet opened 3 months ago
Hej! I had issues with segfaults in find_points in instances where I was using a lot of ranks for the amount of elements I had. Could that be an issue for you, i.e., what happens when you reduce the number of ranks?
As I understand it, fgslib_findpts_setup creates a hash mesh of the domain to determine rank candidates, etc. This would have nothing to do with the number of probes and more to do with how the elements in the domain are distributed, I think. (I might be wrong.)
There are probably some knobs to turn inside gslib in such cases, but it would be good to confirm whether we are hitting the same issue.
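To illustrate why the setup cost would depend on the element distribution rather than on the probes, here is a minimal sketch of the general "hash mesh for rank candidates" idea. It is not gslib's actual code: the function names `build_rank_hash` and `candidate_ranks`, the uniform grid, and the bounding-box inputs are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def _cell_index(x, x_min, width, cells_per_dim):
    """Index of the grid cell containing coordinate x, clamped to the grid."""
    return min(max(int((x - x_min) / width), 0), cells_per_dim - 1)


def build_rank_hash(rank_boxes, domain_min, domain_max, cells_per_dim):
    """Map every cell of a uniform grid to the ranks whose bounding box overlaps it.

    rank_boxes: {rank: (lo, hi)}, the bounding box of the elements owned by
    each rank (hypothetical input; note that no probe coordinates are needed).
    """
    dim = len(domain_min)
    width = [(domain_max[d] - domain_min[d]) / cells_per_dim for d in range(dim)]
    table = defaultdict(set)
    for rank, (lo, hi) in rank_boxes.items():
        # Range of cells covered by this rank's bounding box in each direction.
        ranges = [
            range(
                _cell_index(lo[d], domain_min[d], width[d], cells_per_dim),
                _cell_index(hi[d], domain_min[d], width[d], cells_per_dim) + 1,
            )
            for d in range(dim)
        ]
        for cell in product(*ranges):
            table[cell].add(rank)
    return table, width


def candidate_ranks(point, table, width, domain_min, cells_per_dim):
    """Ranks that might own `point`: a single lookup in the prebuilt table."""
    cell = tuple(
        _cell_index(point[d], domain_min[d], width[d], cells_per_dim)
        for d in range(len(point))
    )
    return sorted(table.get(cell, ()))


# Two ranks splitting the unit square at x = 0.5.
boxes = {0: ((0.0, 0.0), (0.5, 1.0)), 1: ((0.5, 0.0), (1.0, 1.0))}
table, width = build_rank_hash(boxes, (0.0, 0.0), (1.0, 1.0), cells_per_dim=4)
print(candidate_ranks((0.1, 0.5), table, width, (0.0, 0.0), 4))
print(candidate_ranks((0.6, 0.5), table, width, (0.0, 0.0), 4))
```

In this sketch the table is built from the per-rank bounding boxes only, so its cost and size depend on the number of ranks and how their elements are laid out. The probes enter only at lookup time.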
As the title says, I have encountered issues running with probes on Dardel (GPU and CPU) and LUMI-G.
Observed behaviour
Simulation freezes and segfaults at fgslib_findpts_setup in global_interpolator. The only error message that is dumped is as follows:
The case is a simple box with constant inflow/outflow. I attach a zip archive of a test case to check reproducibility: case.zip. The case can be run with turboneko.
release/0.8 with export ATP_ENABLED=true (but from memory this was also happening with develop)
rayleigh-benard-cylinder in our examples.
Config
On Dardel GPU
Modules:
Configuration:
./configure FC=ftn CC=cc MPIFC=ftn MPICC=cc HIPCC=hipcc --with-hip HIP_HIPCC_FLAGS="-O3 --offload-arch=gfx90a" --enable-device-mpi --with-gslib=$GSLIB --host=x86_64-pc-linux-gnu
LUMI-G
Edit: Somehow I cannot reproduce it on LUMI anymore; or rather, there is no segfault, but it still freezes for a long time at fgslib_findpts_setup.
Configuration:
./configure --with-gslib=$GSLIB FC=ftn CC=cc HIPCC=hipcc MPIFC=ftn MPICC=cc --with-hip HIP_HIPCC_FLAGS="-O3 -x hip --offload-arch=gfx90a" --enable-device-mpi