Open WardF opened 1 month ago
@edwardhartnett @jhendersonHDF if anything leaps out at you, feel free to chime in, it might save some time as I dig through this! And if not, no worries XD. Thanks!
Additional notes:
On Ubuntu 24.04, installing `libhdf5-mpi-dev` installs `openmpi` and related tools. This version of libhdf5 works just fine, although the `nc_test4/run_par_test.sh` script requires `--oversubscribe` be passed to `mpiexec -n 16 ./tst_parallel3`. Otherwise, there is a complaint if the machine has fewer than 16 cores/processors/what-have-you.
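A minimal sketch of how the flag could be added only when it is needed, assuming a POSIX shell. The helper name `oversubscribe_flag` is hypothetical and is not part of `run_par_test.sh`; `--oversubscribe` is the Open MPI option mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not part of run_par_test.sh): print "--oversubscribe"
# when more ranks are requested than cores are available, else print nothing.
# $1 = number of ranks requested, $2 = number of available cores.
oversubscribe_flag() {
    if [ "$1" -gt "$2" ]; then
        printf '%s\n' "--oversubscribe"
    fi
}
```

With Open MPI this might be used as `mpiexec $(oversubscribe_flag 16 "$(nproc)") -n 16 ./tst_parallel3`, so the flag is only passed on machines with fewer than 16 cores.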
Using `mpich` and a custom-built `libhdf5`, we cannot oversubscribe. However, this is not an issue, because invoking `mpiexec -n 2 ./tst_parallel3` results in the same issue as if we passed 4, or 8, or 16. Running `tst_parallel3` directly works, but of course it is bypassing `MPI` entirely.
Installing `libhdf5-mpich-dev` sees the same behavior as using the custom-built version of `libhdf5`. This suggests there is an issue when using `mpich` but not inherently MPI.
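Since the behavior differs by MPI implementation, it may help to record which launcher is actually on the `PATH` when reproducing. A rough sketch, assuming a POSIX shell: the helper name `classify_mpi` is hypothetical, and the matched strings are assumptions about what `mpiexec --version` typically prints (Open MPI mentioning "Open MPI" or "OpenRTE", MPICH's Hydra launcher mentioning "HYDRA"), which can vary by version and distribution.

```shell
#!/bin/sh
# Hypothetical helper: guess the MPI implementation from launcher version text.
# Assumes Open MPI output mentions "Open MPI" or "OpenRTE" and MPICH (Hydra)
# output mentions "HYDRA" or "MPICH"; anything else is reported as "unknown".
# $1 = text captured from the launcher, e.g. from `mpiexec --version`.
classify_mpi() {
    case "$1" in
        *"Open MPI"*|*OpenRTE*) echo "openmpi" ;;
        *HYDRA*|*MPICH*)        echo "mpich" ;;
        *)                      echo "unknown" ;;
    esac
}
```

For example, `classify_mpi "$(mpiexec --version 2>&1)"` could be logged alongside the output of `mpicc --version` when comparing the two environments.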
Update: For clarity, the tests pass when using mpich 4.0, gcc 11.4.0.
I'm observing a failure using `mpicc` and running `nc_test4/run_par_test.sh`. This issue occurs when running `mpicc` version `13.x`, but does not occur on systems using `mpicc` version `11.x`. This is most easily observed on my end using Ubuntu `22.04` vs. `24.04`. I've created a couple of docker images which can be used to observe this. They can be run as follows:
and
You can enter the environment by appending `bash` to the end of either docker command.
It seems that the issue is related to the different version of `mpicc`, but I'm trying to sort through what exactly is going on. Any suggestions would be appreciated.
The error specifically is as follows: