scivision closed this issue 4 months ago
Thanks for the report. Pinging @tim-griesbach to check whether there may be an uninitialized-value problem in loading.
Since I do not have access to a machine running Ubuntu 24.04, I cannot reproduce the error locally. Nonetheless, I ran test_loadsave2 locally under valgrind, and the program is valgrind-clean on my machine.
Hence, I tried to investigate the problem using the CI and found that the md5sum of p4est.p4p is not the same across re-runs in the CI for Ubuntu 24.04 with libmpich-dev. Moreover, I printed the results of MPI_File_get_position in save_ext, and these also differ between re-runs. However, for some runs the positions are correct, but then the program crashes due to another problem (a segfault).
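To narrow down where two saved files diverge, beyond knowing that their md5sums differ, a byte-wise comparison can report the first differing offset, which can then be checked against the file positions printed in save_ext. The following is only a minimal sketch in plain C without MPI; the function name `first_diff_offset` and the file paths in the usage note are hypothetical and not part of p4est.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of p4est: return the offset of the first
 * byte at which two files differ, -1 if they are byte-identical, or -2 if
 * either file cannot be opened. If one file is a prefix of the other, the
 * offset where the shorter file ends is returned. */
long
first_diff_offset (const char *path_a, const char *path_b)
{
  FILE               *fa, *fb;
  long                offset = 0;
  long                result = -1;
  int                 ca, cb;

  assert (path_a != NULL && path_b != NULL);

  fa = fopen (path_a, "rb");
  if (fa == NULL) {
    return -2;
  }
  fb = fopen (path_b, "rb");
  if (fb == NULL) {
    fclose (fa);
    return -2;
  }

  for (;;) {
    ca = fgetc (fa);
    cb = fgetc (fb);
    if (ca != cb) {
      /* bytes differ, or exactly one file has ended */
      result = offset;
      break;
    }
    if (ca == EOF) {
      /* both files ended at the same position: identical */
      break;
    }
    ++offset;
  }

  fclose (fa);
  fclose (fb);
  return result;
}
```

For example, after keeping the output of two CI re-runs under assumed names such as run1/p4est.p4p and run2/p4est.p4p, calling `first_diff_offset ("run1/p4est.p4p", "run2/p4est.p4p")` gives the byte position at which the two files start to differ.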
Given my current observations, the issue seems to be caused by strange MPI behavior, but I am not sure what causes MPICH to behave this way.
Due to the issue described in https://github.com/cburstedde/libsc/issues/191, I cannot use valgrind in the CI in combination with Ubuntu 24.04. @scivision Since you were able to reproduce the CI error locally, can you run test_loadsave2 with valgrind?
What happens if we remove the gcc version numbers from the CI and use whatever is the default for ubuntu-22/24/latest?
It seems to be fine again. In fact, test_loadsave is the one test that reliably trips over MPI I/O issues. I have seen it fail transiently many times over the years.
Still pinging @tim-griesbach to double-check that this is unrelated to the recent merge on saving a p4est in a more standard-conforming way with respect to libc I/O.
Yes, I double-checked the recent changes in saving a p4est. The changes do not alter the md5sum of the created file. I also compared the file positions used for writing and reading, and they do not change with the more standard-conforming code either. Therefore, the two issues causing the failing test (cf. my report above) are not caused by the recent code changes.
Closing as not-a-bug.
The "test_loadsave2" test passes on Ubuntu 24.04 with OpenMPI, and on other operating systems (Ubuntu 22.04, macOS, etc.) regardless of whether OpenMPI or MPICH is used.
However, as discovered in #303 and confirmed on a laptop with Ubuntu 24.04, "test_loadsave2" fails with MPICH and GCC-12, GCC-13, or GCC-14.
Zlib and MPICH were enabled and used.
Is this just a flaky test, or is a code update needed?