Closed by BenWibking 2 years ago
For completeness, this is the stdout:
$ cat async_vol_test.out
Compute/sleep for 1 seconds...
Create file [./test_0.h5]
Write dset 0
Write dset 1
Observed write dset time: 0.000074
Observed write attr time: 0.008439
Observed total write time: 0.009069
H5ESwait start
H5ESwait done
Compute/sleep for 1 seconds...
Create file [./test_1.h5]
Write dset 0
Write dset 1
Observed write dset time: 0.000071
Observed write attr time: 0.004971
Observed total write time: 0.005413
H5ESwait start
Hi @BenWibking, what versions of HDF5 and OpenMPI are you using?
Based on this post, the problem seems to be with OpenMPI and its MPI_THREAD_MULTIPLE support. Can you run "ompi_info" on your cluster and check whether MPI_THREAD_MULTIPLE is supported?
MPI_THREAD_MULTIPLE is supported in the OpenMPI 4.1.3 build I'm using:
$ ompi_info | grep MPI_THREAD_MULTIPLE
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
I'm using commit a80897ee4944ff6008bfb3b93619ebcb58a070d1 from the HDF5 repo.
Can you try manually running the test code with and without mpirun ("mpirun -np 1 ./async_test_multifile.exe" and "./async_test_multifile.exe") to see whether the error occurs?
Edit: I did not set the environment variables correctly for this run, so it is using the wrong HDF5 install.
Both of those work:
[bw0729@gadi-login-03 test]$ mpirun -np 1 ./async_test_multifile.exe
Compute/sleep for 1 seconds...
Create file [./test_0.h5]
Write dset 0
Write dset 1
Observed write dset time: 0.006957
Observed write attr time: 0.008377
Observed total write time: 0.044399
H5ESwait start
H5ESwait done
Compute/sleep for 1 seconds...
Create file [./test_1.h5]
Write dset 0
Write dset 1
Observed write dset time: 0.006853
Observed write attr time: 0.004882
Observed total write time: 0.015456
H5ESwait start
H5ESwait done
Compute/sleep for 1 seconds...
Create file [./test_2.h5]
Write dset 0
Write dset 1
Observed write dset time: 0.006733
Observed write attr time: 0.004852
Observed total write time: 0.014861
H5ESwait start
H5ESwait done
Total execution time: 3.135021
Finalize time: 0.000000
[bw0729@gadi-login-03 test]$ ./async_test_multifile.exe
Compute/sleep for 1 seconds...
Create file [./test_0.h5]
Write dset 0
Write dset 1
Observed write dset time: 0.006415
Observed write attr time: 0.008508
Observed total write time: 0.022484
H5ESwait start
H5ESwait done
Compute/sleep for 1 seconds...
Create file [./test_1.h5]
Write dset 0
Write dset 1
Observed write dset time: 0.007626
Observed write attr time: 0.004874
Observed total write time: 0.018154
H5ESwait start
H5ESwait done
Compute/sleep for 1 seconds...
Create file [./test_2.h5]
Write dset 0
Write dset 1
Observed write dset time: 0.006863
Observed write attr time: 0.004849
Observed total write time: 0.017120
H5ESwait start
H5ESwait done
Total execution time: 3.109642
Finalize time: 0.000000
Very strangely, the pytest runner for that test also now works (I did not recompile anything):
[bw0729@gadi-login-03 test]$ make check_serial
python3 ./pytest.py
Running serial tests
Test # 1 : async_test_serial.exe PASSED
Test # 2 : async_test_serial2.exe PASSED
Test # 3 : async_test_multifile.exe PASSED
Test # 4 : async_test_serial_event_set.exe PASSED
ERROR: Test async_test_serial_event_set_error_stack.exe : returned non-zero exit status= 255 aborting test
run_cmd= ./async_test_serial_event_set_error_stack.exe
pytest was unsuccessful
Can you try running the other tests manually as well? "make check*" uses a Python script to run the tests, and the Python environment could be causing the error. Alternatively, you can build vol-async with CMake and run the tests with "ctest".
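For reference, the CMake route sketched above would look roughly like this (directory layout and options are assumptions; consult the vol-async README for the required HDF5 and Argobots paths):

```shell
# Sketch: out-of-source CMake build of vol-async, then run its test suite.
# Assumes HDF5/Argobots locations are already discoverable by CMake.
cd vol-async
mkdir -p build && cd build
cmake ..
make
ctest --output-on-failure
```

Running ctest directly bypasses the pytest.py wrapper, which helps isolate whether the failure comes from the tests themselves or from the Python test harness.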
Hi,
I am running on an x86-64 Linux cluster with OpenMPI, and I built vol-async following the instructions in the README, but the tests do not complete successfully.
The backtrace is: