JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License

Can't perform independent write when MPI_File_sync is required by ROMIO driver. #1093

nahaharo commented 1 year ago

Hello. Recently I've been running MPI jobs for large-scale data processing. However, I'm getting the error "Can't perform independent write when MPI_File_sync is required by ROMIO driver." in the log.

The symptoms are as follows:

  1. It works well on the local (master) machine.
  2. When run on the compute nodes, it fails with "Can't perform independent write when MPI_File_sync is required by ROMIO driver.".
  3. With dxpl_mpio=:collective, it gets stuck at the write.

The local machine is directly connected to the disk that I write the HDF5 file to, while the remote nodes reach their disk over NFS.

My question is: why does this error appear? Is it because of NFS? If it is avoidable, how?

Also, the article https://www.hdfgroup.org/2015/08/parallel-io-with-hdf5/ describes the H5Sselect_none operation for collective mode. Does HDF5.jl have similar functionality? If so, how can I use it?

Thanks.

Here is my test code.

using HDF5
using MPI

function main()
    @assert HDF5.has_parallel()

    MPI.Init()

    comm = MPI.COMM_WORLD
    info = MPI.Info()
    ff = h5open("test.h5", "w", comm, info)
    MPI.Barrier(comm)

    Nproc = MPI.Comm_size(comm)
    myrank = MPI.Comm_rank(comm)
    M = 10
    A = fill(myrank, M, 2)  # local data
    dims = (M, Nproc*2+1)    # dimensions of global data

    # Create dataset
    @show "Create dataset"
    dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims), chunk=(M, 2), dxpl_mpio=:collective)
    @show "After dataset"

    # Write local data
    dset[:, 2*myrank + 1:2*myrank + 2] = A
    @show "After write dataset"

    close(ff)

    MPI.Finalize()
end

main()

And here is the output of MPIPreferences.use_system_binary():

julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│   libmpi = "libmpi"
│   version_string = "MPICH Version:      4.1.2\nMPICH Release date: Wed Jun  7 15:22:45 CDT 2023\nMPICH ABI:          15:1:3\nMPICH Device:       ch4:ofi\nMPICH configure:    --prefix=/home/---/tools/mpich --with-ucx=/home/---/tools/ucx\nMPICH CC:           /home/---/tools/gcc/bin/gcc    -O2\nMPICH CXX:          /home/hyunwook/tools/gcc/bin/g++   -O2\nMPICH F77:          /home/---/tools/gcc/bin/gfortran   -O2\nMPICH FC:           /home/---/tools/gcc/bin/gfortran   -O2\n"
│   impl = "MPICH"
│   version = v"4.1.2"
└   abi = "MPICH"
┌ Info: MPIPreferences unchanged
│   binary = "system"
│   libmpi = "libmpi"
│   abi = "MPICH"
│   mpiexec = "mpiexec"
│   preloads = Any[]
└   preloads_env_switch = nothing
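
(As a sanity check that the compute nodes load this same system MPICH at runtime, and not the bundled binary, MPI.versioninfo() can be called from inside the job; the two lines below are only that check, not part of the failing code.)

using MPI
MPI.versioninfo()   # prints the active MPIPreferences settings and information about the MPI library in use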

Run script (for sbatch)

#!/bin/bash
#SBATCH -J hdf5_test
#SBATCH -o stdout_log.o%j
#SBATCH -N 1
#SBATCH -n 32

mpiexec.hydra -np $SLURM_NTASKS julia test.jl

My Env

mkitti commented 1 year ago

@simonbyrne might be best equipped to answer the overall question.

there is H5Sselect_none operation for collective mode. Does HDF5.jl have similar functionality? If so, how can I use it?

We don't have a pregenerated binding for H5Sselect_none in HDF5.jl yet. Based on the auto-generated bindings in LibHDF5.jl, you could invoke the ccall directly.

https://github.com/mkitti/LibHDF5.jl/blob/712b6e306a15de37f748727b37676aca70ea0664/src/LibHDF5.jl#L3816-L3818

julia> import HDF5.API.HDF5_jll: libhdf5

julia> import HDF5.API: herr_t, hid_t

julia> function H5Sselect_none(spaceid)
           ccall((:H5Sselect_none, libhdf5), herr_t, (hid_t,), spaceid)
       end
H5Sselect_none (generic function with 1 method)

julia> dspace = dataspace((1,1))
HDF5.Dataspace: (1, 1)

julia> H5Sselect_none(dspace)
0

julia> dspace
HDF5.Dataspace: (1, 1) [irregular selection]
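
A rough sketch of how that hand-rolled H5Sselect_none could be used so that a rank with nothing to write still takes part in a collective write, which is the pattern the linked article describes. This is untested: the file name, dataset shape, and dummy buffer are placeholders, and driving the write through HDF5.API.h5p_create / h5p_set_dxpl_mpio / h5d_write is my assumption of how to do it at the low level, not an HDF5.jl-documented recipe.

using HDF5
using MPI
import HDF5.API.HDF5_jll: libhdf5
import HDF5.API: herr_t, hid_t

# same hand-rolled binding as above
H5Sselect_none(spaceid) = ccall((:H5Sselect_none, libhdf5), herr_t, (hid_t,), spaceid)

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

ff = h5open("select_none_demo.h5", "w", comm, MPI.Info())            # placeholder file name
dset = create_dataset(ff, "/data", datatype(Int), dataspace((10,)))  # placeholder shape

# a collective dataset-transfer property list for the raw write call
dxpl = HDF5.API.h5p_create(HDF5.API.H5P_DATASET_XFER)
HDF5.API.h5p_set_dxpl_mpio(dxpl, HDF5.API.H5FD_MPIO_COLLECTIVE)

fspace = dataspace(dset)
if rank == 0
    # rank 0 writes the whole dataset; a fresh dataspace selects everything by default
    buf = collect(1:10)
    mspace = dataspace(buf)
else
    # the other ranks select nothing, but still make the collective call
    buf = zeros(Int, 1)        # dummy buffer; no elements are transferred
    mspace = dataspace(buf)
    H5Sselect_none(mspace)
    H5Sselect_none(fspace)
end
HDF5.API.h5d_write(dset, datatype(Int), mspace, fspace, dxpl, buf)   # every rank participates

HDF5.API.h5p_close(dxpl)
close(ff)
MPI.Finalize()

The key point is that in collective mode every rank has to issue the same sequence of calls, even the ranks that have nothing to contribute.
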
simonbyrne commented 1 year ago

It could be that you are still using the HDF5 library linked against the bundled MPI library (i.e. not the system one).

You either need to specify it explicitly (currently by setting JULIA_HDF5_PATH), or use MPItrampoline (which requires building a wrapper around your system MPI library).
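
A quick way to check what actually got loaded at runtime is to see whether the libhdf5 in use has parallel support and which hdf5/mpi shared libraries the process has opened. A minimal sketch (the Libdl filtering is just a convenience, not an HDF5.jl API):

using HDF5, Libdl

@show HDF5.has_parallel()   # false means the libhdf5 that was loaded has no MPI support

# shared libraries matching "hdf5" or "mpi" currently loaded in this process
for path in Libdl.dllist()
    name = lowercase(basename(path))
    if occursin("hdf5", name) || occursin("mpi", name)
        println(path)
    end
end

If libhdf5 resolves to a path inside the Julia artifacts depot rather than your own build, it is still the bundled library.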

simonbyrne commented 1 year ago

If that is not the case, does it work without the chunk option?

nahaharo commented 1 year ago

  1. The system MPI library (MPICH, built from source) is being used.
  2. JULIA_HDF5_PATH was set properly.
  3. With or without the chunk option, independent I/O mode still gives the same error.

I think this error occurs because of NFS (based on this issue: https://forum.hdfgroup.org/t/hang-for-mpi-hdf5-in-parallel-on-an-nfs-system/6541/3). It looks like collective mode is now working, so I'm going with that.