GEOS-DEV / GEOS

GEOS Simulation Framework

Issue of generating hdf5 files on lustre filesystems #1482

Closed jhuang2601 closed 3 years ago

jhuang2601 commented 3 years ago

I'm trying to run the tutorial case for the Sneddon problem (Tutorial 10: Pressurized fracture in an infinite medium) on Pangea2; however, the following error message is received:

This requires fcntl(2) to be implemented. As of 8/25/2011 it is not. Generic MPICH Message: File locking failed in ADIOI_Set_lock(fd 1C,cmd F_SETLKW/7,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 26.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
ADIOI_Set_lock:: Function not implemented
ADIOI_Set_lock:offset 4208, length 5760

By commenting out all the vtk outputs, the same case completes with silo outputs. In addition, on Pangea2, I'm testing the HI24L case with vtk outputs; it runs smoothly and no issue is encountered.

I have run multiple cases with vtk outputs on Pangea2. This is the first time I've seen this issue; it might be related to the vtk settings, the system configuration, or something else.

TotoGaz commented 3 years ago

@jhuang2601 Which version are you running? Are you running in parallel? How many cores? Can you run it on another filesystem (not lustre if you can: maybe run it directly on a login node if the case is light enough)?

TotoGaz commented 3 years ago

@CusiniM You did implement the embedded fracture, didn't you? You got it running with vtk in parallel?

jhuang2601 commented 3 years ago

> @jhuang2601 Which version are you running? Are you running in parallel? How many cores? Can you run it on another filesystem (not lustre if you can: maybe run it directly on a login node if the case is light enough)?

@TotoGaz I'm running it with this commit (b1c04cd), and only a single core is used for this small-scale problem. Running it locally on the login node (pangraph14) gives the same problem.

TotoGaz commented 3 years ago

@jhuang2601 can you try the output folder somewhere other than /work, /scratch... for example /tmp/jh?

jhuang2601 commented 3 years ago

And only one vtk file is printed: [screenshot]

TotoGaz commented 3 years ago

This is not a vtk file šŸ˜‰

jhuang2601 commented 3 years ago

> @jhuang2601 can you try the output folder somewhere other than /work, /scratch... for example /tmp/jh?

@TotoGaz Bravo! I ran the case from my desktop folder, locally on the login node (pangraph14). Although it shows a warning message, the case completes!

[pangraph14:00835] mca_base_component_repository_open: unable to open mca_fs_lustre: liblustreapi.so.1: cannot open shared object file: No such file or directory (ignored)

TotoGaz commented 3 years ago

Good news. This means that the hint "If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option." is to be considered with care (it refers to the ability to "lock a file away from other processes"). I know for 99% sure that our lustre fs indeed does not allow flock.

I'm surprised that the issue does not arise in other cases... Can you confirm that standard cases like Laplace run OK on lustre filesystems (like /work or /scratch)?

PS: to know the filesystem type, run df -hT:

# df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
overlay        overlay   427G  121G  284G  30% /
tmpfs          tmpfs      64M     0   64M   0% /dev
tmpfs          tmpfs      16G     0   16G   0% /sys/fs/cgroup
shm            tmpfs      64M     0   64M   0% /dev/shm
/dev/nvme0n1p1 ext4      427G  121G  284G  30% /etc/hosts
tmpfs          tmpfs      16G   12K   16G   1% /proc/driver/nvidia
tmpfs          tmpfs     3.2G  9.2M  3.2G   1% /run/nvidia-persistenced/socket
udev           devtmpfs   16G     0   16G   0% /dev/nvidia0
tmpfs          tmpfs      16G     0   16G   0% /proc/asound
tmpfs          tmpfs      16G     0   16G   0% /proc/acpi
tmpfs          tmpfs      16G     0   16G   0% /sys/firmware

or for a specific folder

# df -hT /tmp
Filesystem     Type     Size  Used Avail Use% Mounted on
overlay        overlay  427G  121G  284G  30% /
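
To probe directly whether a given directory allows flock, here is a minimal Python sketch (the probe path below is just an example; point it at the directory you want to test):

import fcntl

# Take a non-blocking exclusive flock on a scratch file; on a lustre
# mount without the 'flock' option this raises OSError
# ("Function not implemented").
with open("/scratch/flock_probe.tmp", "w") as f:  # example path
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        print("flock is supported here")
        fcntl.flock(f, fcntl.LOCK_UN)
    except OSError as exc:
        print(f"flock is not supported: {exc}")
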
jhuang2601 commented 3 years ago

@TotoGaz For most of the cases I've run on Pangea2, I usually print the results in silo format and rarely use the vtk files, and no such issue has ever been observed. From this test, it looks like there is a potential issue with generating vtk files on lustre filesystems.

TotoGaz commented 3 years ago

OK, that's weird nevertheless, since I've been running some cases with vtk on lustre file systems and never experienced anything like what you describe... I'll investigate.

jhuang2601 commented 3 years ago

@TotoGaz I think the issue is related to the following two output tasks, which could generate two hdf5 files:

<TimeHistory
  name="timeHistoryOutput"
  sources="{/Tasks/displacementJumpCollection}"
  filename="displacementJump_history" />

<TimeHistory
  name="cellCentersOutput"
  sources="{/Tasks/cellCentersCollection}"
  filename="cell_centers" />

[screenshot]

By commenting out these output tasks (so that no hdf5 files are created), the case runs on the lustre file system (/scratch) with vtk outputs.
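
For what it's worth, the two files can be probed directly with h5py (a minimal sketch; the .hdf5 names are assumed from the filename attributes above):

import h5py

# Try to open each TimeHistory output; filenames are assumed from the
# 'filename' attributes above, with an .hdf5 extension appended. On the
# problematic lustre mount this fails with the same file-locking OSError.
for name in ("displacementJump_history.hdf5", "cell_centers.hdf5"):
    try:
        with h5py.File(name, "r") as hf:
            print(name, "OK, datasets:", list(hf.keys()))
    except OSError as exc:
        print(name, "failed:", exc)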

TotoGaz commented 3 years ago

Great investigation @jhuang2601! That will help me!

TotoGaz commented 3 years ago

@jhuang2601 Do you indeed run with the environment variable HDF5_USE_FILE_LOCKING=FALSE? (You do not have to actually submit on Pangea; just run a case on lustre with and without the environment variable.)

jhuang2601 commented 3 years ago

@TotoGaz I just made a test as you suggested, and neither option (with or without the environment variable) works on lustre.

TotoGaz commented 3 years ago

Hi @jhuang2601, I've been using a trick that I do not fully understand yet, but since I'll be off for ~2 weeks, I provide it to you "as is". So please be cautious for the first run. If somebody can help you with it, that would be nice.

The idea is to define a file that contains some IO hints for MPI, and then provide this information through the env variable ROMIO_HINTS:

-bash-4.2$ cat ROMIO_HINTS
romio_ds_write disable 
romio_ds_read disable

-bash-4.2$ readlink -f ./ROMIO_HINTS 
/work206/workrd/SCR/GEOSX/somewhere/ROMIO_HINTS

-bash-4.2$ export ROMIO_HINTS=/work206/workrd/SCR/GEOSX/somewhere/ROMIO_HINTS

-bash-4.2$ /workrd/SCR/GEOSX/install/gcc8/GEOSX-b1c04cd/bin/geosx -i ./GEOSX/src/coreComponents/physicsSolvers/solidMechanics/benchmarks/Sneddon-Validation.xml -o /workrd/SCR/GEOSX/somewhere/output 
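
The same setup in script form (a sketch; the geosx invocation, xml file, and paths are placeholders taken from the session above):

import os
import subprocess

# Write the ROMIO hints file; disabling ROMIO's data sieving appears to
# avoid the fcntl-based locking that fails on this lustre mount.
hints_path = os.path.abspath("ROMIO_HINTS")
with open(hints_path, "w") as f:
    f.write("romio_ds_write disable\nromio_ds_read disable\n")

# Pass the hints file to the run through the ROMIO_HINTS env variable.
env = dict(os.environ, ROMIO_HINTS=hints_path)
subprocess.run(
    ["geosx", "-i", "Sneddon-Validation.xml", "-o", "output"],
    env=env,
    check=True,
)
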
jhuang2601 commented 3 years ago

@TotoGaz Great, with this solution the case now runs on the lustre filesystem. However, there is still a problem with postprocessing/visualizing the outputs on the lustre filesystem:

Traceback (most recent call last):
  File "Results.py", line 124, in <module>
    main()
  File "Results.py", line 71, in main
    hf = h5py.File(hdf5File1Path, 'r')
  File "/data_local/sw/ANACONDA/anaconda3-8.5/lib/python3.8/site-packages/h5py/_hl/files.py", line 406, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/data_local/sw/ANACONDA/anaconda3-8.5/lib/python3.8/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file locking disabled on this file system (use HDF5_USE_FILE_LOCKING environment variable to override), errno = 38, error message = 'Function not implemented')

As of now, I can copy the folder to another file system (not lustre) for the postprocessing part.

TotoGaz commented 3 years ago

> OSError: Unable to open file (file locking disabled on this file system (use HDF5_USE_FILE_LOCKING environment variable to override), errno = 38, error message = 'Function not implemented')

Did you try with export HDF5_USE_FILE_LOCKING=false? Or with the same ROMIO_HINTS trick?

jhuang2601 commented 3 years ago

@TotoGaz You're right, I just tried with export HDF5_USE_FILE_LOCKING=false and it works for postprocessing/reading the hdf5 files. Previously, I had used the same ROMIO_HINTS trick and it failed; hence my last comment.
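
For reference, the same override can be applied from inside the post-processing script (a sketch; the filename is assumed from the TimeHistory task above):

import os

# HDF5 reads this variable when a file is opened, so set it before h5py
# touches any file on the lustre mount.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

import h5py

# Filename assumed from the TimeHistory 'filename' attribute above.
with h5py.File("displacementJump_history.hdf5", "r") as hf:
    print(list(hf.keys()))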

jhuang2601 commented 3 years ago

@TotoGaz I do observe some side effect of the ROMIO_HINTS trick. Here is the testing case: https://github.com/GEOSX/MAELSTROM/tree/master/usecases/Jian/LagrangianContact/Single_Frac_Compression

If using the ROMIO_HINTS trick on a lustre file system with multiple ranks, the simulation results look wrong: [screenshot]

If running the same case on another file system (not lustre) without ROMIO_HINTS, here is the correct result: [screenshot]

TotoGaz commented 3 years ago

Hmm, that's weird. It's wrong in a good-looking way šŸ˜‰ Did you try to run with ROMIO_HINTS on a standard file system?

jhuang2601 commented 3 years ago

@TotoGaz If using a single rank on a login node (lustre filesystem), this side effect is gone and the ROMIO_HINTS trick still works. I guess ROMIO_HINTS somehow breaks MPI runs on the lustre file system.

TotoGaz commented 3 years ago

Hi, I've tested your case on my computer with develop. No ROMIO_HINTS, no lustre. I only had to remove the keyword minSetSize="492".

I have these results with two ranks (-x 2).

[screenshot: Verification-2]

And the results are perfectly fine on 1 rank. So I'm not sure that the issue is about ROMIO_HINTS or lustre there. Maybe somewhere else?

jhuang2601 commented 3 years ago

@TotoGaz I've tested different MPI runs, and it seems that either the results are not correctly written to the hdf5 files, or there is a solver issue.

TotoGaz commented 3 years ago

Is this an hdf5 problem or a parallel problem? Do you have any other output format that you could use?

jhuang2601 commented 3 years ago

@TotoGaz By plotting the silo output of the shear displacement, the same issue is observed, which suggests that it is a parallel problem. [screenshot]

TotoGaz commented 3 years ago

It looks like it. Should we close this ticket and open another one involving others? Could you test vtk, for example, to get further information?

jhuang2601 commented 3 years ago

@TotoGaz Using vtk output, the same problem is seen. A discussion has been initiated in the slack channel (faults_and_fractures). [screenshot]

TotoGaz commented 3 years ago

Great additional info @jhuang2601!