KratosMultiphysics / Kratos

Kratos Multiphysics (a.k.a. Kratos) is a framework for building parallel multi-disciplinary simulation software. Modularity, extensibility and HPC are the main objectives. Kratos has a BSD license and is written in C++ with an extensive Python interface.
https://kratosmultiphysics.github.io/Kratos/

[MPI] What is the safest way to create folders? #8863

Closed marcnunezc closed 3 years ago

marcnunezc commented 3 years ago

Description @KratosMultiphysics/mpi

In https://github.com/KratosMultiphysics/Kratos/pull/8835#discussion_r644181533, the following issue was pointed out by @sunethwarna when creating a folder in parallel, especially in clusters:

if rank==0:
   os.makedirs(folder_name)
data_comm.Barrier()

Not related to this PR, FYI :)

We already have problems with this barrier (@adityaghantasala can also confirm). Even in VTK this barrier is not working as it is supposed to. The problem is as follows.

For some reason, in the few clusters we checked, even if rank 0 creates the folder successfully, the folder structure is not updated on some of the computing nodes. After the barrier, the processes on those nodes then complain that the folder is not found when they try to use it. The exact same block is present in the VTK output, where it also sometimes does not help at all. This is not related to this code segment, but it might fail on some clusters. I have experienced this failing a lot. The only workaround is to run the process once and, when it fails, run it again, so that the folder structure is already there the second time.

https://github.com/KratosMultiphysics/Kratos/blob/8e95a20f9984988c33a98e77074f74edf1ce6d45/kratos/input_output/vtk_output.cpp#L234-L241

So I would not expect this barrier to solve the problem in MPI (this shows up a lot in large MPI cases).

Is this something we can prevent from our side?

@philbucher also proposed replacing the barrier with some wait time so that the filesystem is correctly updated in all ranks:

It is really only the folder creation at runtime that causes the issues, both in C++ and in Python. This problem has in fact been known for years and I was already in contact with the cluster support but we couldn't figure it out. From the programming POV the code above is (still) correct to me.

The only thing I haven't tried yet is to change the Barrier to an explicit wait for the folder to exist, i.e. something like this:

if !folder_exists && rank==0:
    create folder
WaitForFolderToExist() // instead of Barrier

My thinking is that there is some latency in the filesystem (this is common in cluster filesystems afaik) and hence the folder is not yet "visible" on all ranks

roigcarlo commented 3 years ago

This happens because os.mkdir only issues the instruction to make a directory; it is the filesystem that is in charge of actually creating it, which can be slower than the execution of the instruction.

I agree with @philbucher: the only solution is to add code that waits for the folder to be created. E.g.:

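import os
import time

if rank == 0:
    # os.makedirs also creates any missing parent folders
    os.makedirs(file_path)

# every rank polls until the filesystem shows the folder
while not os.path.exists(file_path):
    time.sleep(1)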
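
The loop above polls forever if the folder never becomes visible, for example when creation fails on rank 0. A bounded variant could look like the sketch below. It is only an illustration, not Kratos code: the helper name create_and_wait_for_folder and its timeout and poll_interval parameters are assumptions made here, and the rank is passed in by the caller rather than read from a communicator.

```python
import os
import time


def create_and_wait_for_folder(folder_name, rank, timeout=60.0, poll_interval=0.5):
    """Create folder_name on rank 0, then wait until this rank can see it.

    Hypothetical helper (not part of Kratos). Returns True once the folder
    is visible to the calling process, or False if `timeout` seconds pass
    without it appearing.
    """
    if rank == 0:
        # exist_ok=True avoids an error if the folder is already there,
        # e.g. left over from a previous run.
        os.makedirs(folder_name, exist_ok=True)

    # Every rank (including rank 0) polls its own view of the filesystem.
    deadline = time.monotonic() + timeout
    while not os.path.isdir(folder_name):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Each rank would call the helper with its own rank and raise an error if it returns False, so a missing folder is reported at creation time instead of at the first write. The timeout and poll interval are placeholders and would need tuning for the filesystem in question.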

mpentek commented 3 years ago

Would be great to have a robust solution! For now I typically create the more complex folder structure while testing locally (typically with OpenMP) and reuse it in MPI. But this is in no way something that should be a long-term solution.