Closed marcnunezc closed 3 years ago
This happens because os.mkdir only issues the instruction to create the directory; the filesystem is what actually creates it, and that can take longer than the instruction itself takes to execute.
I agree with @philbucher, the only solution is to add code that waits for the folder to be created. E.g.:
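import os
import time
if rank == 0:
    # only one rank creates the folder (os.makedir does not exist; os.makedirs also handles nested paths)
    os.makedirs(file_path, exist_ok=True)
# every rank waits until the folder is visible on its side
while not os.path.exists(file_path):
    time.sleep(1)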
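The wait loop above never gives up if the folder does not appear (for example, if creation failed on rank 0). A minimal sketch of a variant with a timeout follows. The function name `create_dir_and_wait` and its parameters are hypothetical and not part of Kratos, and `rank` is assumed to be supplied by the caller from whatever communicator is in use.

```python
import os
import time


def create_dir_and_wait(path, rank, timeout=60.0, poll_interval=0.1):
    """Create `path` on rank 0, then block until this rank can see it.

    Hypothetical helper for illustration only; not a Kratos API.
    """
    if rank == 0:
        # exist_ok=True: no error if the directory is already there
        os.makedirs(path, exist_ok=True)

    # Every rank (including rank 0) polls its own view of the filesystem.
    deadline = time.monotonic() + timeout
    while not os.path.isdir(path):
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"rank {rank}: '{path}' not visible after {timeout} s"
            )
        time.sleep(poll_interval)
```

Each rank would call it with its own rank id, e.g. `create_dir_and_wait(file_path, rank)`. The timeout turns a silent hang into an error that names the rank that never saw the folder; the default values are arbitrary and would need tuning for a given cluster.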
Would be great to have a robust solution! For now I typically create the more complex folder structure while testing locally (typically with OpenMP) and reuse it in MPI. But this is in no way something that should be a long-term solution.
Description @KratosMultiphysics/mpi
In https://github.com/KratosMultiphysics/Kratos/pull/8835#discussion_r644181533, the following issue was pointed out by @sunethwarna when creating a folder in parallel, especially in clusters:
Is this something we can prevent from our side?
@philbucher also proposed replacing the barrier with some wait time so that the filesystem is correctly updated in all ranks: