firelab / windninja

A diagnostic wind model developed for use in wildland fire modeling.
https://weather.firelab.org/windninja/

OpenFoam error when using more than one thread in Docker #497

Open · santiagoMonedero opened this issue 1 year ago

santiagoMonedero commented 1 year ago

Hi, I am trying to run the example file cli_momentumSolver_diurnal.cfg using the Dockerfile in the repository (v3.9), but it fails when using more than one thread. All I have done is 1) clone the repository, 2) build the Dockerfile, and 3) run the Docker container interactively to access WindNinja_cli.
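
For reference, a sketch of those three steps; the image tag windninja is arbitrary and I am assuming the Dockerfile sits at the repository root:

    # Sketch of the setup steps described above; the tag name is arbitrary
    # and the Dockerfile location is assumed to be the repository root.
    git clone https://github.com/firelab/windninja.git
    cd windninja
    docker build -t windninja .
    docker run -it windninja /bin/bash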

Surprisingly, it fails with different errors depending on where the image is built and run: directly on Ubuntu 18.04 (no WSL), versus on Ubuntu 20.04 through WSL2 on Windows 11.

  1. Ubuntu 18.04

    root@15a139c48048:/home/wind/example# WindNinja_cli cli_momentumSolver_diurnal.cfg
    Run 0: Reading elevation file...
    Run 0: Simulation time is 2011-Sep-23 13:30:00 MDT
    Run 0: Run number 0 started with 2 threads.
    Run 0: Writing OpenFOAM files...
    Run 0: Converting DEM to STL format...
    Run 0: Transforming surface points to output wind height...
    Run 0: Generating mesh...
    Run 0: Running blockMesh...
    Run 0: Decomposing domain for parallel mesh calculations...
    Run 0: Running moveDynamicMesh...
    Run 0: Reconstructing domain...
    Exception caught: Error during reconstructPar().
    Exception caught: Error during reconstructPar().
  2. Ubuntu 20.04 with WSL2 on Windows 11

    root@463e5936b024:/home/wind/example# WindNinja_cli cli_momentumSolver_diurnal.cfg
    Run 0: Reading elevation file...
    Run 0: Simulation time is 2011-Sep-23 13:30:00 MDT
    Run 0: Run number 0 started with 3 threads.
    Run 0: Writing OpenFOAM files...
    Run 0: Converting DEM to STL format...
    Run 0: Transforming surface points to output wind height...
    Run 0: Generating mesh...
    Run 0: Running blockMesh...
    ERROR 1: posix_spawnp() failed
    Exception caught: Error during blockMesh().
    Exception caught: Error during blockMesh().

PS: I know this was a known issue in a previous WindNinja version, but I just wanted to give it a try on the new release and check whether there is a workaround yet. Thanks!!

santiagoMonedero commented 1 year ago

Hi, after some digging into this I think I have a working solution. I applied it directly in the container, but I guess it could be added to the Dockerfile or the source code.

  1. Everything in the container runs as root, and mpirun refuses to run as root by default. Apparently this protection can be overridden by either of the following methods:

     1.A. Adding --allow-run-as-root to the mpirun command, probably here: https://github.com/firelab/windninja/blob/2c971aeeaa66b7f2614d895eaac23f9cd537d974/src/ninja/ninjafoam.cpp#L1216

     1.B. Exporting OMPI_ALLOW_RUN_AS_ROOT=1 and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1, which is what I did (see the sketch after the message below). Before doing so, run the Docker container interactively and execute mpirun to get the following scary message which, to be honest, I have no idea how relevant it is inside a Docker container (but it is definitely very scary):
    
    root@1d33a1a7e569:/data# mpirun
    --------------------------------------------------------------------------
    mpirun has detected an attempt to run as root.

    Running as root is strongly discouraged as any mistake (e.g., in defining
    TMPDIR) or bug can result in catastrophic damage to the OS file system,
    leaving your system in an unusable state.

    We strongly suggest that you run mpirun as a non-root user.

    You can override this protection by adding the --allow-run-as-root option
    to the cmd line or by setting two environment variables in the following
    way: the variable OMPI_ALLOW_RUN_AS_ROOT=1 to indicate the desire to
    override this protection, and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 to confirm
    the choice and add one more layer of certainty that you want to do so. We
    reiterate our advice against doing so - please proceed at your own risk.
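
Concretely, option 1.B boils down to two exports inside the container (a sketch; the same values could presumably be baked into the Dockerfile instead):

    # Override OpenMPI's refusal to run as root (option 1.B above).
    # Both variables are needed; the second confirms the first.
    export OMPI_ALLOW_RUN_AS_ROOT=1
    export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1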


2. Once mpirun is allowed to run as root you can run **OpenFOAM** with **MPI**, but you will hit a convergence problem when trying their test cases like `$FOAM_TUTORIALS/incompressible/simpleFoam/motorBike`, or you will get stuck at `(moveDynamicMesh) 100% complete...` in WindNinja without ever reaching the domain reconstruction part. Apparently this is a known MPI issue (see https://github.com/open-mpi/ompi/issues/4948), which can be fixed by setting the environment variable `export OMPI_MCA_btl_vader_single_copy_mechanism=none`.
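
Putting the two fixes together, a minimal sketch of what might be added to the repository's Dockerfile; the exact placement (and whether ENV lines are preferable to exports in an entrypoint script) is my assumption:

    # Hypothetical Dockerfile additions, not taken from the repo.
    # Let OpenMPI run as root inside the container (see the message above).
    ENV OMPI_ALLOW_RUN_AS_ROOT=1
    ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
    # Disable vader's single-copy (CMA) mechanism, which reportedly hangs
    # inside Docker containers (see open-mpi/ompi#4948).
    ENV OMPI_MCA_btl_vader_single_copy_mechanism=none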

And that's it! We have the Docker container running with momentum conservation and 25 threads:

    root@d0a02ed7692b:/home/wind/example# WindNinja_cli cli_momentumSolver_diurnal.cfg
    Run 0: Reading elevation file...
    Run 0: Simulation time is 2011-Sep-23 13:30:00 MDT
    Run 0: Run number 0 started with 25 threads.
    Run 0: Writing OpenFOAM files...
    Run 0: Converting DEM to STL format...
    Run 0: Transforming surface points to output wind height...
    Run 0: Generating mesh...
    Run 0: Running blockMesh...
    Run 0: Decomposing domain for parallel mesh calculations...
    Run 0: Running moveDynamicMesh...
    Run 0: (moveDynamicMesh) 4% complete...
    Run 0: (moveDynamicMesh) 12% complete...
    Run 0: (moveDynamicMesh) 20% complete...
    Run 0: (moveDynamicMesh) 26% complete...
    Run 0: (moveDynamicMesh) 34% complete...
    Run 0: (moveDynamicMesh) 42% complete...
    Run 0: (moveDynamicMesh) 48% complete...
    Run 0: (moveDynamicMesh) 56% complete...
    Run 0: (moveDynamicMesh) 62% complete...
    Run 0: (moveDynamicMesh) 70% complete...
    Run 0: (moveDynamicMesh) 78% complete...
    Run 0: (moveDynamicMesh) 84% complete...
    Run 0: (moveDynamicMesh) 92% complete...
    Run 0: (moveDynamicMesh) 98% complete...
    Run 0: Reconstructing domain...
    Run 0: Refining surface cells in mesh...
    Run 0: (refineMesh) 10% complete...
    Run 0: (refineMesh) 99% complete...
    Run 0: Renumbering mesh...
    Run 0: Applying initial conditions...
    Run 0: Decomposing domain for parallel flow calculations...
    Run 0: Solving for the flow field...
    Run 0 (solver): 2% complete
    Run 0 (solver): 2% complete
    Run 0 (solver): 2% complete



Note that I am not an expert on this; it is simply a working solution I arrived at by googling and experimenting with the Docker container.

PS: The Docker approach is important for us so we can run Azure HPC batching with momentum conservation. For our local HPC I will have to do some testing, because we use **Singularity** and it is more restrictive with privileges... If it fails, using an OpenFOAM image as the base instead of Ubuntu 20.04 may be a solution (well, just thinking out loud).
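
One hedged thought for the Singularity case: Singularity containers typically run as the invoking (non-root) user, so the root-override variables may not be needed there, and host environment variables can be passed into the container with the SINGULARITYENV_ prefix. An untested sketch (the image name windninja.sif is hypothetical):

    # Untested sketch for Singularity; windninja.sif is a hypothetical image.
    # Pass the vader fix into the container via the SINGULARITYENV_ prefix.
    export SINGULARITYENV_OMPI_MCA_btl_vader_single_copy_mechanism=none
    singularity exec windninja.sif WindNinja_cli cli_momentumSolver_diurnal.cfg
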
nwagenbrenner commented 1 year ago

@santiagoMonedero Glad you got something working for now. I'm also not a Docker or MPI expert so am not sure off the top of my head if there is a better way to handle this. I'll leave this ticket open and will try to take a closer look soon. Thanks for reporting your fix here.