AMReX-Combustion / PeleLMeX

An adaptive mesh hydrodynamics simulation code for low Mach number reacting flows without level sub-cycling.
https://amrex-combustion.github.io/PeleLMeX/
BSD 3-Clause "New" or "Revised" License

Multiple processors issue #195

Closed: bazharz closed this issue 1 year ago

bazharz commented 1 year ago

Hello everyone,

I'm new to PeleLMeX and I'm trying to run the FlameSheet case on multiple processors. However, I've noticed that some of the files generated during the simulation have names like 'plt00000.old.0782200/', which causes issues when I try to visualize the results with VisIt.

Does anyone have any advice or best practices for avoiding these kinds of issues when working with multiple processors? Any guidance you can offer would be greatly appreciated.

Thank you!

drummerdoc commented 1 year ago

Those are old plot files that are renamed rather than removed. If you re-run a time step and don't want to see them, you'll need to remove them manually prior to the run. This is done to avoid the code overwriting potentially valuable data, but it can seem annoying. The same is done with checkpoint files.
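
A minimal cleanup sketch, assuming the default plt/chk file prefixes (adjust the globs if your inputs file uses different prefixes):

rm -rf plt*.old.* chk*.old.*   # remove renamed plot/checkpoint leftovers before re-running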

bazharz commented 1 year ago

Thank you for responding!

In my case, the plt files are not old leftovers: I had already removed them. However, when I launch the run with mpirun or mpiexec and specify the number of processes, I get n-1 'old' files saved at each step. I suspect that an output is generated per processor, which is causing the issue I'm seeing.

drummerdoc commented 1 year ago

Ahh, did you build with USE_MPI=TRUE? If not, then you are running a bunch of serial copies of the code at once, each overwriting the others' files on the fly. Your executable should have .MPI in its name.
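
For reference, a typical GNU make build with MPI enabled looks something like this (a sketch; the case directory and the resulting executable name depend on your setup and compiler):

cd PeleLMeX/Exec/RegTests/FlameSheet
make -j4 USE_MPI=TRUE
# produces an executable named along the lines of PeleLMeX3d.gnu.MPI.ex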

bazharz commented 1 year ago

Yes, I built the executable with USE_MPI=TRUE.

esclapez commented 1 year ago

It is also possible that the MPI wrapper used to compile the code is not from the same library as the mpirun used for the simulation. Which OS are you running on? The standard PeleLMeX log reports how many MPI ranks are employed during AMReX initialization, e.g.:

MPI initialized with 4 MPI processes
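
A quick way to check for such a mismatch (a sketch; on Ubuntu both mpirun and mpicxx are often symlinks managed by the alternatives system, and the output format differs between Open MPI and MPICH):

mpirun --version        # which MPI implementation the launcher comes from
ls -l $(which mpicxx)   # follow the symlink to see which implementation the compile wrapper belongs to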

bazharz commented 1 year ago

I am running Ubuntu 22.04.2 LTS. However, I can't seem to determine whether AMReX is initialized with 4 processes. How can I verify this step? Thank you in advance.

esclapez commented 1 year ago

It should be the first line of PeleLMeX stdout. The first few lines look like:

MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
Initializing SUNDIALS with 1 threads...
SUNDIALS initialized.
AMReX (23.04-8-gbacaa10a7636) initialized
Successfully read inputs file ... 

 ================= Build infos =================
 PeleLMeX    git hash: v23.03-16-g9fbe4ef-dirty
 AMReX       git hash: 23.04-8-gbacaa10a7
 PelePhysics git hash: v23.03-9-gb04c8ee8
 AMReX-Hydro git hash: 38ad308
 ===============================================

esclapez commented 1 year ago

Since you're on Linux, you can check which MPI library the PeleLMeX executable is linked against using ldd PeleLMeX*.ex, and which one your mpirun command points to using which and ls -l.
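
Concretely, something along these lines (a sketch; PeleLMeX3d.gnu.MPI.ex is a placeholder for your actual executable name):

ldd PeleLMeX3d.gnu.MPI.ex | grep mpi   # MPI library the executable is linked against
which mpirun                           # which mpirun is first on your PATH
ls -l $(which mpirun)                  # follow the symlink to identify the implementation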

bazharz commented 1 year ago

For the first few lines it looks like:

MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
Initializing SUNDIALS with 1 threads...
SUNDIALS initialized.
AMReX (23.02-42-gd8e6401642ff) initialized
Successfully read inputs file ...

 ================= Build infos =================
 PeleLMeX    git hash: v23.02-4-g54f7076-dirty
 AMReX       git hash: 23.02-42-gd8e640164
 PelePhysics git hash: v0.1-1076-gb810e31c
 AMReX-Hydro git hash: 5cb0f33
 ===============================================

In this case, for example, I used 4 processors. And for the MPI library, ldd shows this one:

libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007fe5d1416000)

It seems to me that mpirun is launching several independent single-process runs rather than one parallel run, which would explain why I end up with n-1 'old' files for the n processors I requested.

esclapez commented 1 year ago

So you are definitely running multiple serial PeleLMeX simulations instead of a single parallel one. Did you try using mpiexec instead of mpirun?

bazharz commented 1 year ago

I tried both mpirun and mpiexec and I had the same result!

esclapez commented 1 year ago

Did you install both Open MPI and MPICH at some point on your machine?

bazharz commented 1 year ago

They were already installed in the /usr directory!

esclapez commented 1 year ago

Are you able to explicitly use one or the other by overwriting/reinstalling the library? My best guess is still that the default mpirun is not from the same library as the compiler's MPI wrapper.
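
On Ubuntu, one way to switch is the Debian alternatives system, which manages the default MPI when both Open MPI and MPICH are installed (a sketch; the alternative names can vary with the installed packages):

sudo update-alternatives --config mpirun   # select the default mpirun
sudo update-alternatives --config mpi      # select the default compiler wrappers and headers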

esclapez commented 1 year ago

@bazharz did you manage to get an MPI run going?

bazharz commented 1 year ago

Yes, it works now! Thank you!

esclapez commented 1 year ago

I'll close the issue then.