INP-PM / FEDM

Finite Element Discharge Modelling code
https://inp-pm.github.io/FEDM/
GNU Lesser General Public License v3.0

Parallel computation #17

Closed RaphaelPile closed 1 year ago

RaphaelPile commented 1 year ago

I am now working on simulations based on the streamer example. For that purpose, I have a dedicated workstation with 32 CPUs, running Linux. Unfortunately, the default behaviour of the code is to run 32 jobs (one per CPU, with one master CPU), but the mesh is not decomposed as it would be when using "mpirun -np ...".

However, if I run "mpirun -np N python3 fed-streamer.py", the number of jobs is N*32.

We have already used Python with mpirun on this workstation without any issue, so I wonder whether there could be something in the FEniCS or FEDM code explaining this behaviour?
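For what it is worth, a minimal check (assuming mpi4py, which ships with most FEniCS installations) to confirm how many MPI ranks the script actually sees would be something like:

from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")  # one line per MPI rank

With "mpirun -np N" this should report N ranks; any extra processes visible in top would then be threads spawned by the numerical libraries rather than MPI ranks.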

Best,

Raphaël

markus-m-becker commented 1 year ago

Just to clearly understand the issue: executing something like python3 fedm-name_of_example.py results in 32 running processes and mpirun -np 8 python3 fedm-name_of_example.py in 8*32 processes? Do you observe this behaviour also for the original streamer benchmark example?

RaphaelPile commented 1 year ago

Yes and yes.

Raphaël Pile

markus-m-becker commented 1 year ago

Hm, as far as I know, code executed in Python is not expected to run on multiple threads without further ado (e.g. by using multiprocessing or threading), and FEniCS should not run in parallel without mpirun. Did you verify that the same behaviour (multithreading) is not observed for other simple Python programs on your machine?
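As an illustration of such a test (a hypothetical script, not part of FEDM): run a program that only exercises the linked BLAS, e.g.

# hypothetical check: repeated dense matrix products
import numpy as np

a = np.random.rand(2000, 2000)
for _ in range(20):
    np.dot(a, a)  # each product uses as many threads as the BLAS/OpenMP runtime allows

and watch it in top or htop. If this already spawns 32 threads, the multithreading comes from the numerical backend rather than from FEDM or FEniCS.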

AleksandarJ1984 commented 1 year ago

That is indeed quite interesting behaviour. I observed something similar (but not identical) when we used FEniCS installed through Anaconda. In those cases, some default PETSc settings would lead to OpenMP being used and a doubled number of threads (see the similar issue here). The solution was to run

export OMP_NUM_THREADS=1

before running your script. Another possibility is that there is a problem with your MPI setup.
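As a sketch of an alternative, assuming legacy FEniCS (dolfin) as used by FEDM, the variable can also be set inside the script, provided this happens before the solver stack is imported:

import os
os.environ["OMP_NUM_THREADS"] = "1"  # limit OpenMP to one thread per MPI rank

import dolfin  # import the solver stack only after the variable is set

Exporting it in the shell, as above, typically has the same effect and is also inherited by the processes started by mpirun.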

RaphaelPile commented 1 year ago

I solved the issue with the suggestion from here:

export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1

I found it by googling the command you provided, so many thanks for your help!