FEniCS / dolfinx

Next generation FEniCS problem solving environment
https://fenicsproject.org
GNU Lesser General Public License v3.0

[BUG]: MUMPS solver in parallel produces wrong solution #2749

Closed physicsmonk closed 1 year ago

physicsmonk commented 1 year ago

How to reproduce the bug

I ran the first demo, demo_poisson.py, in parallel on more than one core, changing the linear solver to MUMPS. This produces a solution different from the one produced by MUMPS run on a single core, or by other linear solvers such as SUPERLU_DIST and GMRES run on multiple cores.
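For reference, routing the solve through MUMPS is usually done via PETSc options. A minimal sketch of the option set (the keys are standard PETSc option names; the variable name `mumps_options` and the commented `LinearProblem` call are illustrative, not the reporter's actual code):

```python
# Hedged sketch: PETSc options that select MUMPS as the direct solver backend.
mumps_options = {
    "ksp_type": "preonly",                 # skip Krylov iterations, direct solve only
    "pc_type": "lu",                       # LU factorization as the "preconditioner"
    "pc_factor_mat_solver_type": "mumps",  # use MUMPS for the factorization
}

# In demo_poisson.py one would pass these to the problem, e.g.:
# problem = LinearProblem(a, L, bcs=[bc], petsc_options=mumps_options)
# uh = problem.solve()
print(mumps_options)
```

Replacing `"mumps"` with `"superlu_dist"` is how one would switch to the SUPERLU_DIST comparison run mentioned above.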

Minimal Example (Python)

No response

Output (Python)

No response

Version

0.6.0

DOLFINx git commit

24f86a9ce57df6978070dbee22b3eae8bb77235f

Installation

I installed DOLFINx from source on a MacBook Pro with an M1 Pro chip. PETSc was configured with the command

./configure PETSC_ARCH=arch-darwin-opt --with-debugging=0 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-triangle --download-ctetgen --download-metis --download-parmetis --download-suitesparse --download-mumps --download-hypre --with-hwloc-dir=/opt/homebrew/opt/hwloc --download-superlu --download-superlu_dist --download-scalapack --download-spai --with-hdf5-dir=/opt/homebrew/opt/hdf5-mpi --download-fftw --with-clean

Additional information

Solution produced by MUMPS with 1 core:

[Screenshot 2023-08-24 at 3 23 00 PM]

Solution produced by MUMPS with 2 cores. Left mesh partition:

[Screenshot 2023-08-24 at 3 26 09 PM]

Right mesh partition:

[Screenshot 2023-08-24 at 3 22 09 PM]

Note that the solutions on the left mesh partition differ between the two cases.

nate-sime commented 1 year ago

I cannot reproduce this. Which version of PETSc are you using and how was it installed? Simply from source?

physicsmonk commented 1 year ago

Thanks for helping. I am using the newest PETSc, 3.19.4, built from source. The configuration used for PETSc is shown in my original post. So your MUMPS solver run in parallel produced a consistent and correct result for demo_poisson.py?

nate-sime commented 1 year ago

Can you reproduce this with a Docker container that encapsulates your desired environment? Please also include the specific code you're running, since the Poisson demo doesn't use MUMPS unless explicitly directed to.

physicsmonk commented 1 year ago

Sorry, I am not familiar with Docker, and I don't know whether the DOLFINx Docker image has MUMPS included. I also found that MUMPS in the old DOLFIN (built from source) seemed buggy. I even tried building MUMPS from source myself and linking it to PETSc, but still had the same issue. For now I can use other direct linear solvers such as SUPERLU_DIST.

I have another small question: what is the scatter_forward() method of the degrees-of-freedom vector for? It is called frequently in the demos, but I searched and could not find a good explanation. It seems related to MPI scattering? Thanks for answering!

nate-sime commented 1 year ago

Please post general questions about FEniCS and its use to https://fenicsproject.discourse.group/. I'll mark this as "can't reproduce" until an environment exhibiting the issue is available.

jorgensd commented 1 year ago

Actually I also found that MUMPS in the old DOLFIN (built from source) seemed buggy. I even tried building MUMPS from source myself and linking it to PETSc, but still had the same issue. For now I can use other direct linear solvers such as SUPERLU_DIST.

There is nothing special about the MUMPS setup in DOLFINx. It is built as shown in https://github.com/FEniCS/dolfinx/blob/main/docker/Dockerfile.test-env#L208-L224, and you are working directly with PETSc objects.

WRT scatter_forward, it updates ghost degrees of freedom (dofs shared between processes but owned by another process). This is needed, for instance, after solving a problem with PETSc, as PETSc only updates the dofs owned by each process, not their ghost copies. See for instance: https://scientificcomputing.github.io/mpi-tutorial/notebooks/dolfinx_MPI_tutorial.html#dolfinx-functions
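To illustrate the idea, here is a toy, pure-Python sketch of the owner-to-ghost update that scatter_forward performs. The rank layout, dof numbers, and function name are invented for illustration; this is not the dolfinx API, just the communication pattern it implements:

```python
# Toy model: each rank owns some dofs and holds read-only ghost copies of
# dofs owned by other ranks. After a solve, only owned values are current;
# the forward scatter copies owner values into the stale ghost slots.

# Hypothetical two-rank layout: rank -> {global dof: value}.
owned = {
    0: {0: 1.0, 1: 2.0, 2: 3.0},  # rank 0 owns dofs 0-2
    1: {3: 4.0, 4: 5.0},          # rank 1 owns dofs 3-4
}

# Ghost layout: rank -> {global dof it ghosts: stale value}.
ghosts = {
    0: {3: None},  # rank 0 ghosts dof 3, owned by rank 1
    1: {2: None},  # rank 1 ghosts dof 2, owned by rank 0
}

def scatter_forward(owned, ghosts):
    """Copy each owner's value into every ghost copy (owner -> ghost)."""
    for gmap in ghosts.values():
        for dof in gmap:
            for owner_vals in owned.values():
                if dof in owner_vals:
                    gmap[dof] = owner_vals[dof]

scatter_forward(owned, ghosts)
print(ghosts)  # ghost copies now match the owner-held values
```

In real dolfinx the same direction of communication happens with MPI messages between processes; calling it after a solve ensures ghosted entries seen by each rank (e.g. when writing output or evaluating the function near partition boundaries) are consistent with the owning rank's values.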