FreeFem / FreeFem-sources

FreeFEM source code
https://freefem.org/

Execution gets stuck at distributed matrix construction for PETSc #227

Closed: abaikin closed this issue 1 year ago

abaikin commented 2 years ago

Dear FreeFem developers,

I am trying to run code that is very close to the example elasticity-2d-PETSc.edp. I use the following command to run the script in VSCode: mpiexec -n 6 FreeFem++-mpi.exe -wg -ne ${file}

However, with -n 6 it gets stuck inside constructor(...) and the script never finishes. With -n 5 it works fine.

The FreeFEM version is 4.11 under Windows 10 (FreeFEM-4.11-win64.exe installer).

Could you please help me solve this problem?

 load "PETSc"                       
 /* PETSc Prerequisites */
 macro dimension()2// EOM            // 2D or 3D

macro vectorialfe()P1// EOM
include "macro_ddm.idp"             // additional DDM functions

macro def(i)[i, i#B]// EOM          // vector field definition
macro init(i)[i, i]// EOM           // vector field initialization

/* Domain geometry */
real Lx = 2;
real Ly = 1;

int nx = 2*100;
real ny = nx/2;

/* Physical parameters */
real lambda = 2e9;
real mu = 3e9;

/* Mesh definition */
mesh Th = square(nx, ny, [Lx*x, Ly*y]);

func Pk = [vectorialfe, vectorialfe]; // finite element space
int split = 1;
int bsMat = 2; /* block size */

real[int] DMat; 

int[int][int] intersectionMat; 
build(Th, split, intersectionMat, DMat, Pk, mpiCommWorld, bsMat)

/* Create distributed matrix */
Mat A;

constructor(A, DMat.n, intersectionMat, DMat, bs = bsMat, communicator = mpiCommWorld);

The console output

 Load: lg_fem lg_mesh lg_mesh3 eigenvalue parallelempi
  -- Square mesh : nb vertices  =20301 ,  nb triangles = 40000 ,  nb boundary edges 600
  -- Square mesh : nb vertices  =20301 ,  nb triangles = 40000 ,  nb boundary edges 600
  -- Square mesh : nb vertices  =20301 ,  nb triangles = 40000 ,  nb boundary edges 600
(load: loadLibary C:\Program Files (x86)\FreeFem++\\.\PETSc = 0)(load: loadLibary C:\Program Files (x86)\FreeFem++\\.\metis = 0) load: init metis (v  5 )
 sizestack + 1024 =13552  ( 12528 )

  -- Square mesh : nb vertices  =20301 ,  nb triangles = 40000 ,  nb boundary edges 600
  -- Square mesh : nb vertices  =20301 ,  nb triangles = 40000 ,  nb boundary edges 600
  -- Square mesh : nb vertices  =20301 ,  nb triangles = 40000 ,  nb boundary edges 600
prj- commented 2 years ago

I cannot reproduce this on my machine (I just added an if(mpirank == 0) cout << "done!" << endl; at the end of your code). Do you have enough processes on your machine?

[Screenshot 2022-05-15 at 3:52:24 PM]
abaikin commented 2 years ago

@prj- Thank you for the reply. Your fix with if(mpirank == 0) cout << "done!" << endl; works for this code only.

But if I add fespace VhLoc(Th, Pk); after it, or add if(mpirank == 0) cout << "done!" << endl; just after the mesh creation mesh Th = square(nx, ny, [Lx*x, Ly*y]);, it hangs again.

Without constructor(...) everything works fine. Maybe on Windows it corrupts the runtime in some cases, because with int nx = 2*10; instead of 2*100 it works.

I have 6 cores (i7-9750H), and by the way, the example from the tutorial works (I also tried -n 7 and -n 8).

prj- commented 2 years ago

Could you try to launch the script with the additional command line parameter -Dpartitioner=scotch?
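
For instance, appending it to the command you already use, something like:

mpiexec -n 6 FreeFem++-mpi.exe -wg -ne ${file} -Dpartitioner=scotch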

prj- commented 2 years ago

Also, just FYI, you are using PETSc the "old" way. The "new" way is much cleaner and simpler. You could just replace:

int split = 1;
int bsMat = 2; /* block size */

real[int] DMat; 

int[int][int] intersectionMat; 
build(Th, split, intersectionMat, DMat, Pk, mpiCommWorld, bsMat)

/* Create distributed matrix */
Mat A;

constructor(A, DMat.n, intersectionMat, DMat, bs = bsMat, communicator = mpiCommWorld);

By:

Mat A;
createMat(Th, A, Pk);

Does that still hang with this change?

abaikin commented 2 years ago

@prj- I tried -Dpartitioner=scotch. In some cases it helps, but if I vary the mesh size or add additional code it gets stuck again.

I know about createMat. I tried it, but the problem did not disappear. In my code I just copied the body of your createMat macro, because eventually I need to create several distributed matrices with the same DDM connectivity.
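
For reference, this is the kind of reuse I mean (untested sketch, simply repeating the constructor call with the connectivity arrays filled once by build(...)):

/* untested sketch: reuse the arrays filled by build(...) for several distributed matrices */
Mat A, B;
constructor(A, DMat.n, intersectionMat, DMat, bs = bsMat, communicator = mpiCommWorld);
constructor(B, DMat.n, intersectionMat, DMat, bs = bsMat, communicator = mpiCommWorld);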

Also, I noticed that on Ubuntu everything works fine; the problem occurs only on Win10. In addition, on Win10 the linear system solve using the old MUMPS interface (via load "MUMPS") is 1.5-2x faster than using MUMPS via the PETSc interface (see the code below). However, on Ubuntu 20.04 the solve times are nearly the same.

PETSc code

load "PETSc"                       

/* PETSc Prerequisites */
macro dimension()2// EOM            // 2D or 3D

/* Defining the macro vectorialfe results in a matrix assembled with blocks (see bs).
Each block must use the same element type (e.g. [P1,P1], not [P2,P1]) */
macro vectorialfe()P1// EOM
include "macro_ddm.idp"             // additional DDM functions

macro def(i)[i, i#B]// EOM          // vector field definition
macro init(i)[i, i]// EOM           // vector field initialization

/* Domain geometry */
real Lx = 2;
real Ly = 1;

int nx = 2*100;
real ny = nx/2;

/* Physical parameters */
real lambda = 2e9;
real mu = 3e9;

int labelB = 1;
int labelR = 2;
int labelT = 3;
int labelL = 4;

/* Exact solution */
func fx = exp(-x)*(lambda+mu)*(exp(x+y)*sin(x)-sin(y));
func fy = exp(-x)*(lambda+mu)*(-exp(x+y)*cos(x)+cos(y));

func uExact = sin(x)*exp(y);
func vExact = cos(y)*exp(-x);

func sigma11 = exp(y)*(lambda + 2*mu)*cos(x) - exp(-x)*lambda*sin(y);
func sigma12 = mu*(-exp(-x)*cos(y) + exp(y)*sin(x));
func sigma22 = exp(y)*lambda*cos(x) - exp(-x)*(lambda + 2*mu)*sin(y);

/* Mesh definition */
mesh Th = square(nx, ny, [Lx*x, Ly*y]);
if(mpirank == 0) cout << "done!" << endl;
func Pk = [vectorialfe, vectorialfe]; // finite element space
int split = 1;
int bsMat = 2; /* block size */

real[int] DMat; /* The diagonal for POU matrix, defined on (VhLoc x VhLoc) */

/* intersectionMat[0] = [rk_1, rk_2,...] - array of mpiranks that have intersection with mesh Th on current rank */
int[int][int] intersectionMat; /* intersectionMat[i], i > 0 = dofs of VhLoc that intersects with dofs on itersection of the ThLocal and (rk_i)-th process mesh */
build(Th, split, intersectionMat, DMat, Pk, mpiCommWorld, bsMat)
/* Create distributed matrix */
Mat A;

constructor(A, DMat.n, intersectionMat, DMat, bs = bsMat, communicator = mpiCommWorld);

if(mpirank == 0) cout << "done!" << endl;
/* Fespace definition */

fespace VhLoc(Th, Pk);

VhLoc [u,v];

/* Differential operators */
real sqrt2 = sqrt(2.);
macro epsilon(u,v)[dx(u), dy(v), (dy(u) + dx(v)) / sqrt2]// EOM
macro div(u,v)(dx(u) + dy(v))// EOM

// macro ThInt(a) a,mpirank//EOM
macro ThInt(a) a//EOM

/* Forms definition */
varf LameBilinearForm( [u,v], [uu, vv] ) = 
    int2d(Th)(
        lambda*div(u,v)*div(uu,vv)
        + 2*mu*epsilon(u,v)'*epsilon(uu,vv)
    )
    +
    on(labelB, labelR, labelL, v=vExact, u=uExact);

varf LameLinearForm( [u,v], [uu, vv] ) = 
    int2d(Th)(
        fx*uu + fy*vv
    )
    + int1d(Th, labelT)( /* n = (0,1) */
        (sigma12*uu + sigma22*vv)
    ) 
    +
    on(labelB, labelR,  labelL, v=vExact, u=uExact)
    ;

mpiBarrier(mpiCommWorld); 

/* Matrix and rhs creation */
real t0 = mpiWtime();
matrix M = LameBilinearForm(VhLoc, VhLoc);
real[int] rhs = LameLinearForm(0, VhLoc);
real t1 = mpiWtime();
A = M;
real t2 = mpiWtime();
set(A, sparams = "-ksp_type gmres -ksp_rtol 1e-7 -ksp_atol 1e-50 -ksp_max_it 3 -ksp_initial_guess_nonzero -ksp_view_final_residual  -ksp_converged_reason -pc_type lu -pc_factor_mat_sol mumps");
real t3 = mpiWtime();
/* Solving SLAE */
u[] = A^-1*rhs;
real t4 = mpiWtime();
macro def1(u)u// EOM
plotMPI(Th, u, vectorialfe, def1, real, cmm = "Global residual")

/* Compare with exact solution */

real uRelError = int2d(Th)(abs(u-uExact))/int2d(Th)(abs(uExact));
real vRelError = int2d(Th)(abs(v-vExact))/int2d(Th)(abs(vExact));

mpiBarrier(mpiCommWorld); 
cout << "uRelError[" << mpirank << "] = " << uRelError << endl
     << "vRelError = "  << mpirank << "] = " << vRelError << endl;
mpiBarrier(mpiCommWorld); 

cout 
    << "---------------------------------" << endl
    << "Matrix and rhs \t " << (t1-t0) << endl
    << "Convert to PetSC Matrix \t " << (t2-t1) << endl
    << "Set A \t " << (t3-t2) << endl
    << "Solve SLAE \t " << (t4-t3) << endl
    << "---------------------------------" << endl
    << "All \t" << (t4-t0) << endl
    ;

MUMPS Code

load "MUMPS"
load "metis"
include "mesh_utilities.edp"

/* Domain geometry */
real Lx = 2;
real Ly = 1;

int nx = 2*100;
real ny = nx/2;

/* Physical parameters */
real lambda = 2e9;
real mu = 3e9;

int labelB = 1;
int labelR = 2;
int labelT = 3;
int labelL = 4;

/* Exact solution */
func fx = exp(-x)*(lambda+mu)*(exp(x+y)*sin(x)-sin(y));
func fy = exp(-x)*(lambda+mu)*(-exp(x+y)*cos(x)+cos(y));

func uExact = sin(x)*exp(y);
func vExact = cos(y)*exp(-x);

func sigma11 = exp(y)*(lambda + 2*mu)*cos(x) - exp(-x)*lambda*sin(y);
func sigma12 = mu*(-exp(-x)*cos(y) + exp(y)*sin(x));
func sigma22 = exp(y)*lambda*cos(x) - exp(-x)*(lambda + 2*mu)*sin(y);

/* Mesh definition */
mesh Th = square(nx, ny, [Lx*x, Ly*y]);

partMeshIntoRegions(Th);

plot(Th);

/* Fespace definition */
fespace Vh(Th, [P1, P1]);

Vh [u,v];

/* Differential operators */
real sqrt2 = sqrt(2.);
macro epsilon(u,v) [dx(u), dy(v), (dx(v) + dy(u)) / sqrt2] // EOM 
macro div(u,v)( dx(u) + dy(v) ) // EOM

/* Forms definition */
varf LameBilinearForm( [u,v], [uu, vv] ) = 
    int2d(Th,mpirank)(
        lambda*div(u,v)*div(uu,vv)
        + 2*mu*epsilon(u,v)'*epsilon(uu,vv)
    )
    +
    on(labelB, labelR, labelL, v=vExact, u=uExact);

varf LameLinearForm( [u,v], [uu, vv] ) = 
    int2d(Th,mpirank)(
        fx*uu + fy*vv
    )
    + int1d(Th, labelT)( /* n = (0,1) */
        (sigma12*uu + sigma22*vv)*(region==mpirank)
    ) 
    +
    on(labelB, labelR,  labelL, v=vExact, u=uExact)
    ;

/* Matrix and rhs creation */
real t0 = mpiWtime();
matrix M = LameBilinearForm(Vh, Vh);
real[int] rhs = LameLinearForm(0, Vh);

real t1 = mpiWtime();
set(M, solver=sparsesolver, master=-1);
real t2 = mpiWtime();
/* Solving SLAE */
u[] = M^-1*rhs;
real t3 = mpiWtime();

plot(Th, v, fill=1, nbiso=40, value=1);

/* Compare with exact solution */
real uRelError = int2d(Th)(abs(u-uExact))/int2d(Th)(abs(uExact));
cout << "uRelError = " << uRelError << endl;

real vRelError = int2d(Th)(abs(v-vExact))/int2d(Th)(abs(vExact));
cout << "vRelError = " << vRelError << endl;

cout 
    << "---------------------------------" << endl
    << "Matrix and rhs \t " << (t1-t0) << endl
    << "Set M \t " << (t2-t1) << endl
    << "Solve SLAE \t " << (t3-t2) << endl
    << "---------------------------------" << endl
    << "All \t" << (t3-t0) << endl
    ;
prj- commented 2 years ago

If the problem only appears on your Windows machine, there is not much I can do without direct access to it, as this problem seems specific to your installation. This option does not exist: -pc_factor_mat_sol mumps. If you care about performance, for linear elasticity there are much faster alternatives than an exact LU factorization (at the very least, you should use a Cholesky factorization if you want to stick with a direct solver).
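
As a rough sketch, assuming your matrix is assembled symmetric, the correctly spelled option is -pc_factor_mat_solver_type, and a Cholesky factorization with MUMPS would look like:

/* sketch: direct solve with a Cholesky factorization through MUMPS (symmetric matrix assumed) */
set(A, sparams = "-ksp_type preonly -pc_type cholesky -pc_factor_mat_solver_type mumps");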