firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM)
https://firedrakeproject.org

Bug: inject_kernel setup phase crashes in 3D hMG preconditioner #3418

Closed: Alexey-Voronin closed this issue 6 months ago

Alexey-Voronin commented 7 months ago

I'm experiencing crashes during the setup phase of a monolithic h-multigrid solver for the 3D lid-driven cavity problem using Taylor-Hood elements. I am solving with polynomial degrees $k \in [2, 10]$ for $P_k/P_{k-1}$ elements, and the crash occurs at $k=8$. Below are the steps to reproduce the issue for $k=8$ and the error output.
Changing the mesh size ($N$) does not change the outcome.

Steps to reproduce: save the script below as `code.py` and run `python code.py`.

```python
from firedrake import *
from firedrake.petsc import PETSc

distribution_parameters = {}

shared_params = {
    'ksp_monitor': None,
    'ksp_atol': 1e-9,
    'ksp_rtol': 1e-9,
    'ksp_error_if_not_converged': False,
    'ksp_max_it': 10,
    'ksp_type': 'fgmres',

    'snes_convergence_test': 'skip',
    'ksp_convergence_test': 'skip',
    'snes_max_it': 1,
    'snes_type': 'ksponly',
}

# Multigrid smoother: Chebyshev iteration preconditioned by a Vanka-star patch PC.
# ASMVankaStarPC comes from the reporter's own `phmg` package, which is not part
# of Firedrake (without it the script fails with ModuleNotFoundError).
rlx_params = {
    'ksp_chebyshev_esteig': '0,0.125,0,1.1',
    'ksp_convergence_test': 'skip',
    'ksp_max_it': 4,
    'ksp_type': 'chebyshev',
    'pc_python_type': 'phmg.preconditioners.rlx_params.ASMVankaStarPC',
    'pc_type': 'python',
    'pc_vankastar_construct_dim': 0,
    'pc_vankastar_exclude_subspaces': '1',
    'pc_vankastar_sub_sub_pc_factor_shift_type': 'nonzero',
}

# Coarse grid: assemble the operator and factorize it with MUMPS
mg_coarse_params = {
    'assembled': {
        'pc_type': 'lu',
        'pc_factor_mat_solver_type': 'mumps',
    },
    'ksp_type': 'preonly',
    'mat_type': 'aij',
    'pc_python_type': 'firedrake.AssembledPC',
    'pc_type': 'python',
}

hmg_params = {
    **shared_params,
    'pc_type': 'mg',
    'mg_coarse': mg_coarse_params,
    'mg_levels': rlx_params,
}

N = 3
hexahedral = False
for degree in range(9, 10):  # velocity degree 9 only
    base = BoxMesh(N, N, N, 2, 2, 2, hexahedral=hexahedral,
                   distribution_parameters=distribution_parameters)
    M = MeshHierarchy(base, 1)[-1]  # finest mesh of a two-level hierarchy

    # Taylor-Hood pair P_k/P_{k-1}
    variant = "spectral"
    Eu = FiniteElement("CG", M.ufl_cell(), degree, variant=variant)
    Ep = FiniteElement("CG", M.ufl_cell(), degree-1, variant=variant)

    V = VectorFunctionSpace(M, Eu)
    W = FunctionSpace(M, Ep)
    Z = V * W

    u, p = TrialFunctions(Z)
    v, q = TestFunctions(Z)

    # Stokes operator with zero body force
    a = (inner(grad(u), grad(v)) - inner(p, div(v)) + inner(div(u), q))*dx
    L = inner(Constant((0, 0, 0)), v) * dx

    def driver(domain):
        # Lid velocity in the x direction
        (x, y, z) = SpatialCoordinate(domain)
        return as_vector([x*x*(2-x)*(2-x)*z*z*(2-z)*(2-z)*(0.25*y*y), 0, 0])

    # Moving lid on boundary 4, no slip on the other five faces
    bcs = [DirichletBC(Z.sub(0), driver(Z.ufl_domain()), 4),
           DirichletBC(Z.sub(0), Constant((0., 0., 0.)), [1, 2, 3, 5, 6])]

    up = Function(Z)
    # The pressure is only determined up to a constant
    nullspace = MixedVectorSpaceBasis(
        Z, [Z.sub(0), VectorSpaceBasis(constant=True)])

    solve(a == L, up, bcs=bcs, nullspace=nullspace,
          solver_parameters=hmg_params)
```
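The lid condition is imposed on boundary id 4, which for Firedrake's `BoxMesh` is the face $y = L_y$ (here $y = 2$); the other five faces get no-slip conditions. As a quick plain-Python check (the helper `lid_velocity_x` is hypothetical and simply re-evaluates the `driver` expression with floats instead of UFL), the lid profile peaks at 1 in the middle of the lid and vanishes on the lid's edges, so it is continuous with the no-slip walls:

```python
def lid_velocity_x(x, y, z):
    """x-component of the `driver` expression, evaluated with plain floats."""
    return x * x * (2 - x) * (2 - x) * z * z * (2 - z) * (2 - z) * (0.25 * y * y)


# On the lid y == 2 the factor 0.25*y*y equals 1, so the profile reduces to
# x^2 (2-x)^2 z^2 (2-z)^2, which is zero whenever x or z is 0 or 2.
print(lid_velocity_x(1.0, 2.0, 1.0))  # centre of the lid -> 1.0
print(lid_velocity_x(0.0, 2.0, 1.0))  # edge shared with the x == 0 wall -> 0.0
```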

Expected behavior: convergence in about 10-11 iterations.
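The nested dictionaries in the script are turned into flat, underscore-joined PETSc options before PETSc sees them, which is where option names such as `-firedrake_0_mg_coarse_assembled_pc_factor_mat_solver_type` in the log below come from. The sketch re-implements that flattening for illustration only: `flatten_parameters` here is our own helper, not Firedrake's function, and the `firedrake_0_` prefix is copied from the log (Firedrake generates it automatically for the solver).

```python
def flatten_parameters(params, prefix=""):
    """Flatten nested solver-parameter dicts into PETSc-style option names.

    Hypothetical re-implementation for illustration, not Firedrake's code.
    """
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # A nested dict contributes its key as one segment of the prefix
            flat.update(flatten_parameters(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


mg_coarse_params = {
    "assembled": {
        "pc_type": "lu",
        "pc_factor_mat_solver_type": "mumps",
    },
    "ksp_type": "preonly",
    "mat_type": "aij",
    "pc_python_type": "firedrake.AssembledPC",
    "pc_type": "python",
}

for name, value in flatten_parameters({"mg_coarse": mg_coarse_params}).items():
    print(f"-firedrake_0_{name} {value}")
```

The printed names include the `mg_coarse_*` entries that appear in the "Option left" lines of the log.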

Error message

[0]PETSC ERROR:  
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-firedrake_0_ksp_atol value: 1e-09 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_ksp_convergence_test value: skip source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_ksp_error_if_not_converged value: false source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_ksp_max_it value: 10 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_ksp_monitor (no value) source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_ksp_rtol value: 1e-09 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_ksp_type value: fgmres source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mat_mumps_icntl_14 value: 200 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mat_type value: aij source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_coarse_assembled_pc_factor_mat_solver_type value: mumps source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_coarse_ksp_type value: preonly source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_coarse_mat_type value: aij source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_coarse_pc_python_type value: firedrake.AssembledPC source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_coarse_pc_type value: python source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_pc_composite_pcs value: python,python,python source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_pc_composite_type value: additive source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_pc_type value: composite source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_pc_patch_construct_dim value: 0 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_pc_patch_construct_type value: vanka source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_pc_patch_exclude_subspaces value: 1 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_pc_patch_local_type value: additive source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_pc_patch_partition_of_unity value: false source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_pc_patch_save_operators value: true source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_pc_patch_sub_mat_type value: seqdense source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_sub_ksp_type value: preonly source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_sub_pc_factor_shift_type value: nonzero source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_patch_sub_pc_type value: lu source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_0_pc_python_type value: firedrake.PatchPC source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_pc_patch_construct_dim value: 1 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_pc_patch_construct_type value: vanka source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_pc_patch_exclude_subspaces value: 1 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_pc_patch_local_type value: additive source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_pc_patch_partition_of_unity value: false source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_pc_patch_save_operators value: true source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_pc_patch_sub_mat_type value: seqdense source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_sub_ksp_type value: preonly source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_sub_pc_factor_shift_type value: nonzero source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_patch_sub_pc_type value: lu source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_1_pc_python_type value: firedrake.PatchPC source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_pc_patch_construct_dim value: 2 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_pc_patch_construct_type value: vanka source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_pc_patch_exclude_subspaces value: 1 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_pc_patch_local_type value: additive source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_pc_patch_partition_of_unity value: false source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_pc_patch_save_operators value: true source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_pc_patch_sub_mat_type value: seqdense source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_sub_ksp_type value: preonly source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_sub_pc_factor_shift_type value: nonzero source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_patch_sub_pc_type value: lu source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_mg_levels_sub_2_pc_python_type value: firedrake.PatchPC source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_pc_type value: mg source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_snes_convergence_test value: skip source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_snes_max_it value: 1 source: code
[0]PETSC ERROR:   Option left: name:-firedrake_0_snes_type value: ksponly source: code
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.4.2-43642-g09f36907a6  GIT Date: 2023-12-08 19:37:01 +0000
[0]PETSC ERROR: stokes_3d.py on a default named alexeys-mbp.lan by alexey Mon Feb 19 15:48:44 2024
[0]PETSC ERROR: Configure options PETSC_DIR=/Users/alexey/firedrake/src/petsc PETSC_ARCH=default --with-debugging=0 --download-netcdf --download-mpich-configure-arguments=--disable-opencl --download-mpich --download-suitesparse --CFLAGS=-Wno-implicit-function-declaration --download-chaco --with-fortran-bindings=0 --download-hdf5 --download-mumps --download-openblas --LDFLAGS=-Wl,-ld_classic --download-scalapack --with-zlib --with-shared-libraries=1 --download-bison --download-superlu_dist --download-ptscotch --download-pnetcdf --download-hypre --with-x=0 --download-pastix --download-hwloc-configure-arguments=--disable-opencl --download-openblas-make-options="'USE_THREAD=0 USE_LOCKING=1 USE_OPENMP=0'" --with-c2html=0 --download-hwloc --download-metis
[0]PETSC ERROR: #1 DMCoarsen() at /Users/alexey/firedrake/src/petsc/src/dm/interface/dm.c:3291
[0]PETSC ERROR: #2 PCSetUp_MG() at /Users/alexey/firedrake/src/petsc/src/ksp/pc/impls/mg/mg.c:961
[0]PETSC ERROR: #3 PCSetUp() at /Users/alexey/firedrake/src/petsc/src/ksp/pc/interface/precon.c:1079
[0]PETSC ERROR: #4 KSPSetUp() at /Users/alexey/firedrake/src/petsc/src/ksp/ksp/interface/itfunc.c:415
[0]PETSC ERROR: #5 KSPSolve_Private() at /Users/alexey/firedrake/src/petsc/src/ksp/ksp/interface/itfunc.c:836
[0]PETSC ERROR: #6 KSPSolve() at /Users/alexey/firedrake/src/petsc/src/ksp/ksp/interface/itfunc.c:1083
[0]PETSC ERROR: #7 SNESSolve_KSPONLY() at /Users/alexey/firedrake/src/petsc/src/snes/impls/ksponly/ksponly.c:49
[0]PETSC ERROR: #8 SNESSolve() at /Users/alexey/firedrake/src/petsc/src/snes/interface/snes.c:4659
Traceback (most recent call last):
  File "/Users/alexey/firedrake/src/firedrake/firedrake/mg/kernels.py", line 394, in inject_kernel
    return cache[key]
           ~~~~~^^^^^
KeyError: ('inject', 1, 3, (0, (0,), (1,), (2,), (3,)), (1, (4, 5, 6, 7, 8, 9, 10, 11), (12, 13, 14, 15, 16, 17, 18, 19), (20, 21, 22, 23, 24, 25, 26, 27), (28, 29, 30, 31, 32, 33, 34, 35), (36, 37, 38, 39, 40, 41, 42, 43), (44, 45, 46, 47, 48, 49, 50, 51)), (2, (52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79), (80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107), (108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135), (136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163)), (3, (164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219)), (0, (0,), (1,), (2,), (3,)), (1, (4, 5, 6, 7, 8, 9, 10, 11), (12, 13, 14, 15, 16, 17, 18, 19), (20, 21, 22, 23, 24, 25, 26, 27), (28, 29, 30, 31, 32, 33, 34, 35), (36, 37, 38, 39, 40, 41, 42, 43), (44, 45, 46, 47, 48, 49, 50, 51)), (2, (52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79), (80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107), (108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135), (136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163)), (3, (164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219)), (0, (0,), (1,), (2,), (3,)), (1, (), (), (), (), (), ()), (2, (), (), (), ()), (3, ()), (0, (0,), (1,), (2,), (3,)), (1, (), (), (), (), (), ()), (2, (), (), (), ()), (3, ()))
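The cache key in the `KeyError` appears to encode entity-DOF layouts: for each topological dimension (0 for vertices, 1 for edges, 2 for faces, 3 for the cell) it lists the DOF numbers attached to every entity. For degree-$k$ Lagrange on a tetrahedron there is 1 DOF per vertex, $k-1$ per edge, $(k-1)(k-2)/2$ per face and $(k-1)(k-2)(k-3)/6$ in the interior, $(k+1)(k+2)(k+3)/6$ in total. The hypothetical helper below rebuilds that layout; with $k = 9$ it reproduces the first two groups of the key (220 DOFs, matching the velocity degree set by `range(9, 10)` in the script).

```python
def tet_lagrange_entity_dofs(k):
    """Entity-DOF layout of degree-k Lagrange on a tetrahedron.

    DOFs are numbered vertices first, then edges, faces and the interior,
    in the same (dim, entity_0, entity_1, ...) shape as the KeyError tuple.
    """
    # (topological dimension, number of entities, DOFs per entity)
    layout = [
        (0, 4, 1),
        (1, 6, k - 1),
        (2, 4, (k - 1) * (k - 2) // 2),
        (3, 1, (k - 1) * (k - 2) * (k - 3) // 6),
    ]
    next_dof = 0
    result = []
    for dim, n_entities, per_entity in layout:
        entities = []
        for _ in range(n_entities):
            entities.append(tuple(range(next_dof, next_dof + per_entity)))
            next_dof += per_entity
        result.append((dim, *entities))
    return tuple(result)


dofs = tet_lagrange_entity_dofs(9)
print(dofs[1][1])      # DOFs on the first edge: (4, 5, 6, 7, 8, 9, 10, 11)
print(dofs[3][1][-1])  # last interior DOF: 219
```

With `k=1` the same function reproduces the last two groups of the key, where only the four vertices carry DOFs. The key therefore grows roughly cubically with the degree; whether that is connected to the deeper recursion at high $k$ is not established in this thread.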

Environment:

Status of components:

| Package | Branch | Revision | Modified |
|---|---|---|---|
| FInAT | master | e2805c4 | False |
| PyOP2 | master | fbde61f9 | False |
| fiat | master | e7b2909 | False |
| firedrake | master | c5e939dde | False |
| h5py | firedrake | 4c01efa9 | False |
| libspatialindex | master | 4768bf3 | True |
| libsupermesh | master | dbe226b | False |
| loopy | main | 8158afdb | False |
| petsc | firedrake | 09f36907a6 | False |
| pyadjoint | master | f194553 | False |
| pytest-mpi | main | a478bc8 | False |
| tsfc | master | 799191d | False |
| ufl | master | 054b0617 | False |

connorjward commented 7 months ago

I've just taken a look at this. I think the key error message I found was:

Traceback (most recent call last):
  File "/home/connor/Code/firedrake-dev1/src/tsfc/gem/node.py", line 226, in __call__
    return self.cache[cache_key]
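Both tracebacks in this thread begin at a cache lookup (`cache[key]` in `inject_kernel`, `self.cache[cache_key]` in gem). The sketch below is a hypothetical, simplified memoizing visitor written for illustration; it is not gem's actual code. It shows two things: caching bounds how often each node is visited but not how deep the Python call stack gets, and in the common `try`/`except KeyError` memoization pattern any error raised while filling a cache miss is chained to that `KeyError`, so Python prints the `KeyError` traceback first. That may explain why the report shows a `KeyError` although raising the recursion limit turned out to be the fix, but this is our reading, not something confirmed in the thread.

```python
import sys


class Node:
    """Minimal expression node; a node without children is a leaf."""

    def __init__(self, *children):
        self.children = children


class Memoizer:
    """Hypothetical, simplified memoizing visitor (not gem's actual class)."""

    def __init__(self, function):
        self.cache = {}
        self.function = function

    def __call__(self, node):
        try:
            return self.cache[node]  # cache hit: no further recursion
        except KeyError:
            # Cache miss: recurse into the children. Any error raised from
            # here on is chained to this KeyError, which Python prints first.
            result = self.function(node, self)
            self.cache[node] = result
            return result


def count_leaves(node, self):
    if not node.children:
        return 1
    return sum(self(child) for child in node.children)


def chain(depth):
    """Left-nested expression of the given depth, like a long sum a + b + c + ..."""
    expr = Node()
    for _ in range(depth):
        expr = Node(expr, Node())
    return expr


print(Memoizer(count_leaves)(chain(50)))  # 51

# Assuming CPython's default limit of 1000: each level costs several frames,
# so a 2000-deep expression overflows even though every node is visited once.
try:
    Memoizer(count_leaves)(chain(2000))
except RecursionError as exc:
    print(type(exc.__context__).__name__)
```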

This shows that the crash happens inside TSFC, so I think our form compiler is struggling with something in your form/discretisation. I tried increasing the recursion limit on this line by a factor of 100 (overkill), and that yielded a new error:

ModuleNotFoundError: No module named 'phmg'

That error only means I don't have your `phmg` package installed, so it is unrelated to the crash. Getting that far suggests that increasing the recursion limit might get your code to work.

Can you try increasing the limit with `sys.setrecursionlimit` and let us know whether that stops the crash? It is still an issue that our code generation recurses so deeply, but at least this will confirm where the problem is.
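Following this suggestion, which the reporter confirms below fixed the crash, one way to raise the limit only around the solve is a small context manager. This is a sketch: `recursion_limit` and `depth` are hypothetical helpers, and the factor of 10 is an arbitrary assumption, not the value used in the thread.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise Python's recursion limit to at least `limit`."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old, limit))  # never lower the current limit
    try:
        yield
    finally:
        # Restore the previous limit even if the body raised
        sys.setrecursionlimit(old)


def depth(n):
    """Recurse n levels deep; a stand-in for deeply recursive code generation."""
    return 0 if n == 0 else 1 + depth(n - 1)


base = sys.getrecursionlimit()  # 1000 by default in CPython
with recursion_limit(10 * base):
    # Twice the original limit: this would raise RecursionError outside the block
    print(depth(2 * base))
```

In the reproducer one would wrap the call as `with recursion_limit(10_000): solve(a == L, up, ...)`. Calling `sys.setrecursionlimit` once at the top of the script also works, but leaves the higher limit in place for the whole run. Very high limits can overflow the C stack and crash the interpreter, so it is worth raising the limit only as far as needed.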

Alexey-Voronin commented 6 months ago

Upping the recursion limit fixed the problem. Thanks!

Alexey-Voronin commented 6 months ago

resolved