firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM).
https://firedrakeproject.org

INSTALL: PETSc SF doesn't handle large count MPI datatypes #3718

Open · JDBetteridge opened this issue 1 month ago

JDBetteridge commented 1 month ago

Describe the error
Currently PETSc SF doesn't handle custom MPI datatypes created with the MPI-4 large-count routines, which mpi4py==4.0 uses by default. The following error occurs:

petsc4py.PETSc.Error: error code 98
[0] PetscSFReduceBegin() at ...petsc/src/vec/is/sf/interface/sf.c:1578
[0] PetscSFReduceBegin_Basic() at ...petsc/src/vec/is/sf/impls/basic/sfbasic.c:400
[0] PetscSFLeafToRootBegin_Basic() at ...petsc/src/vec/is/sf/impls/basic/sfbasic.c:387
[0] PetscSFLinkCreate() at ...petsc/src/vec/is/sf/impls/basic/sfpack.c:434
[0] PetscSFLinkCreate_MPI() at ...petsc/src/vec/is/sf/impls/basic/sfmpi.c:121
[0] PetscSFLinkSetUp_Host() at ...petsc/src/vec/is/sf/impls/basic/sfpack.c:518
[0] MPIPetsc_Type_compare_contig() at ...petsc/src/vec/is/sf/interface/sftype.c:148
[0] MPIPetsc_Type_unwrap() at ...petsc/src/vec/is/sf/interface/sftype.c:46
[0] General MPI error
[0] MPI error 671789059 Invalid datatype, error stack:
                internal_Type_get_envelope(37636): MPI_Type_get_envelope(dtype=USER<contig>, num_integers=0x7ffc6dbff870, num_addresses=0x7ffc6dbff874, num_datatypes=0x7ffc6dbff878, combiner=0x7ffc6dbff87c) failed
                MPIR_Type_get_envelope_impl(149).: use MPI_Type_get_envelope_c to query large count datatype

Steps to Reproduce
Minimal failing example (MFE):

import numpy as np

from mpi4py import MPI
from petsc4py import PETSc

# With mpi4py 4.0 this derived type is built via the MPI-4 large-count routines
typedict = MPI._typedict
basetype = typedict['d']  # MPI.DOUBLE
newtype = basetype.Create_contiguous(8)
newtype.Commit()

source_vec = PETSc.Vec().createWithArray(np.array([1/ii for ii in range(1,101)]))
target_vec = PETSc.Vec().create()
target_vec.setSizes(100)
target_vec.setUp()

sf = PETSc.SF().create()

source_arr = source_vec.getArray()
target_arr = target_vec.getArray()

# PETSc errors while unwrapping newtype when setting up the communication link
sf.reduceBegin(
    newtype,
    source_arr,
    target_arr,
    MPI.REPLACE,
)
sf.reduceEnd(
    newtype,
    source_arr,
    target_arr,
    MPI.REPLACE,
)
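For comparison, here is a minimal sketch (not part of the original report, and untested) in which the unit datatype is the built-in MPI.DOUBLE rather than a user-created contiguous type. Built-in (named) datatypes are not large-count types, so the assumption is that PETSc's envelope query succeeds and the error above is not triggered, which points at the mpi4py-created derived type as the trigger:

import numpy as np

from mpi4py import MPI
from petsc4py import PETSc

source_vec = PETSc.Vec().createWithArray(np.array([1/ii for ii in range(1, 101)]))
target_vec = PETSc.Vec().create()
target_vec.setSizes(100)
target_vec.setUp()

sf = PETSc.SF().create()

source_arr = source_vec.getArray()
target_arr = target_vec.getArray()

# MPI.DOUBLE is a named (built-in) datatype, so the MPI_Type_get_envelope()
# call inside PETSc's MPIPetsc_Type_unwrap() should succeed here
# (assumption: the rest of the SF setup behaves as in the MFE above).
sf.reduceBegin(
    MPI.DOUBLE,
    source_arr,
    target_arr,
    MPI.REPLACE,
)
sf.reduceEnd(
    MPI.DOUBLE,
    source_arr,
    target_arr,
    MPI.REPLACE,
)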

For now we will pin mpi4py==3.1.6 to avoid this error. This issue is mainly here to document the error.
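A minimal sketch of how such a pin could be backed up at runtime (illustration only, not Firedrake code; the placement and message are assumptions): check the installed mpi4py version and fail early with a pointer to this issue.

import mpi4py

# Illustrative guard only: mpi4py >= 4.0 builds derived datatypes with the
# MPI-4 large-count routines, which PETSc SF cannot yet unwrap (see above).
_major = int(mpi4py.__version__.split(".")[0])
if _major >= 4:
    raise ImportError(
        "mpi4py >= 4.0 detected; pin mpi4py==3.1.6 until PETSc SF "
        "supports large-count MPI datatypes."
    )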

JDBetteridge commented 1 month ago

Corresponding PETSc issue: https://gitlab.com/petsc/petsc/-/issues/1625