conda-forge / fenics-feedstock

A conda-smithy repository for fenics.
BSD 3-Clause "New" or "Revised" License

Issue with PETSc 3.20 in fenics.PETScDMCollection.create_transfer_matrix #192

Open sblauth opened 1 year ago

sblauth commented 1 year ago

Solution to issue cannot be found in the documentation.

Issue

Hi, I am having an issue using fenics.PETScDMCollection, or rather with the result of its method create_transfer_matrix. I have an MWE here:

from fenics import *

mesh = UnitSquareMesh(8,8)
V = FunctionSpace(mesh, 'CG', 1)
W = FunctionSpace(mesh, 'CG', 2)

transfer_matrix = as_backend_type(PETScDMCollection.create_transfer_matrix(V, W)).mat()
_, temp = transfer_matrix.getVecs()

Using this, I get a segmentation fault or PETSc error 98 (general MPI error, even though I am not using MPI to run the code).

Is this expected? What workaround is possible? I am using this functionality inside my package to create a reusable interpolation matrix.

Many thanks in advance.

Installed packages

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
asttokens                 2.4.0              pyhd8ed1ab_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                pyhd8ed1ab_3    conda-forge
backports.functools_lru_cache 1.6.5              pyhd8ed1ab_0    conda-forge
binutils_impl_linux-64    2.40                 hf600244_0    conda-forge
binutils_linux-64         2.40                 hbdbef99_2    conda-forge
blosc                     1.21.5               h0f2a231_0    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.20.1               hd590300_0    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cashocs                   2.1.0.dev0               pypi_0    pypi
certifi                   2023.7.22          pyhd8ed1ab_0    conda-forge
cftime                    1.6.2           py311h1f0f07a_2    conda-forge
cmake                     3.27.6               hcfe8598_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
eigen                     3.4.0                h00ab1b0_0    conda-forge
exceptiongroup            1.1.3              pyhd8ed1ab_0    conda-forge
executing                 1.2.0              pyhd8ed1ab_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
fenics                    2019.1.0        py311h26b4920_42    conda-forge
fenics-dijitso            2019.1.0        py311h38be061_38    conda-forge
fenics-dolfin             2019.1.0        py311hedc8a0f_42    conda-forge
fenics-ffc                2019.1.0        py311h38be061_38    conda-forge
fenics-fiat               2019.1.0        py311h38be061_38    conda-forge
fenics-libdolfin          2019.1.0            h5cac2fd_42    conda-forge
fenics-ufl                2019.1.0        py311h38be061_38    conda-forge
fftw                      3.3.10          mpi_mpich_h5537406_8    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freeimage                 3.18.0              h138f111_17    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
gcc_impl_linux-64         12.3.0               he2b93b0_2    conda-forge
gcc_linux-64              12.3.0               h76fc315_2    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
gmpy2                     2.1.2           py311h6a5fa03_1    conda-forge
gxx_impl_linux-64         12.3.0               he2b93b0_2    conda-forge
gxx_linux-64              12.3.0               h8a814eb_2    conda-forge
h5py                      3.10.0          nompi_py311h3839ddf_100    conda-forge
hdf4                      4.2.15               h501b40f_6    conda-forge
hdf5                      1.14.2          mpi_mpich_ha2c2bf8_0    conda-forge
hypre                     2.28.0          mpi_mpich_h716cb5e_0    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
imath                     3.1.9                hfc55251_0    conda-forge
importlib-metadata        6.8.0              pyha770c72_0    conda-forge
importlib_metadata        6.8.0                hd8ed1ab_0    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
ipython                   8.16.1             pyh0d859eb_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jxrlib                    1.1                  h7f98852_2    conda-forge
kernel-headers_linux-64   2.6.32              he073ed8_16    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
lcms2                     2.15                 h7f713cb_2    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libaec                    1.1.2                h59595ed_1    conda-forge
libblas                   3.9.0           18_linux64_openblas    conda-forge
libboost                  1.82.0               h6fcfa73_6    conda-forge
libboost-headers          1.82.0               ha770c72_6    conda-forge
libcblas                  3.9.0           18_linux64_openblas    conda-forge
libcurl                   8.3.0                hca28451_0    conda-forge
libdeflate                1.19                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-devel_linux-64     12.3.0               h8bca6fd_2    conda-forge
libgcc-ng                 13.2.0               h807b86a_2    conda-forge
libgfortran-ng            13.2.0               h69a702a_2    conda-forge
libgfortran5              13.2.0               ha4646dd_2    conda-forge
libgomp                   13.2.0               h807b86a_2    conda-forge
libhwloc                  2.9.3           default_h554bfaf_1009    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.5.1              hd590300_1    conda-forge
liblapack                 3.9.0           18_linux64_openblas    conda-forge
libnetcdf                 4.9.2           nompi_h80fb2b6_112    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.0                hd590300_1    conda-forge
libopenblas               0.3.24          pthreads_h413a1c8_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libraw                    0.21.1               h501b40f_1    conda-forge
libsanitizer              12.3.0               h0f45ef3_2    conda-forge
libsqlite                 3.43.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-devel_linux-64  12.3.0               h8bca6fd_2    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_2    conda-forge
libtiff                   4.6.0                h29866fb_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.46.0               hd590300_0    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxml2                   2.11.5               h232c23b_1    conda-forge
libzip                    1.10.1               h2629f0a_3    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
mdurl                     0.1.2                    pypi_0    pypi
meshio                    5.3.4              pyhd8ed1ab_0    conda-forge
metis                     5.1.0             h59595ed_1007    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.0                hb012696_0    conda-forge
mpi                       1.0                       mpich    conda-forge
mpi4py                    3.1.4           py311he01e52e_1    conda-forge
mpich                     4.1.2              h846660c_100    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
mumps-include             5.2.1               ha770c72_11    conda-forge
mumps-mpi                 5.2.1               h7ee95aa_11    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
netcdf4                   1.6.4           nompi_py311he8ad708_103    conda-forge
numpy                     1.26.0          py311h64a7726_0    conda-forge
occt                      7.6.3           novtk_h130ccc2_102    conda-forge
openexr                   3.2.1                h3f0fd8d_0    conda-forge
openjpeg                  2.5.0                h488ebb8_3    conda-forge
openssl                   3.1.3                hd590300_0    conda-forge
packaging                 23.2               pyhd8ed1ab_0    conda-forge
parmetis                  4.0.3             h2a9763c_1005    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
petsc                     3.20.0          real_h622a54c_101    conda-forge
petsc4py                  3.20.0          real_h928380f_100    conda-forge
pexpect                   4.8.0              pyh1a96a4e_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pip                       23.2.1             pyhd8ed1ab_0    conda-forge
pkg-config                0.29.2            h36c2ea0_1008    conda-forge
pkgconfig                 1.5.5              pyhd8ed1ab_4    conda-forge
pluggy                    1.3.0              pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.39             pyha770c72_0    conda-forge
prompt_toolkit            3.0.39               hd8ed1ab_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptscotch                  6.0.9                hb499603_2    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pybind11                  2.11.1          py311h9547e67_2    conda-forge
pybind11-global           2.11.1          py311h9547e67_2    conda-forge
pygments                  2.16.1             pyhd8ed1ab_0    conda-forge
pytest                    7.4.2              pyhd8ed1ab_0    conda-forge
python                    3.11.6          hab00c5b_0_cpython    conda-forge
python_abi                3.11                    4_cp311    conda-forge
rapidjson                 1.1.0             he1b5a44_1002    conda-forge
readline                  8.2                  h8228510_1    conda-forge
rhash                     1.4.4                hd590300_0    conda-forge
rich                      13.6.0             pyhd8ed1ab_0    conda-forge
scalapack                 2.2.0                hd931219_1    conda-forge
scipy                     1.11.3          py311h64a7726_1    conda-forge
scotch                    6.0.9                hb2e6521_2    conda-forge
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
slepc                     3.20.0          real_h905369a_100    conda-forge
slepc4py                  3.20.0          real_hd877bb9_100    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
suitesparse               5.10.1               h9e50725_1    conda-forge
superlu                   5.2.2                h00795ac_0    conda-forge
superlu_dist              7.2.0                h25dcc4a_0    conda-forge
sympy                     1.12            pypyh9d50eac_103    conda-forge
sysroot_linux-64          2.12                he073ed8_16    conda-forge
tbb                       2021.10.0            h00ab1b0_1    conda-forge
tk                        8.6.13               h2797004_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
traitlets                 5.11.2             pyhd8ed1ab_0    conda-forge
typing_extensions         4.8.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
wcwidth                   0.2.8              pyhd8ed1ab_0    conda-forge
wheel                     0.41.2             pyhd8ed1ab_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.6                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxt                1.3.0                hd590300_1    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

Environment info

active environment : petsc320
    active env location : /p/tv/local/miniconda3_blauths/envs/petsc320
            shell level : 2
       user config file : /u/b/blauths/.condarc
 populated config files : /u/b/blauths/.condarc
          conda version : 23.3.1
    conda-build version : not installed
         python version : 3.10.8.final.0
       virtual packages : __archspec=1=x86_64
                          __cuda=11.4=0
                          __glibc=2.17=0
                          __linux=3.10.0=0
                          __unix=0=0
       base environment : /p/tv/local/miniconda3_blauths  (writable)
      conda av data dir : /p/tv/local/miniconda3_blauths/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /p/tv/local/miniconda3_blauths/pkgs
                          /u/b/blauths/.conda/pkgs
       envs directories : /p/tv/local/miniconda3_blauths/envs
                          /u/b/blauths/.conda/envs
               platform : linux-64
             user-agent : conda/23.3.1 requests/2.31.0 CPython/3.10.8 Linux/3.10.0-1160.95.1.el7.x86_64 rhel/7.9 glibc/2.17
                UID:GID : 132041:513
             netrc file : None
           offline mode : False

minrk commented 1 year ago

lldb gives this stack:

  * frame #0: 0x000000010c8f2b79 libpetsc.3.20.0.dylib`PetscOptionsFindPair + 153
    frame #1: 0x000000010c8f7595 libpetsc.3.20.0.dylib`PetscOptionsDeprecated_Private + 117
    frame #2: 0x000000010c8c3ee0 libpetsc.3.20.0.dylib`PetscInfoProcessClass + 80
    frame #3: 0x000000010cec8e5c libpetsc.3.20.0.dylib`MatMFFDInitializePackage + 156
    frame #4: 0x000000010ced3a2d libpetsc.3.20.0.dylib`MatInitializePackage + 61
    frame #5: 0x000000010cf1b5fd libpetsc.3.20.0.dylib`MatCreate + 45
    frame #6: 0x000000010a830ba5 libdolfin.2019.1.0.dylib`dolfin::PETScDMCollection::create_transfer_matrix(dolfin::FunctionSpace const&, dolfin::FunctionSpace const&) + 13621

tracing it to MatCreate called here, which ultimately calls MatMFFDInitializePackage -> PetscInfoProcessClass

and finally the segfault occurs here:

PetscCall(PetscOptionsDeprecated_Private(NULL, "-info_exclude", NULL, "3.13", "Use ~ with -info to indicate classes to exclude"));

which calls:

PetscCall(PetscOptionsFindPair(options, prefix, oldname, &value, &found));

with prefix=NULL, options=NULL, oldname="-info_exclude"

So the segfault is somewhere in PetscOptionsFindPair. I can reproduce this with the simple C program:

#include "petsc.h"

int main() {
  PetscBool         found;
  const char*       value;
  char             *prefix  = NULL;
  PetscOptions      options = NULL;
  const char oldname[] = "-info_exclude";

  PetscCall(PetscOptionsFindPair(options, prefix, oldname, &value, &found));
  return 0;
}

minrk commented 1 year ago

I think I tracked it down to a missing PetscOptionsCreateDefault();, so the default options are undefined. Not sure whose responsibility that is.

minrk commented 1 year ago

Ultimately, I think there's a missing call to PetscInitialize, which avoids this segfault. I don't know what changed in petsc 3.20 that caused this, but adding:

SubSystemsManager.init_petsc()

appears to be a workaround to ensure petsc is initialized. I don't understand enough to say precisely when that should be called in fenics itself, but it seems to be missing somehow.

minrk commented 1 year ago

Ultimately, the call that's failing is this one, failing with "invalid communicator". I don't really know what's going on there, but presumably it's something in initialization and/or passing around of the mpi comms.

PetscDolfinErrorHandler: line '208', function 'PetscCommDuplicate', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/sys/objects/tagm.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------
MPI error 70865157 Invalid communicator, error stack:
                MPII_Comm_get_attr(85): MPI_Comm_get_attr(comm=0x84000003, keyval=0xa4400000, attribute_val=0x305e76450, flag=0x305e76444) failed
                MPII_Comm_get_attr(58): Invalid communicator
------------------------------------------------------------------------------
PetscDolfinErrorHandler: line '51', function 'PetscHeaderCreate_Private', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/sys/objects/inherit.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------

------------------------------------------------------------------------------
PetscDolfinErrorHandler: line '26', function 'PetscHeaderCreate_Function', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/sys/objects/inherit.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------

------------------------------------------------------------------------------
PetscDolfinErrorHandler: line '101', function 'VecCreateWithLayout_Private', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/vec/vec/interface/veccreate.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------

------------------------------------------------------------------------------
PetscDolfinErrorHandler: line '9468', function 'MatCreateVecs', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/mat/interface/matrix.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------

Debug with:

from mpi4py import MPI  # required: MPI.COMM_WORLD is used below
# import petsc4py
#
# petsc4py.init()  #["-log_view", "-log_trace", "trace.txt"])#, "-on_error_attach_debugger"])
# from petsc4py import PETSc

import dolfin
from dolfin import UnitSquareMesh, FunctionSpace, as_backend_type, PETScDMCollection, SubSystemsManager
dolfin.cpp.log.set_log_level(dolfin.cpp.log.LogLevel.DEBUG)
SubSystemsManager.init_petsc()

mesh = UnitSquareMesh(comm=MPI.COMM_WORLD, nx=8, ny=8)
V = FunctionSpace(mesh, 'CG', 1)
W = FunctionSpace(mesh, 'CG', 2)

transfer_matrix = as_backend_type(PETScDMCollection.create_transfer_matrix(V, W)).mat()
print(transfer_matrix)
_, temp = transfer_matrix.createVecs()
print(temp)

sblauth commented 1 year ago

I've tried your code and can only reproduce the exact same behavior - any idea how to fix this? Or is this rather the area of expertise of the PETSc developers?

minrk commented 1 year ago

I'm not sure who is going to know why the MPI communicator is invalid. Clearly something has changed in the initialization somewhere in 3.20, but I can't figure out what, since it's passed around so much.

My hunch is that the communicator handle that fenics grabs by default is not quite right, and maybe the petsc communicator pointer or struct changed, or something like that. Or that an object is reinitialized, but used after being replaced.

The workaround for now is to pin petsc=3.19 in your env to avoid getting the problematic builds.
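A minimal sketch of how one might check whether an environment falls under this pin, given the PETSc version string shown by `conda list`. The helper names `parse_version` and `needs_pin` and the `first_bad` threshold are hypothetical; the threshold of 3.20 is an assumption taken from this thread, not a confirmed boundary.

```python
import re


def parse_version(text):
    """Turn a version string such as '3.20.0' into a tuple of ints, (3, 20, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def needs_pin(petsc_version, first_bad=(3, 20)):
    """Return True if the (major, minor) version is at or above first_bad.

    first_bad=(3, 20) is an assumption based on the builds discussed here.
    """
    return parse_version(petsc_version)[:2] >= first_bad


if needs_pin("3.20.0"):
    print("consider pinning petsc=3.19 in this environment")
```

With the versions mentioned above, `needs_pin("3.20.0")` is true and `needs_pin("3.19.6")` is false.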

minrk commented 1 year ago

So I've been trying to further explore the error, and I'm still rather confused. I have discovered that all of fenics' own transfer matrix tests pass with the new builds, so I'm wondering if there's perhaps some error in how createVecs is being called, or under what circumstances it's safe?

In any case, the stack is:

and the problem appears to be in the value of mat->cmap->comm. I'm not quite sure how to debug what the value of that is or what it should be.

sblauth commented 1 year ago

I have a small update for this issue. This appears to be also happening for other versions of PETSc. I just encountered the same issue on my workstation where I use PETSc 3.17.4. There, the same error (resulting in a segfault) appears for some mesh I am working with. Unfortunately I cannot share the mesh due to confidentiality reasons.

Strangely, the error occurs irregularly. For example, the segfault happens when I use a VectorFunctionSpace, but not with a scalar FunctionSpace, and it also works for some "DG" elements - so it seems to be very inconsistent.

I could not fix the issue by either calling

import petsc4py
petsc4py.init()

from petsc4py import PETSc

or calling

fenics.SubSystemsManager.init_petsc()

Do you have any idea why this fails? Is this to be (somewhat) expected? Is there any workaround for this issue? (I have seen your posts on the fenics discord and the PETSc gitlab, so also many thanks for investigating this issue further).

keiyamamo commented 11 months ago

Dear @sblauth,

Not sure if this is useful, but my workaround for this problem is to specify the version of mpi4py and hdf5 as follows. It is not an ideal solution, but it seems to avoid the problem with create_transfer_matrix in our software.

  - hdf5=1.12.2
  - mpi4py=3.1.4

Best, Kei

sblauth commented 4 months ago

Regarding the error: I have noticed that I don't need to call createVecs at all. And apparently, the matrix itself contains the correct data, so that I can do with it what I want (namely multiply it with an existing vec). So this is, of course, still problematic, but as long as one does not need the createVecs, it works.
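A rough sketch of what this could look like, assuming a serial run with legacy FEniCS (dolfin 2019.1.0) and petsc4py. The target vector is taken from a `Function` on the fine space instead of from `createVecs`. The helper names `apply_transfer` and `interpolate_cg1_to_cg2` are hypothetical, and whether this avoids the error in every configuration has not been verified.

```python
def apply_transfer(transfer_matrix, source_vec, target_vec):
    """Compute target_vec = transfer_matrix * source_vec in place.

    Relies only on the petsc4py calling convention Mat.mult(x, y);
    no vectors are created from the matrix itself.
    """
    transfer_matrix.mult(source_vec, target_vec)
    return target_vec


def interpolate_cg1_to_cg2():
    """Illustrative use; needs dolfin and petsc4py, so it is not called here."""
    from dolfin import (Function, FunctionSpace, PETScDMCollection,
                        UnitSquareMesh, as_backend_type)

    mesh = UnitSquareMesh(8, 8)
    V = FunctionSpace(mesh, "CG", 1)
    W = FunctionSpace(mesh, "CG", 2)
    matrix = as_backend_type(
        PETScDMCollection.create_transfer_matrix(V, W)).mat()

    u = Function(V)
    u.vector()[:] = 1.0  # constant function on the coarse space
    w = Function(W)      # its vector already has the layout of W

    apply_transfer(matrix,
                   as_backend_type(u.vector()).vec(),
                   as_backend_type(w.vector()).vec())
    return w
```

Taking the vectors from existing `Function` objects keeps their layout tied to the function spaces, which is the only thing the multiplication needs.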