Closed akhmerov closed 1 year ago
Just to confirm, this is what you are talking about:
../../micromamba/envs/kwant/lib/python3.11/site-packages/kwant/linalg/tests/test_mumps.py::test_schur_complement_with_dense double free or corruption (out)
I captured a bit more detail here:
$ pytest -v -s -k test_schur_complement_with_dense (kwant)
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.11.4, pytest-7.4.0, pluggy-1.2.0 -- /home/mmh/micromamba/envs/kwant/bin/python3.11
cachedir: .pytest_cache
rootdir: /home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/kwant
collected 353 items / 352 deselected / 1 skipped / 1 selected
linalg/tests/test_mumps.py::test_schur_complement_with_dense Fatal Python error: Segmentation fault
Current thread 0x00007f5808dfc740 (most recent call first):
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/kwant/linalg/mumps.py", line 496 in schur_complement
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/kwant/linalg/tests/test_mumps.py", line 58 in _test_schur_complement_with_dense
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/kwant/linalg/tests/test_mumps.py", line 62 in test_schur_complement_with_dense
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/python.py", line 1788 in runtest
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/runner.py", line 262 in <lambda>
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/runner.py", line 222 in call_and_report
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/runner.py", line 133 in runtestprotocol
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/main.py", line 349 in pytest_runtestloop
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/main.py", line 324 in _main
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/main.py", line 270 in wrap_session
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/config/__init__.py", line 166 in main
File "/home/mmh/micromamba/envs/kwant/lib/python3.11/site-packages/_pytest/config/__init__.py", line 189 in console_main
File "/home/mmh/micromamba/envs/kwant/bin/pytest", line 10 in <module>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, kwant.graph.core, tinyarray, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, kwant._system, kwant.linalg.lapack, kwant.operator, kwant.linalg._mumps, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, 
scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, kwant.graph.dijkstra, scipy._lib._uarray._uarray, scipy.fftpack.convolve, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, gmpy2.gmpy2, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy.stats._statlib, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, matplotlib._image (total: 129)
fish: Job 1, 'pytest -v -s -k test_schur_comp…' terminated by signal SIGSEGV (Address boundary error)
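For anyone who wants to tell this kind of crash apart from an ordinary test failure in a script, here is a minimal sketch. It assumes a POSIX system, where Python's subprocess module reports a child killed by a signal as a negative return code. The helper name run_and_report and the stand-in crasher command are made up for illustration and are not part of kwant or pytest.

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd in a child process and describe how it exited."""
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means the child was killed by signal N.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with code {proc.returncode}"


# Hypothetical stand-in for the crashing test: a child that sends SIGSEGV to itself.
crasher = [sys.executable, "-c",
           "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
print(run_and_report(crasher))
```

To check the real test instead, the stand-in command could be replaced with something like [sys.executable, "-m", "pytest", "-k", "test_schur_complement_with_dense"], run from the environment that shows the problem.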
Looking at the kwant feedstock, it doesn't pull in metis directly.
$ micromamba repoquery whoneeds metis
Using local repodata...
Loaded current active prefix: "/home/mmh/micromamba/envs/kwant"
Name Version Build Depends Channel
──────────────────────────────────────────────────────────────────
mumps-seq 5.2.1 h2104b81_11 metis >=5.1.0,<5.2.0a0 conda-forge
But it gets pulled in with mumps-seq, and it looks like metis isn't pinned: https://github.com/conda-forge/mumps-feedstock/blob/main/recipe/meta.yaml#L63
So I am not sure if the segfault is a bad build, or because the newer version of metis isn't ABI compatible with the older version that mumps builds against.
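As a rough illustration of why the solver is free to pick metis 5.1.1, here is a minimal sketch that checks versions against the metis >=5.1.0,<5.2.0a0 constraint shown in the repoquery output. It is a simplification, not conda's actual version ordering: it only handles plain dotted numeric versions and treats the upper bound 5.2.0a0 as 5.2.0. The function names are hypothetical.

```python
def parse(version):
    """Split a dotted numeric version such as '5.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower="5.1.0", upper="5.2.0"):
    """Return True if lower <= version < upper (simplified range check)."""
    return parse(lower) <= parse(version) < parse(upper)


for candidate in ("5.1.0", "5.1.1", "5.2.0"):
    print(candidate, satisfies(candidate))
```

Under these assumptions both 5.1.0 and 5.1.1 fall inside the range, so the constraint alone does not keep the newer metis out of the environment.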
I think first we need to add a pin to mumps and then see whether mumps needs a rebuild, or whether we need to set things up so that the version of metis the deps pull in is the same one that mumps was built against.
Perhaps this is related to this upstream issue: https://github.com/KarypisLab/METIS/issues/69 .
> So I am not sure if the segfault is a bad build, or because the newer version of metis isn't ABI compatible with the older version that mumps builds against.
I am experiencing this problem while building spral from scratch in https://github.com/conda-forge/staged-recipes/pull/18148, so I do not think this is an ABI issue.
As this problem has been confirmed by several downstream packages, can't we simply roll back https://github.com/conda-forge/metis-feedstock/pull/32 and mark metis 5.1.1 as broken? Is anyone against this, or does anyone prefer some other strategy? If there is consensus, I would be happy to open a PR to mark metis 5.1.1 as broken. @conda-forge/metis
The issue is that downstream packages do not have metis pinned correctly, so an incompatible version gets pulled in.
In the meantime, until the other feedstocks patch their deps, installing whatever package you want and then adding metis=5.1.0 will work.
> The issue is that downstream packages do not have metis pinned correctly, so an incompatible version gets pulled in.
Are you sure that this is an ABI issue and not simply a bug in metis? I am experiencing this failure in spral, which just links metis on its own, and it still segfaults. Did you try rebuilding some of the problematic packages, and did the crash go away?
The documentation here https://conda-forge.org/docs/maintainer/updating_pkgs.html#removing-broken-packages describes how to patch the package metadata. I don't have time to do this right now, but that is the cleanest fix IMHO.
> Are you sure that this is an ABI issue and not simply a bug in metis? I am experiencing this failure in spral, which just links metis on its own, and it still segfaults. Did you try rebuilding some of the problematic packages, and did the crash go away?
This is a good question :smile:
Yes, I made this version build for this package https://github.com/conda-forge/dgl-feedstock/pull/5 and the package builds fine, and I am able to run the tests/tutorial locally
So that would be one data point that this package is okay. Another would be the tests that run in CI. Here is the log for that build: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=746883&view=logs&j=656edd35-690f-5c53-9ba3-09c10d0bea97&t=e5c8ab1d-8ff9-5cae-b332-e15ae582ed2d&l=836 and it passes.
So you are correct that this build could be broken, but a package that needs this version works, and the tests that ship with this version work. So I am not saying it can't be broken (I am not a metis expert), but I think it would help to check the changelog between versions and see if there is something obvious there.
In a fresh conda env:
micromamba create -n metis metis=5.1.1 compilers
micromamba activate metis
g++ -I$CONDA_PREFIX/include -L$CONDA_PREFIX/lib -lmetis metis_test.cpp -o metis_test
./metis_test
Doesn't produce a segfault on x86-64 linux (for me).
Cool, thanks for trying! I still suspect this is not just an ABI issue as it also affects spral, but for sure it is not exactly like https://github.com/KarypisLab/METIS/issues/69 .
@traversaro If you can make a small reproducer, that would be great! I hope I didn't come off as dismissive wrt this bug report, I really do want to make sure it is working and my threshold for working may be too low and there is an issue with the build.
Is this the package you are trying to build? https://github.com/ralna/spral One thing you could do to help track this down is build metis 5.1.1 yourself from source (ideally the same as the one we are using in the feedstock + the patches) and see if you get the same issue as the feedstock. It will likely be faster to iterate on this locally than bouncing and building off of CI.
@ruizhi92 Can you give some more details about the segfault you observed? Thanks!
Yes, a small reproducer would be great, anyhow I was a bit hesitant on spending time on this, given that 5.1.1 is itself quite an old metis release, and perhaps we are just debugging something that has been solved in latest metis.
> Yes, a small reproducer would be great, anyhow I was a bit hesitant on spending time on this, given that 5.1.1 is itself quite an old metis release, and perhaps we are just debugging something that has been solved in latest metis.
That is fair, my plan is to get a newer version out soon :tm:
@mikemhenry I installed METIS version 5.1.1 on CentOS 7.9.2009, built from source with the gcc compiler (C++11).
When invoking METIS_PartGraphKway with the simplest program I found, https://people.math.sc.edu/Burkardt/cpp_src/metis_test/metis_test.cpp, I get a "segmentation fault".
Unfortunately I have already uninstalled version 5.1.1, so I can't give many more details about the segfault, because the error comes from the dynamic library.
Closed until I can get a reproducer, feel free to re-open/comment if anyone has any questions!
Possibly related comment: https://github.com/KarypisLab/METIS/issues/71#issuecomment-1696082046 .
For the spral problem with METIS >= 5.1.1, I have opened an issue upstream in https://github.com/ralna/spral/issues/133 .
Solution to issue cannot be found in the documentation.
Issue
The following script run on linux64 (ubuntu 23.04) results in a "double free or corruption (out)" error; downgrading metis to 5.1.0 removes the error. Here metis is used via mumps, and the test that fails indeed checks the mumps wrapper.