FEniCS / dolfinx

Next generation FEniCS problem solving environment
https://fenicsproject.org
GNU Lesser General Public License v3.0
[BUG]: Conda test fails 0.8.0 #3179

Open jhale opened 2 months ago

jhale commented 2 months ago

Summarize the issue

The conda linux64 builds of 0.8.0 give a test error:

FAILED unit/fem/test_fem_pipeline.py::test_dP_simplex[3-DG-tetrahedron] - AssertionError: assert 4.247720524033913e-06 < 1e-09
 +  where 4.247720524033913e-06 = <ufunc 'absolute'>(4.247720524033913e-06)
 +    where <ufunc 'absolute'> = np.abs
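To put the failing assertion in perspective, here is a minimal sketch of the comparison, using only the two numbers printed in the output above. The helper `within_tolerance` and the variable names are illustrative assumptions rather than names from the DOLFINx test suite, and the built-in `abs` stands in for the `np.abs` call shown in the output.

```python
# Values copied from the reported failure; the names are illustrative only.
reported_error = 4.247720524033913e-06
tolerance = 1e-09


def within_tolerance(error, tol):
    """Return True when the absolute error is strictly below tol."""
    # The test output shows np.abs; the built-in abs is equivalent for a float.
    return abs(error) < tol


# The reported error does not satisfy the bound used in the assertion.
print(within_tolerance(reported_error, tolerance))

# Ratio of the observed error to the tolerance.
print(reported_error / tolerance)
```

The observed error exceeds the tolerance by more than three orders of magnitude, so this does not look like a borderline round-off failure.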

Additionally, macOS builds are segfaulting at:

https://github.com/FEniCS/dolfinx/blob/v0.8.0/python/test/unit/fem/test_fem_pipeline.py#L294

How to reproduce the bug

Unknown; the failure occurs in the conda build system.

Minimal Example (Python)

No response

Output (Python)

No response

Version

0.8.0

DOLFINx git commit

No response

Installation

Conda build system x86-64 Linux.

Additional information

No response

minrk commented 2 months ago

The segfault is in dmumps_scatter_dist_rhs_. This is not the first time I've seen a problem in dmumps_scatter_dist_rhs.

* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x5fff8db6eb50)
    frame #0: 0x000000010e50ed69 libdmumps.dylib`dmumps_scatter_dist_rhs_ + 1241
libdmumps.dylib`dmumps_scatter_dist_rhs_:
->  0x10e50ed69 <+1241>: incl   (%r9,%rdx,4)
    0x10e50ed6d <+1245>: incq   %rax
    0x10e50ed70 <+1248>: cmpq   %rax, %rsi
    0x10e50ed73 <+1251>: jne    0x10e50ed50               ; <+1216>
(lldb) bt
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x5fff8db6eb50)
  * frame #0: 0x000000010e50ed69 libdmumps.dylib`dmumps_scatter_dist_rhs_ + 1241
    frame #1: 0x000000010e507e69 libdmumps.dylib`dmumps_solve_driver_ + 72313
    frame #2: 0x000000010e5708ac libdmumps.dylib`dmumps_ + 3612
    frame #3: 0x000000010e5763f3 libdmumps.dylib`dmumps_f77_ + 7203
    frame #4: 0x000000010e56de19 libdmumps.dylib`dmumps_c + 3289
    frame #5: 0x000000010ba3b746 libpetsc.3.20.6.dylib`MatSolve_MUMPS + 742
    frame #6: 0x000000010bb7c481 libpetsc.3.20.6.dylib`MatSolve + 289
    frame #7: 0x000000010c1c7fa7 libpetsc.3.20.6.dylib`PCApply_LU + 87
    frame #8: 0x000000010c244954 libpetsc.3.20.6.dylib`PCApply + 212
    frame #9: 0x000000010c039be4 libpetsc.3.20.6.dylib`KSPSolve_PREONLY + 308
    frame #10: 0x000000010c09037e libpetsc.3.20.6.dylib`KSPSolve_Private + 1374
    frame #11: 0x000000010c08fdce libpetsc.3.20.6.dylib`KSPSolve + 30
    frame #12: 0x0000000177cc79c8 libslepc.3.20.2.dylib`STMatSolve + 120
    frame #13: 0x0000000177cc8977 libslepc.3.20.2.dylib`STApply_Generic + 87
    frame #14: 0x0000000177cc9863 libslepc.3.20.2.dylib`MatMult_STOperator + 275
    frame #15: 0x000000010b80d937 libpetsc.3.20.6.dylib`MatMult_Shell + 423
    frame #16: 0x000000010bb7084b libpetsc.3.20.6.dylib`MatMult + 235
    frame #17: 0x0000000177cc8a90 libslepc.3.20.2.dylib`STApply + 64
    frame #18: 0x0000000177e0ecbb libslepc.3.20.2.dylib`EPSGetStartVector + 219
    frame #19: 0x0000000177dded45 libslepc.3.20.2.dylib`EPSSolve_KrylovSchur_Default + 229
    frame #20: 0x0000000177e0bfd5 libslepc.3.20.2.dylib`EPSSolve + 517
    frame #21: 0x000000017a6165a8 SLEPc.cpython-310-darwin.so`__pyx_pw_8slepc4py_5SLEPc_3EPS_123solve + 40
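For readers who want to see which libraries the crash passes through, the following is a rough sketch that parses an excerpt of the backtrace above and lists the libraries from the outermost caller to the crashing frame. The regular expression and the helpers `parse_frames` and `library_chain` are assumptions made for illustration; they are not part of lldb, PETSc, SLEPc, or DOLFINx, and only a subset of the frames is included.

```python
import re
from itertools import groupby

# Excerpt of the frames shown above, copied verbatim (not the full trace).
BACKTRACE = """
  * frame #0: 0x000000010e50ed69 libdmumps.dylib`dmumps_scatter_dist_rhs_ + 1241
    frame #4: 0x000000010e56de19 libdmumps.dylib`dmumps_c + 3289
    frame #5: 0x000000010ba3b746 libpetsc.3.20.6.dylib`MatSolve_MUMPS + 742
    frame #11: 0x000000010c08fdce libpetsc.3.20.6.dylib`KSPSolve + 30
    frame #12: 0x0000000177cc79c8 libslepc.3.20.2.dylib`STMatSolve + 120
    frame #20: 0x0000000177e0bfd5 libslepc.3.20.2.dylib`EPSSolve + 517
    frame #21: 0x000000017a6165a8 SLEPc.cpython-310-darwin.so`__pyx_pw_8slepc4py_5SLEPc_3EPS_123solve + 40
"""

# Hypothetical pattern for lines of the form: frame #N: 0xADDR library`symbol + offset
FRAME_RE = re.compile(r"frame #(\d+): 0x[0-9a-f]+ ([^`\s]+)`(\S+)")


def parse_frames(text):
    """Return a list of (frame index, library, symbol) tuples."""
    return [(int(m.group(1)), m.group(2), m.group(3))
            for m in FRAME_RE.finditer(text)]


def library_chain(frames):
    """List libraries from the outermost caller to the crashing frame,
    collapsing runs of consecutive frames in the same library."""
    ordered = sorted(frames, key=lambda f: f[0], reverse=True)
    return [lib for lib, _ in groupby(f[1] for f in ordered)]


frames = parse_frames(BACKTRACE)
for lib in library_chain(frames):
    print(lib)

# Symbol of the innermost frame, where the bad access happened.
print(min(frames, key=lambda f: f[0])[2])
```

On this excerpt the chain runs from the slepc4py extension module through libslepc and libpetsc into libdmumps, which matches the observation that the crash itself is inside MUMPS rather than in DOLFINx code.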

minrk commented 2 months ago

The curl-curl segfault is a MUMPS bug, already reported here: https://github.com/conda-forge/mumps-feedstock/issues/110

RemDelaporteMathurin commented 2 months ago

After updating to 0.8.0, I've started noticing errors in our CI.

I also noticed them in the Docker CI, though, so I'm not sure they are related to this.