FEniCS / dolfinx

Next generation FEniCS problem solving environment
https://fenicsproject.org
GNU Lesser General Public License v3.0

[discussion]: failure and non-failure of demos using Adios2 on big-endian systems (s390x) #3072

Closed drew-parsons closed 6 months ago

drew-parsons commented 8 months ago

Summarize the issue

Adios2 has a difficult relationship with big-endian systems (such as s390x, ppc64). Upstream notes that big-endian is not really tested (though it does pass "most" tests). It builds fine with a default build configuration, but the adios4dolfinx tests then cause adios2 to confess that it should be built with -DADIOS2_USE_Endian_Reverse=ON, which is fair enough.
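For readers unfamiliar with why byte order matters here, the following is a minimal, generic sketch of what an endian-reversal path has to do. It uses only Python's standard `struct` module and does not reproduce any ADIOS2 internals; the variable names are illustrative.

```python
import struct

value = 1.0

# The same double serialized in both byte orders.
little = struct.pack("<d", value)  # little-endian layout (e.g. x86_64)
big = struct.pack(">d", value)     # big-endian layout (e.g. s390x)

# The two layouts are byte-for-byte reversals of each other.
assert little == big[::-1]

# Reading little-endian bytes as if they were big-endian gives a wrong value.
misread = struct.unpack(">d", little)[0]
assert misread != value

# Reversing the bytes first (the "endian reverse" step) recovers the value.
recovered = struct.unpack(">d", little[::-1])[0]
assert recovered == value

print(f"misread={misread!r} recovered={recovered!r}")
```

Any code path that skips or misapplies such a swap for one data type or buffer can corrupt sizes and offsets, which is one plausible (but unconfirmed) way for only some writers or demos to crash.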

After building adios2 on s390x with -DADIOS2_USE_Endian_Reverse=ON, some of the dolfinx demos using VTXWriter fail. This is not surprising. As just noted, adios2 has not been tested upstream for big-endian support. What is surprising is that some demos do not fail.

The existence of failing and non-failing demos suggests it might be possible to identify (and fix) the trigger for failure.

In Debian build 1:0.7.3-5 I've patched the tests to skip the failing demos on big-endian systems. I give sample output below documenting the segfault without that patch, together with the log of passing demos, extracted from https://ci.debian.net/data/autopkgtest/unstable/s390x/f/fenics-dolfinx/43107848/log.gz

The Python demo that fails using VTXWriter is demo_interpolation-io.py (see the log below).

The python demos that apparently pass using VTXWriter are

Another strange observation is that C++ demo_poisson fails in a similar way, yet the corresponding python demo demo_poisson.py passes without segfault.

The python unit tests io/test_adios2.py all pass.

How to reproduce the bug

  1. Build adios2 on a big-endian system (s390x or ppc64) configuring with -DADIOS2_USE_Endian_Reverse=ON
  2. Build dolfinx against adios2
  3. Run demos
    cd python/demos
    python3 -m pytest -v test.py

Minimal Example (Python)

No response

Output (Python)

https://ci.debian.net/data/autopkgtest/unstable/s390x/f/fenics-dolfinx/43107848/log.gz
21510s autopkgtest [05:38:56]: test test-dolfinx-python-demos: [-----------------------
21510s == running python demos ==
21510s === python demo test (serial) ===
21510s ============================= test session starts ==============================
21510s platform linux -- Python 3.11.8, pytest-7.4.4, pluggy-1.4.0 -- /usr/bin/python3
21510s cachedir: .pytest_cache
21510s rootdir: /tmp/autopkgtest-lxc.253yol7b/downtmp/build.7jP/src/python/demo
21510s configfile: pytest.ini
21510s collecting ... collected 54 items / 12 deselected / 42 selected
21510s 
21513s test.py::test_demos[path0-demo_elasticity.py] PASSED                     [  2%]
21517s test.py::test_demos[path1-demo_types.py] PASSED                          [  4%]
21517s test.py::test_demos[path2-demo_poisson.py] PASSED                        [  7%]
21518s test.py::test_demos[path3-demo_interpolation-io.py] FAILED               [  9%]
21518s test.py::test_demos[path4-test.py] PASSED                                [ 11%]
21519s test.py::test_demos[path5-demo_mixed-poisson.py] PASSED                  [ 14%]
21522s test.py::test_demos[path7-demo_stokes.py] PASSED                         [ 16%]
21522s test.py::test_demos[path8-conftest.py] PASSED                            [ 19%]
21523s test.py::test_demos[path9-demo_pyvista.py] PASSED                        [ 21%]
21525s test.py::test_demos[path11-demo_helmholtz.py] PASSED                     [ 23%]
21547s test.py::test_demos[path12-demo_tnt-elements.py] PASSED                  [ 26%]
21553s test.py::test_demos[path15-demo_navier-stokes.py] PASSED                 [ 28%]
21554s test.py::test_demos[path16-demo_biharmonic.py] PASSED                    [ 30%]
21555s test.py::test_demos[path17-mesh_sphere_axis.py] PASSED                   [ 33%]
21555s test.py::test_demos[path19-mesh_wire_pml.py] PASSED                      [ 35%]
21555s test.py::test_demos[path20-efficiencies_pml_demo.py] PASSED              [ 38%]
21559s test.py::test_demos[path22-demo_half_loaded_waveguide.py] PASSED         [ 40%]
21559s test.py::test_demos[path23-analytical_modes.py] PASSED                   [ 42%]
21559s test.py::test_demos[path24-mesh_wire.py] PASSED                          [ 45%]
21560s test.py::test_demos[path25-demo_scattering_boundary_conditions.py] PASSED [ 47%]
21560s test.py::test_demos[path26-analytical_efficiencies_wire.py] PASSED       [ 50%]
21561s test.py::test_demos_mpi[path0-demo_elasticity.py] PASSED                 [ 52%]
21562s test.py::test_demos_mpi[path1-demo_types.py] PASSED                      [ 54%]
21563s test.py::test_demos_mpi[path2-demo_poisson.py] PASSED                    [ 57%]
21565s test.py::test_demos_mpi[path3-demo_interpolation-io.py] FAILED           [ 59%]
21565s test.py::test_demos_mpi[path4-test.py] PASSED                            [ 61%]
21566s test.py::test_demos_mpi[path5-demo_mixed-poisson.py] PASSED              [ 64%]
21567s test.py::test_demos_mpi[path7-demo_stokes.py] PASSED                     [ 66%]
21568s test.py::test_demos_mpi[path8-conftest.py] PASSED                        [ 69%]
21568s test.py::test_demos_mpi[path9-demo_pyvista.py] PASSED                    [ 71%]
21569s test.py::test_demos_mpi[path11-demo_helmholtz.py] PASSED                 [ 73%]
21577s test.py::test_demos_mpi[path12-demo_tnt-elements.py] PASSED              [ 76%]
21579s test.py::test_demos_mpi[path15-demo_navier-stokes.py] PASSED             [ 78%]
21580s test.py::test_demos_mpi[path16-demo_biharmonic.py] PASSED                [ 80%]
21580s test.py::test_demos_mpi[path17-mesh_sphere_axis.py] PASSED               [ 83%]
21581s test.py::test_demos_mpi[path19-mesh_wire_pml.py] PASSED                  [ 85%]
21581s test.py::test_demos_mpi[path20-efficiencies_pml_demo.py] PASSED          [ 88%]
21584s test.py::test_demos_mpi[path22-demo_half_loaded_waveguide.py] PASSED     [ 90%]
21584s test.py::test_demos_mpi[path23-analytical_modes.py] PASSED               [ 92%]
21585s test.py::test_demos_mpi[path24-mesh_wire.py] PASSED                      [ 95%]
21585s test.py::test_demos_mpi[path25-demo_scattering_boundary_conditions.py] PASSED [ 97%]
21586s test.py::test_demos_mpi[path26-analytical_efficiencies_wire.py] PASSED   [100%]
21586s 
21586s =================================== FAILURES ===================================
21586s __________________ test_demos[path3-demo_interpolation-io.py] __________________
21586s 
21586s path = PosixPath('/tmp/autopkgtest-lxc.253yol7b/downtmp/build.7jP/src/python/demo')
21586s name = 'demo_interpolation-io.py'
21586s 
21586s     @pytest.mark.serial
21586s     @pytest.mark.parametrize("path,name", demos)
21586s     def test_demos(path, name):
21586s >       ret = subprocess.run([sys.executable, name], cwd=str(path), check=True)
21586s 
21586s test.py:26: 
21586s _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
21586s 
21586s input = None, capture_output = False, timeout = None, check = True
21586s popenargs = (['/usr/bin/python3', 'demo_interpolation-io.py'],)
21586s kwargs = {'cwd': '/tmp/autopkgtest-lxc.253yol7b/downtmp/build.7jP/src/python/demo'}
21586s process = <Popen: returncode: -11 args: ['/usr/bin/python3', 'demo_interpolation-io.py']>
21586s stdout = None, stderr = None, retcode = -11
21586s 
21586s     def run(*popenargs,
21586s             input=None, capture_output=False, timeout=None, check=False, **kwargs):
21586s         """Run command with arguments and return a CompletedProcess instance.
21586s     
21586s         The returned instance will have attributes args, returncode, stdout and
21586s         stderr. By default, stdout and stderr are not captured, and those attributes
21586s         will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
21586s         or pass capture_output=True to capture both.
21586s     
21586s         If check is True and the exit code was non-zero, it raises a
21586s         CalledProcessError. The CalledProcessError object will have the return code
21586s         in the returncode attribute, and output & stderr attributes if those streams
21586s         were captured.
21586s     
21586s         If timeout is given, and the process takes too long, a TimeoutExpired
21586s         exception will be raised.
21586s     
21586s         There is an optional argument "input", allowing you to
21586s         pass bytes or a string to the subprocess's stdin.  If you use this argument
21586s         you may not also use the Popen constructor's "stdin" argument, as
21586s         it will be used internally.
21586s     
21586s         By default, all communication is in bytes, and therefore any "input" should
21586s         be bytes, and the stdout and stderr will be bytes. If in text mode, any
21586s         "input" should be a string, and stdout and stderr will be strings decoded
21586s         according to locale encoding, or by "encoding" if set. Text mode is
21586s         triggered by setting any of text, encoding, errors or universal_newlines.
21586s     
21586s         The other arguments are the same as for the Popen constructor.
21586s         """
21586s         if input is not None:
21586s             if kwargs.get('stdin') is not None:
21586s                 raise ValueError('stdin and input arguments may not both be used.')
21586s             kwargs['stdin'] = PIPE
21586s     
21586s         if capture_output:
21586s             if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
21586s                 raise ValueError('stdout and stderr arguments may not be used '
21586s                                  'with capture_output.')
21586s             kwargs['stdout'] = PIPE
21586s             kwargs['stderr'] = PIPE
21586s     
21586s         with Popen(*popenargs, **kwargs) as process:
21586s             try:
21586s                 stdout, stderr = process.communicate(input, timeout=timeout)
21586s             except TimeoutExpired as exc:
21586s                 process.kill()
21586s                 if _mswindows:
21586s                     # Windows accumulates the output in a single blocking
21586s                     # read() call run on child threads, with the timeout
21586s                     # being done in a join() on those threads.  communicate()
21586s                     # _after_ kill() is required to collect that and add it
21586s                     # to the exception.
21586s                     exc.stdout, exc.stderr = process.communicate()
21586s                 else:
21586s                     # POSIX _communicate already populated the output so
21586s                     # far into the TimeoutExpired exception.
21586s                     process.wait()
21586s                 raise
21586s             except:  # Including KeyboardInterrupt, communicate handled that.
21586s                 process.kill()
21586s                 # We don't call process.wait() as .__exit__ does that for us.
21586s                 raise
21586s             retcode = process.poll()
21586s             if check and retcode:
21586s >               raise CalledProcessError(retcode, process.args,
21586s                                          output=stdout, stderr=stderr)
21586s E               subprocess.CalledProcessError: Command '['/usr/bin/python3', 'demo_interpolation-io.py']' died with <Signals.SIGSEGV: 11>.
21586s 
21586s /usr/lib/python3.11/subprocess.py:571: CalledProcessError
21586s ----------------------------- Captured stderr call -----------------------------
21586s [0]PETSC ERROR: ------------------------------------------------------------------------
21586s [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
21586s [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
21586s [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
21586s [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
21586s [0]PETSC ERROR: to get more information on the crash.
21586s [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
21586s ________________ test_demos_mpi[path3-demo_interpolation-io.py] ________________
21586s 
21586s num_proc = 1, mpiexec = 'mpirun'
21586s path = PosixPath('/tmp/autopkgtest-lxc.253yol7b/downtmp/build.7jP/src/python/demo')
21586s name = 'demo_interpolation-io.py'
21586s 
21586s     @pytest.mark.mpi
21586s     @pytest.mark.parametrize("path,name", demos)
21586s     def test_demos_mpi(num_proc, mpiexec, path, name):
21586s         cmd = [mpiexec, "-np", str(num_proc), sys.executable, name]
21586s         print(cmd)
21586s >       ret = subprocess.run(cmd, cwd=str(path), check=True)
21586s 
21586s test.py:35: 
21586s _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
21586s 
21586s input = None, capture_output = False, timeout = None, check = True
21586s popenargs = (['mpirun', '-np', '1', '/usr/bin/python3', 'demo_interpolation-io.py'],)
21586s kwargs = {'cwd': '/tmp/autopkgtest-lxc.253yol7b/downtmp/build.7jP/src/python/demo'}
21586s process = <Popen: returncode: 134 args: ['mpirun', '-np', '1', '/usr/bin/python3', 'de...>
21586s stdout = None, stderr = None, retcode = 134
21586s 
21586s     def run(*popenargs,
21586s             input=None, capture_output=False, timeout=None, check=False, **kwargs):
21586s         """Run command with arguments and return a CompletedProcess instance.
21586s     
21586s         The returned instance will have attributes args, returncode, stdout and
21586s         stderr. By default, stdout and stderr are not captured, and those attributes
21586s         will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
21586s         or pass capture_output=True to capture both.
21586s     
21586s         If check is True and the exit code was non-zero, it raises a
21586s         CalledProcessError. The CalledProcessError object will have the return code
21586s         in the returncode attribute, and output & stderr attributes if those streams
21586s         were captured.
21586s     
21586s         If timeout is given, and the process takes too long, a TimeoutExpired
21586s         exception will be raised.
21586s     
21586s         There is an optional argument "input", allowing you to
21586s         pass bytes or a string to the subprocess's stdin.  If you use this argument
21586s         you may not also use the Popen constructor's "stdin" argument, as
21586s         it will be used internally.
21586s     
21586s         By default, all communication is in bytes, and therefore any "input" should
21586s         be bytes, and the stdout and stderr will be bytes. If in text mode, any
21586s         "input" should be a string, and stdout and stderr will be strings decoded
21586s         according to locale encoding, or by "encoding" if set. Text mode is
21586s         triggered by setting any of text, encoding, errors or universal_newlines.
21586s     
21586s         The other arguments are the same as for the Popen constructor.
21586s         """
21586s         if input is not None:
21586s             if kwargs.get('stdin') is not None:
21586s                 raise ValueError('stdin and input arguments may not both be used.')
21586s             kwargs['stdin'] = PIPE
21586s     
21586s         if capture_output:
21586s             if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
21586s                 raise ValueError('stdout and stderr arguments may not be used '
21586s                                  'with capture_output.')
21586s             kwargs['stdout'] = PIPE
21586s             kwargs['stderr'] = PIPE
21586s     
21586s         with Popen(*popenargs, **kwargs) as process:
21586s             try:
21586s                 stdout, stderr = process.communicate(input, timeout=timeout)
21586s             except TimeoutExpired as exc:
21586s                 process.kill()
21586s                 if _mswindows:
21586s                     # Windows accumulates the output in a single blocking
21586s                     # read() call run on child threads, with the timeout
21586s                     # being done in a join() on those threads.  communicate()
21586s                     # _after_ kill() is required to collect that and add it
21586s                     # to the exception.
21586s                     exc.stdout, exc.stderr = process.communicate()
21586s                 else:
21586s                     # POSIX _communicate already populated the output so
21586s                     # far into the TimeoutExpired exception.
21586s                     process.wait()
21586s                 raise
21586s             except:  # Including KeyboardInterrupt, communicate handled that.
21586s                 process.kill()
21586s                 # We don't call process.wait() as .__exit__ does that for us.
21586s                 raise
21586s             retcode = process.poll()
21586s             if check and retcode:
21586s >               raise CalledProcessError(retcode, process.args,
21586s                                          output=stdout, stderr=stderr)
21586s E               subprocess.CalledProcessError: Command '['mpirun', '-np', '1', '/usr/bin/python3', 'demo_interpolation-io.py']' returned non-zero exit status 134.
21586s 
21586s /usr/lib/python3.11/subprocess.py:571: CalledProcessError
21586s ----------------------------- Captured stdout call -----------------------------
21586s ['mpirun', '-np', '1', '/usr/bin/python3', 'demo_interpolation-io.py']
21586s ----------------------------- Captured stderr call -----------------------------
21586s free(): invalid pointer
21586s [ci-052-4ceea23b:31807] *** Process received signal ***
21586s [ci-052-4ceea23b:31807] Signal: Aborted (6)
21586s [ci-052-4ceea23b:31807] Signal code:  (-6)
21586s [ci-052-4ceea23b:31807] [ 0] linux-vdso64.so.1(__kernel_rt_sigreturn+0x0)[0x3ffdb7fe490]
21586s [ci-052-4ceea23b:31807] [ 1] /lib/s390x-linux-gnu/libc.so.6(+0x97646)[0x3ff87097646]
21586s [ci-052-4ceea23b:31807] [ 2] /lib/s390x-linux-gnu/libc.so.6(gsignal+0x20)[0x3ff87047e88]
21586s [ci-052-4ceea23b:31807] [ 3] /lib/s390x-linux-gnu/libc.so.6(abort+0x110)[0x3ff8702ac88]
21586s [ci-052-4ceea23b:31807] [ 4] /lib/s390x-linux-gnu/libc.so.6(+0x89dde)[0x3ff87089dde]
21586s [ci-052-4ceea23b:31807] [ 5] /lib/s390x-linux-gnu/libc.so.6(+0xa18ec)[0x3ff870a18ec]
21586s [ci-052-4ceea23b:31807] [ 6] /lib/s390x-linux-gnu/libc.so.6(+0xa38d8)[0x3ff870a38d8]
21586s [ci-052-4ceea23b:31807] [ 7] /lib/s390x-linux-gnu/libc.so.6(__libc_free+0xde)[0x3ff870a6846]
21586s [ci-052-4ceea23b:31807] [ 8] /lib/s390x-linux-gnu/libadios2_mpi_core.so.2(_ZNSt10_HashtableINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_St6vectorISt5tupleIJmmEESaISA_EEESaISD_ENSt8__detail10_Select1stESt8equal_toIS5_ESt4hashIS5_ENSF_18_Mod_range_hashingENSF_20_Default_ranged_hashENSF_20_Prime_rehash_policyENSF_17_Hashtable_traitsILb1ELb0ELb1EEEE5clearEv+0x56)[0x3ff6e5911fe]
21586s [ci-052-4ceea23b:31807] [ 9] /lib/s390x-linux-gnu/libadios2_mpi_core.so.2(_ZN6adios26format13BP4Serializer34AggregateCollectiveMetadataIndicesERKNS_6helper4CommERNS0_9BufferSTLE+0x1c7c)[0x3ff6e587d64]
21586s [ci-052-4ceea23b:31807] [10] /lib/s390x-linux-gnu/libadios2_mpi_core.so.2(_ZN6adios26format13BP4Serializer27AggregateCollectiveMetadataERKNS_6helper4CommERNS0_9BufferSTLEb+0xee)[0x3ff6e588466]
21586s [ci-052-4ceea23b:31807] [11] /lib/s390x-linux-gnu/libadios2_mpi_core.so.2(_ZN6adios24core6engine9BP4Writer27WriteCollectiveMetadataFileEb+0xac)[0x3ff6e469f3c]
21586s [ci-052-4ceea23b:31807] [12] /lib/s390x-linux-gnu/libadios2_mpi_core.so.2(_ZN6adios24core6engine9BP4Writer5FlushEi+0x98)[0x3ff6e46a608]
21586s [ci-052-4ceea23b:31807] [13] /lib/s390x-linux-gnu/libadios2_mpi_core.so.2(_ZN6adios24core6engine9BP4Writer7EndStepEv+0x2e6)[0x3ff6e46a9b6]
21586s [ci-052-4ceea23b:31807] [14] /usr/lib/petsc/lib/python3/dist-packages/dolfinx/cpp.cpython-311-s390x-linux-gnu.so(+0x1c6b24)[0x3ff6f7c6b24]
21586s [ci-052-4ceea23b:31807] [15] /usr/lib/petsc/lib/python3/dist-packages/dolfinx/cpp.cpython-311-s390x-linux-gnu.so(+0x1e8610)[0x3ff6f7e8610]
21586s [ci-052-4ceea23b:31807] [16] /usr/lib/petsc/lib/python3/dist-packages/dolfinx/cpp.cpython-311-s390x-linux-gnu.so(+0x37a34)[0x3ff6f637a34]
21586s [ci-052-4ceea23b:31807] [17] /usr/bin/python3[0x10deb96]
21586s [ci-052-4ceea23b:31807] [18] /usr/bin/python3(_PyObject_MakeTpCall+0x254)[0x10a36c4]
21586s [ci-052-4ceea23b:31807] [19] /usr/bin/python3(PyObject_Vectorcall+0x78)[0x10cf578]
21586s [ci-052-4ceea23b:31807] [20] /usr/bin/python3(_PyEval_EvalFrameDefault+0xaa0)[0x10bbec0]
21586s [ci-052-4ceea23b:31807] [21] /usr/bin/python3(PyEval_EvalCode+0xcc)[0x11d27bc]
21586s [ci-052-4ceea23b:31807] [22] /usr/bin/python3[0x11f8e96]
21586s [ci-052-4ceea23b:31807] [23] /usr/bin/python3[0x11f3dcc]
21586s [ci-052-4ceea23b:31807] [24] /usr/bin/python3[0x120f6ac]
21586s [ci-052-4ceea23b:31807] [25] /usr/bin/python3(_PyRun_SimpleFileObject+0x1c2)[0x120f252]
21586s [ci-052-4ceea23b:31807] [26] /usr/bin/python3(_PyRun_AnyFileObject+0x4e)[0x120ef6e]
21586s [ci-052-4ceea23b:31807] [27] /usr/bin/python3(Py_RunMain+0x32c)[0x120c87c]
21586s [ci-052-4ceea23b:31807] [28] /usr/bin/python3(Py_BytesMain+0x2e)[0x11c159e]
21586s [ci-052-4ceea23b:31807] [29] /lib/s390x-linux-gnu/libc.so.6(+0x2af2a)[0x3ff8702af2a]
21586s [ci-052-4ceea23b:31807] *** End of error message ***
21586s --------------------------------------------------------------------------
21586s Primary job  terminated normally, but 1 process returned
21586s a non-zero exit code. Per user-direction, the job has been aborted.
21586s --------------------------------------------------------------------------
21586s --------------------------------------------------------------------------
21586s mpirun noticed that process rank 0 with PID 0 on node ci-052-4ceea23b exited on signal 6 (Aborted).
21586s --------------------------------------------------------------------------
21586s ============================= slowest 20 durations =============================
21586s 21.98s call     test.py::test_demos[path12-demo_tnt-elements.py]
21586s 8.38s call     test.py::test_demos_mpi[path12-demo_tnt-elements.py]
21586s 6.12s call     test.py::test_demos[path15-demo_navier-stokes.py]
21586s 3.96s call     test.py::test_demos[path22-demo_half_loaded_waveguide.py]
21586s 3.67s call     test.py::test_demos[path1-demo_types.py]
21586s 3.15s call     test.py::test_demos_mpi[path22-demo_half_loaded_waveguide.py]
21586s 2.94s call     test.py::test_demos[path7-demo_stokes.py]
21586s 2.69s call     test.py::test_demos[path0-demo_elasticity.py]
21586s 2.48s call     test.py::test_demos_mpi[path3-demo_interpolation-io.py]
21586s 2.10s call     test.py::test_demos_mpi[path15-demo_navier-stokes.py]
21586s 1.83s call     test.py::test_demos[path11-demo_helmholtz.py]
21586s 1.70s call     test.py::test_demos[path16-demo_biharmonic.py]
21586s 1.37s call     test.py::test_demos_mpi[path7-demo_stokes.py]
21586s 1.27s call     test.py::test_demos[path5-demo_mixed-poisson.py]
21586s 1.10s call     test.py::test_demos_mpi[path0-demo_elasticity.py]
21586s 0.91s call     test.py::test_demos[path25-demo_scattering_boundary_conditions.py]
21586s 0.81s call     test.py::test_demos_mpi[path25-demo_scattering_boundary_conditions.py]
21586s 0.81s call     test.py::test_demos_mpi[path11-demo_helmholtz.py]
21586s 0.78s call     test.py::test_demos_mpi[path1-demo_types.py]
21586s 0.71s call     test.py::test_demos_mpi[path16-demo_biharmonic.py]
21586s =========================== short test summary info ============================
21586s FAILED test.py::test_demos[path3-demo_interpolation-io.py] - subprocess.Calle...
21586s FAILED test.py::test_demos_mpi[path3-demo_interpolation-io.py] - subprocess.C...
21586s ============ 2 failed, 40 passed, 12 deselected in 75.46s (0:01:15) ============
21586s autopkgtest [05:40:12]: test test-dolfinx-python-demos: -----------------------]
21586s test-dolfinx-python-demos FAIL non-zero exit status 1
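The log above reports two different return codes for the same demo: -11 for the serial run and 134 under mpirun. A small sketch of how these can be decoded with the standard library is given here. The function name `describe_returncode` is hypothetical, and treating values above 128 as "128 + signal number" is the usual shell convention, assumed here to be what mpirun reported.

```python
import signal


def describe_returncode(retcode: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if retcode == 0:
        return "success"
    if retcode < 0:
        # subprocess reports a child killed by signal N as -N.
        try:
            return f"killed by {signal.Signals(-retcode).name}"
        except ValueError:
            return f"killed by unknown signal {-retcode}"
    if retcode > 128:
        # Assumed convention: exit status 128 + N after signal N.
        try:
            name = signal.Signals(retcode - 128).name
            return f"exit status {retcode}, consistent with {name}"
        except ValueError:
            pass
    return f"exit status {retcode}"


if __name__ == "__main__":
    for code in (-11, 134):
        print(code, "->", describe_returncode(code))
```

On Linux, -11 maps to SIGSEGV and 134 is consistent with SIGABRT, matching the "free(): invalid pointer" abort shown in the MPI backtrace.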

Version

0.7.3

DOLFINx git commit

No response

Installation

Debian packages.

Additional information

Adios2 2.9.2+dfsg1-13 with patches supporting BP5.

jhale commented 8 months ago

I feel like this isn't properly solvable without effort upstream in ADIOS2 to test on big endian architectures. It's always going to be unreliable without testing there.

Perhaps we should disable building against ADIOS2 on big endian systems? Detecting big endianness is easy with C++20's https://en.cppreference.com/w/cpp/types/endian, or with CMake https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_BYTE_ORDER.html.

drew-parsons commented 8 months ago

I disabled the affected tests that way with https://salsa.debian.org/science-team/fenics/fenics-dolfinx/-/blob/master/debian/patches/big_endian_skip_adios2_tests.patch?ref_type=heads. There's also sys.byteorder in Python. That's enough to get the test suite passing on s390x.

I'd suggest doing it that way, disabling at the level of the tests rather than the library itself, and document it with a comment somewhere to acknowledge it's not supported. That way anyone actually using big-endian can choose to run VTXWriter knowing that it's not expected to work, and can generate and take a closer look at backtraces. I figure anyone using big-endian will know they have a problematic architecture in general.
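A minimal sketch of test-level skipping based on sys.byteorder follows. It is not the content of the Debian patch linked above; the names `ADIOS2_DEMOS` and `select_demos` are hypothetical, and the set lists only the one demo that the log in this issue shows failing.

```python
import sys

# Hypothetical set of demos that write through VTXWriter (ADIOS2) and are
# known to crash on big-endian hosts. Only this one is confirmed by the log.
ADIOS2_DEMOS = {"demo_interpolation-io.py"}


def select_demos(demos, byteorder=sys.byteorder):
    """Return the demos to run, dropping ADIOS2-based ones on big-endian hosts.

    ADIOS2 is not tested upstream on big-endian systems, so these demos are
    skipped there rather than disabling ADIOS2 support in the library.
    """
    if byteorder != "big":
        return list(demos)
    return [name for name in demos if name not in ADIOS2_DEMOS]


if __name__ == "__main__":
    all_demos = ["demo_poisson.py", "demo_interpolation-io.py"]
    print(select_demos(all_demos))
```

With pytest, the same condition could instead be attached to individual tests through pytest.mark.skipif(sys.byteorder == "big", reason=...), which keeps the skip visible in the test report.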

garth-wells commented 6 months ago

Closing since it's an upstream issue.