NixOS / nixpkgs

Nix Packages collection & NixOS

petsc: quick tests failing #229328

Closed by yl3dy 1 year ago

yl3dy commented 1 year ago

The current version of PETSc (3.17.4) in master and stable is broken because make flags generation fails: for some reason MAKEFLAGS contains w as a target (see hydra), although no target with this name exists. Building the most recent version (3.19.1) fixes this, but the quick tests fail. The former behavior (the MAKEFLAGS failure) is observed for versions prior to 3.18.6. The quick tests also fail when building 3.19.1 with an earlier Nixpkgs revision (specifically f096b7122ab08e93c8b052c92461ca71b80c0cc8).

make check logs:

Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
See https://petsc.org/release/faq/
[hwloc/linux] failed to find sysfs cpu topology directory, aborting linux discovery.
[1682932900.610088] [localhost:14256:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
[1682932900.611292] [localhost:14256:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
See https://petsc.org/release/faq/
[hwloc/linux] failed to find sysfs cpu topology directory, aborting linux discovery.
[1682932901.979048] [localhost:14285:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
[1682932901.979135] [localhost:14286:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
[1682932901.981042] [localhost:14285:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
[1682932901.981050] [localhost:14286:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process
See https://petsc.org/release/faq/
[hwloc/linux] failed to find sysfs cpu topology directory, aborting linux discovery.
[1682932904.458819] [localhost:14392:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
[1682932904.461244] [localhost:14392:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
Number of SNES iterations =     3
Completed test examples

References:

Upstream issue: https://gitlab.com/petsc/petsc/-/issues/1386

@cburstedde

djacu commented 1 year ago

I'm also seeing these errors on x86_64-linux, aarch64-linux and aarch64-darwin.

yl3dy commented 1 year ago

Tests fail only when MPI is enabled. However, it seems that all the test scenarios require either MPI or other specific features like CUDA (though I'm not proficient in the test system used in PETSc). Building with either OpenMPI or MPICH produces the same errors, as does disabling p4est and BLAS/LAPACK.

I have a strong feeling that the issue is with stale expected test results. I've run a FEM solver using PETSc 3.19.1 built with doCheck = false; and it gave the same results as with the older version 3.14.3 (however, it only used the CG linear solver and the hypre integration). Also, deal.II's tests for PETSc integration pass with the newest version.

Link to the upstream issue: https://gitlab.com/petsc/petsc/-/issues/1386

yl3dy commented 1 year ago

It seems that PETSc's test suite fails because of messages from the MPI implementation complaining about disabled networking.
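One way to check this hypothesis by hand is to strip the hwloc and UCX lines from the make check output and see what is left. The snippet below is only a sketch: strip_sandbox_noise and the two patterns are hypothetical helpers written for this thread, derived from the log lines quoted above, and are not part of PETSc or nixpkgs. It also assumes that the remaining lines are what the check expects to see, which has not been verified against PETSc's reference output.

```python
import re

# Patterns for the messages seen in the make check logs of this issue.
# They are assumptions based on those logs only; other MPI stacks or
# sandboxes may print different noise.
NOISE_PATTERNS = [
    re.compile(r"^\[hwloc/linux\] failed to find sysfs cpu topology directory"),
    re.compile(
        r"^\[\d+\.\d+\] \[[^\]]*\]\s+\S+:\d+\s+UCX\s+ERROR "
        r"opendir\(/sys/class/net\) failed"
    ),
]


def strip_sandbox_noise(output: str) -> str:
    """Return output with lines matching any noise pattern removed."""
    kept = [
        line
        for line in output.splitlines()
        if not any(pattern.search(line) for pattern in NOISE_PATTERNS)
    ]
    return "\n".join(kept)


# Output of ex19 with 1 MPI process, copied from the log in this issue.
SAMPLE = """\
[hwloc/linux] failed to find sysfs cpu topology directory, aborting linux discovery.
[1682932900.610088] [localhost:14256:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
[1682932900.611292] [localhost:14256:0]       tcp_iface.c:837  UCX  ERROR opendir(/sys/class/net) failed: No such file or directory
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2"""

if __name__ == "__main__":
    print(strip_sandbox_noise(SAMPLE))
```

For the sample above, only the lid velocity line and the SNES iteration count remain after filtering, so the solver output itself looks intact and the extra lines come from the MPI stack probing /sys inside the build sandbox.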