NixOS / nixpkgs

MVAPICH segfaults Scalapack #258599

Open sheepforce opened 1 year ago

sheepforce commented 1 year ago

Describe the bug

The ScaLAPACK test suite segfaults when MVAPICH is chosen as the MPI provider.

Steps To Reproduce

nix build --impure -L --expr 'with import (builtins.fetchTarball "https://github.com/NixOS/nixpkgs/archive/refs/heads/nixpkgs-unstable.tar.gz") { overlays = [ (final: prev: { mpi = final.mvapich; }) ]; }; scalapack'
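For readability, the same expression can be sketched as a standalone file. The file name `repro.nix` is hypothetical; the overlay and tarball URL are taken from the command above.

```nix
# repro.nix (hypothetical file name)
let
  # Unpinned nixpkgs-unstable tarball, as in the one-liner above.
  nixpkgs = builtins.fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/refs/heads/nixpkgs-unstable.tar.gz";

  pkgs = import nixpkgs {
    overlays = [
      # Make MVAPICH the MPI implementation for every package that takes `mpi`.
      (final: prev: { mpi = final.mvapich; })
    ];
  };
in
pkgs.scalapack
```

Assuming the file is saved as shown, `nix-build repro.nix` should trigger the same build, including the ScaLAPACK check phase.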

Expected behavior

The standard ScaLAPACK test suite should run without segfaults when built against MVAPICH2.

Notify maintainers

@markuskowa

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

 - system: `"x86_64-linux"`
 - host os: `Linux 6.4.16, NixOS, 23.11 (Tapir), 23.11.20230929.f5892dd`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.17.0`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

sheepforce commented 1 year ago

I've tried several things: building with gcc8Stdenv and gfortran8, using openblasCompat directly instead of the blas and lapack wrappers, downgrading the MVAPICH version, and playing around with threading in MVAPICH. None of this changes anything.

So I disabled the ScaLAPACK test suite to see how far I could get. For example, I built CP2K and its dependencies with MVAPICH and the ScaLAPACK test suite disabled, and somewhat to my surprise every other test passes: CP2K works just fine, the fftw-mpi tests are fine, and SIRIUS works as expected. So maybe in practice this is a non-issue? Or is the error in ScaLAPACK rather than in MVAPICH?
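One way to express the "MVAPICH plus disabled ScaLAPACK tests" setup is an overlay like the following. This is a minimal sketch, not necessarily how the tests were disabled in the builds described above; it assumes the ScaLAPACK tests run in the standard check phase, so that `doCheck = false` skips them.

```nix
# overlay.nix (hypothetical file name)
final: prev: {
  # Use MVAPICH as the MPI provider, as in the reproduction command.
  mpi = final.mvapich;

  # Assumption: the failing tests run in the check phase, so
  # turning off doCheck skips them without touching the build itself.
  scalapack = prev.scalapack.overrideAttrs (_: {
    doCheck = false;
  });
}
```

With such an overlay in place, downstream packages such as `cp2k` would be built against the untested ScaLAPACK, and their own test suites still run.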

markuskowa commented 1 year ago

Which network type have you used for mvapich?

sheepforce commented 1 year ago

I'm testing on my workstation with Ethernet. However, my main goal is to use it on our cluster with OmniPath, because CP2K has memory leaks with OpenMPI for some types of calculations (e.g. constrained DFT QM/MM).
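For the cluster case, the network type could be selected with an override along these lines. This is a sketch under the assumption that the nixpkgs `mvapich` derivation takes a `network` argument defaulting to `"ethernet"` and accepting `"omnipath"`; check the derivation in your nixpkgs revision before relying on it.

```nix
final: prev: {
  # Assumption: `network` is an argument of the mvapich derivation
  # ("ethernet" on the workstation, "omnipath" on the cluster).
  mvapich = prev.mvapich.override { network = "omnipath"; };

  # Route all MPI consumers (ScaLAPACK, CP2K, ...) to the overridden MVAPICH.
  mpi = final.mvapich;
}
```

Whether the segfault also occurs with the OmniPath build is not known from this thread; the sketch only shows how the two configurations could be switched.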