NixOS / nixpkgs

`torch` + `mkl` fails with `RPATH_CHANGE could not write new RPATH` #269271

Open RuRo opened 9 months ago

RuRo commented 9 months ago

Describe the bug

Building `torch` fails at the CMake install step when MKL is selected as the BLAS/LAPACK provider.

Steps To Reproduce

{
  inputs.nixpkgs.url = "github:nixos/nixpkgs/e4ad989506ec7d71f7302cc3067abd82730a4beb";

  outputs = { nixpkgs, ... }: let
    pkgs = import nixpkgs {
      system = "x86_64-linux";
      config.allowUnfree = true;

      overlays = [
        (final: prev: {
          blas = prev.blas.override { blasProvider = final.mkl; };
          lapack = prev.lapack.override { lapackProvider = final.mkl; };
        })
      ];
    };
  in
  {
    packages."x86_64-linux".default = pkgs.python311Packages.torch;
  };
}

Building the default package of this flake fails with:

-- Install configuration: "Release"
-- Set runtime path of "/build/source/torch/bin/protoc-3.13.0.0" to "$ORIGIN/../lib64"
-- Set runtime path of "/build/source/torch/lib/libc10.so" to "$ORIGIN"
CMake Error at caffe2/torch/lib/libshm/cmake_install.cmake:55 (file):
  file RPATH_CHANGE could not write new RPATH:

    $ORIGIN:/lib

  to the file:

    /build/source/torch/lib/libshm.so

  The current RUNPATH is:

    /nix/store/sxad83cvhxjxlqhqv4xcvlxr5yfbbvw4-python3.11-torch-2.0.1-lib/lib:/nix/store/vr8a599l7wrn4q0164vldran7hirj6kk-mkl-2023.1.0.46342/lib:/nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib:/nix/store/myw67gkgayf3>

  which does not contain:

    /lib:/build/source/build/lib:

  as was expected.
Call Stack (most recent call first):
  caffe2/torch/cmake_install.cmake:47 (include)
  caffe2/cmake_install.cmake:101 (include)
  cmake_install.cmake:127 (include)

FAILED: CMakeFiles/install.util 
cd /build/source/build && /nix/store/vnhl4zdy7igx9gd3q1d548vwzz15a9ma-cmake-3.27.7/bin/cmake -P cmake_install.cmake
ninja: build stopped: subcommand failed.
/nix/store/wr08yanv2bjrphhi5aai12hf2qz5kvic-stdenv-linux/setup: line 1559: pop_var_context: head of shell_variables not a function context

Expected behavior

Build succeeds.

Additional context

The overlay that enables MKL is written in accordance with the nixpkgs manual.

Notify maintainers

@teh @thoughtpolice @tscholak

Priorities

Add a :+1: reaction to issues you find important.

RuRo commented 1 month ago

Update: this still fails on the latest nixos-unstable (68c9ed8bbed9dfce253cc91560bf9043297ef2fe).

RuRo commented 1 month ago

Additionally pinging the maintainers of mkl, blas and lapack: @bhipple @ttuegel

RuRo commented 3 weeks ago

FYI, I was able to successfully build torch with mkl by applying the following patch:

diff --git a/cmake/public/mkl.cmake b/cmake/public/mkl.cmake
index 68bf1b9..4963f22 100644
--- a/cmake/public/mkl.cmake
+++ b/cmake/public/mkl.cmake
@@ -15,9 +15,3 @@ foreach(MKL_LIB IN LISTS MKL_LIBRARIES)
     endif()
   endif()
 endforeach()
-
-# TODO: This is a hack, it will not pick up architecture dependent
-# MKL libraries correctly; see https://github.com/pytorch/pytorch/issues/73008
-set_property(
-  TARGET caffe2::mkl PROPERTY INTERFACE_LINK_DIRECTORIES
-  ${MKL_ROOT}/lib ${MKL_ROOT}/lib/intel64 ${MKL_ROOT}/lib/intel64_win ${MKL_ROOT}/lib/win-x64)

I'm not sure whether this could be worked around in some other way, so I opened an issue upstream: https://github.com/pytorch/pytorch/issues/133823.

You may also need to apply PR #334941, which fixes the MKL build of magma.
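
For anyone who wants to try the patch without forking nixpkgs, here is a minimal sketch of an extra overlay that appends it to torch's `patches`. It assumes the diff above has been saved next to the flake as `./torch-mkl-link-dirs.patch` (a hypothetical file name) and that the file is tracked by git so the flake can see it. It has not been verified against every nixpkgs revision mentioned in this thread.

```nix
# Sketch only: add this as a second entry in the `overlays` list of the
# reproduction flake. ./torch-mkl-link-dirs.patch is a hypothetical file
# name holding the cmake/public/mkl.cmake diff from the comment above.
(final: prev: {
  # pythonPackagesExtensions applies the override to every Python package
  # set (python311Packages, python312Packages, ...).
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      torch = pyPrev.torch.overrideAttrs (old: {
        # Keep any patches nixpkgs already applies and add ours last.
        patches = (old.patches or [ ]) ++ [ ./torch-mkl-link-dirs.patch ];
      });
    })
  ];
})
```

With both overlays in place, `pkgs.python311Packages.torch` should pick up the MKL-backed `blas`/`lapack` and the patched `mkl.cmake`. If the patch no longer applies cleanly on a newer torch version, the build will stop in the patch phase rather than at the RPATH step.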