NixOS / nixpkgs

Nix Packages collection & NixOS

Build failure: tensorflow and grpcio #329378

Open saippua opened 3 months ago

saippua commented 3 months ago

Steps To Reproduce

Here is a flake for reproduction:

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-24.05";
  };
  outputs = { self, nixpkgs, ... }@inputs:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
      };
    in
    {
      devShell.${system} = pkgs.mkShell {
        buildInputs = with pkgs; [
          (python311.withPackages (ps: with ps; [
            tensorflow

            # Only one of these is required to get the error
            grpcio
            # torch
          ]))
        ];
      };
    };
}

Run it with `nix develop`.

Build log

error: collision between `/nix/store/rwmznwz8gc40mhjmf30x795v1xr5wqgm-python3.11-grpcio-1.62.2/lib/python3.11/site-packages/grpcio-1.62.2.dist-info/RECORD' and `/nix/store/rwlqlw34il4vdjcdvmxi28krm3hmxvwb-python3.11-grpcio-1.62.2/lib/python3.11/site-packages/grpcio-1.62.2.dist-info/RECORD'

I also tried running the flake on nixpkgs-unstable. This also fails, but the error is slightly different:

sourcing setup hook '/nix/store/3khc1xyr6gjxay09cs5zvbz1is1dn45d-make-binary-wrapper-hook/nix-support/setup-hook'
sourcing setup hook '/nix/store/jlcm4q3p5gzc9nfc8fxd8mkhcrv9zcrq-die-hook/nix-support/setup-hook'
sourcing setup hook '/nix/store/yq6n8b0mnk0qxzbs3ajsjcp8ziwqylrl-patchelf-0.15.0/nix-support/setup-hook'
sourcing setup hook '/nix/store/iks1pihvbilsh5sy8qvpd638k422w9i8-update-autotools-gnu-config-scripts-hook/nix-support/setup-hook'
sourcing setup hook '/nix/store/h9lc1dpi14z7is86ffhl3ld569138595-audit-tmpdir.sh'
sourcing setup hook '/nix/store/m54bmrhj6fqz8nds5zcj97w9s9bckc9v-compress-man-pages.sh'
sourcing setup hook '/nix/store/wgrbkkaldkrlrni33ccvm3b6vbxzb656-make-symlinks-relative.sh'
sourcing setup hook '/nix/store/5yzw0vhkyszf2d179m0qfkgxmp5wjjx4-move-docs.sh'
sourcing setup hook '/nix/store/fyaryjvghbkpfnsyw97hb3lyb37s1pd6-move-lib64.sh'
sourcing setup hook '/nix/store/kd4xwxjpjxi71jkm6ka0np72if9rm3y0-move-sbin.sh'
sourcing setup hook '/nix/store/pag6l61paj1dc9sv15l7bm5c17xn5kyk-move-systemd-user-units.sh'
sourcing setup hook '/nix/store/jivxp510zxakaaic7qkrb7v1dd2rdbw9-multiple-outputs.sh'
sourcing setup hook '/nix/store/ilaf1w22bxi6jsi45alhmvvdgy4ly3zs-patch-shebangs.sh'
sourcing setup hook '/nix/store/cickvswrvann041nqxb0rxilc46svw1n-prune-libtool-files.sh'
sourcing setup hook '/nix/store/xyff06pkhki3qy1ls77w10s0v79c9il0-reproducible-builds.sh'
sourcing setup hook '/nix/store/ngg1cv31c8c7bcm2n8ww4g06nq7s4zhm-set-source-date-epoch-to-latest.sh'
sourcing setup hook '/nix/store/gps9qrh99j7g02840wv5x78ykmz30byp-strip.sh'
error: collision between `/nix/store/gykqkh9n41y2qhc1a1nq072mvvvjljkm-python3.11-grpcio-1.64.1/lib/python3.11/site-packages/grpc/_cython/__pycache__/__init__.cpython-311.pyc' and `/nix/store/025y3kazq799ckbaaka939ig7mik5qrp-python3.11-grpcio-1.64.1/lib/python3.11/site-packages/grpc/_cython/__pycache__/__init__.cpython-311.pyc'

Additional context

I'm trying to create a shell with tensorflow and pytorch, but I get the collision error above. The same error happens in a shell with just tensorflow and grpcio, so I assume tensorflow is using a modified version of grpcio that collides with the regular nixpkgs version.

I tried to debug this myself, but after looking at the tensorflow derivation I couldn't find anything that looked like it could be overriding grpcio...

However, I can have a shell with torch and grpcio and it works just fine, so I'm assuming this is a Tensorflow issue?

Notify maintainers

@abbradar @ndl @mweinelt

Metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.78, NixOS, 23.11 (Tapir), 23.11.20240220.526d051`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(root): `"home-manager, nixos-23.11"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`


mweinelt commented 3 months ago

grpc for python3Packages.tensorflow gets overridden in pkgs/top-level/python-packages.nix:14946.
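A quick way to see the effect of that override without building anything is to compare store paths at evaluation time. The expression below is only a sketch: the file name `check-grpcio.nix` is hypothetical, it assumes `<nixpkgs>` resolves to the channel in question, and it assumes the tensorflow derivation exposes `requiredPythonModules` as a passthru attribute (it falls back to an empty list if it does not).

```nix
# check-grpcio.nix: hypothetical helper file, not part of nixpkgs.
# Evaluate with: nix-instantiate --eval --strict check-grpcio.nix
let
  pkgs = import <nixpkgs> { };
  ps = pkgs.python311Packages;

  # Every Python module that tensorflow would pull into an environment.
  # Assumption: requiredPythonModules is available as a passthru attribute.
  tfClosure = ps.tensorflow.requiredPythonModules or [ ];

  # Keep only the grpcio derivation(s) found in that closure.
  tfGrpcio = builtins.filter (drv: (drv.pname or "") == "grpcio") tfClosure;
in
{
  # Store path of the grpcio that `ps.grpcio` resolves to.
  topLevel = ps.grpcio.outPath;
  # Store path(s) of the grpcio that tensorflow depends on.
  viaTensorflow = map (drv: drv.outPath) tfGrpcio;
}
```

If the two paths differ, they are the two grpcio builds that collide when both end up in the same `withPackages` environment.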

saippua commented 3 months ago

Ok, so grpcio is overridden because grpc and protobuf need to be overridden. grpc is overridden because of some conflict with nvcc and protobuf is overridden for some reason concerning abseil.

Is there a standard procedure for cases like this? I guess applying the tensorflow overrides to grpcio, grpc and protobuf globally within my nixpkgs would solve the collision, assuming that the overrides don't spawn other errors with other packages, but that doesn't seem like a very stable solution. Is there some way to let the colliding packages coexist within the environment?

stephen-huan commented 3 months ago

This is a duplicate of https://github.com/NixOS/nixpkgs/issues/323965 and has been a pain point for months, see https://github.com/NixOS/nixpkgs/pull/286125#issuecomment-1956370973 and https://github.com/NixOS/nixpkgs/pull/266467#discussion_r1546203656.

Replacing tensorflow with tensorflow-bin suffices. This is probably the simplest and easiest solution.
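As a minimal sketch of that swap (not tested here, and assuming tensorflow-bin is available for python311 on the pinned nixpkgs revision), the reproduction flake would become:

```nix
{
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in
    {
      devShell.${system} = pkgs.mkShell {
        packages = [
          (pkgs.python311.withPackages (ps: [
            # prebuilt wheel instead of the source build, so the
            # tensorflow-specific grpc/protobuf overrides are not pulled in
            ps.tensorflow-bin
            ps.torch
          ]))
        ];
      };
    };
}
```

The only change from the original reproduction is `tensorflow` becoming `tensorflow-bin`; everything else, including the `devShell` output name, is kept as in the flakes in this thread.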

> I guess applying the tensorflow overrides to grpcio, grpc and protobuf globally within my nixpkgs would solve the collision, assuming that the overrides don't spawn other errors with other packages, but that doesn't seem like a very stable solution.

If you really want to do a source build, the simplest way is to apply the tensorflow overrides to torch. This has the advantage of modifying only torch, but the disadvantage that you have to find the conflicting dependencies manually.

https://github.com/NixOS/nixpkgs/blob/68c9ed8bbed9dfce253cc91560bf9043297ef2fe/pkgs/top-level/python-packages.nix#L15192-L15236

A flake is given below. Note that this requires recompiling torch (but not tensorflow), which took ~50 minutes on my laptop.

{
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
      # see tensorflow-build in pkgs/top-level/python-packages.nix
      self = pkgs.python311Packages;
      compat = rec {
        protobufTF = pkgs.protobuf_21.override {
          abseil-cpp = pkgs.abseil-cpp_202301;
        };
        grpcTF = (pkgs.grpc.overrideAttrs (
          oldAttrs: rec {
            version = "1.27.3";
            src = pkgs.fetchFromGitHub {
              owner = "grpc";
              repo = "grpc";
              rev = "v${version}";
              hash = "sha256-PpiOT4ZJe1uMp5j+ReQulC9jpT0xoR2sAl6vRYKA0AA=";
              fetchSubmodules = true;
            };
            patches = [ ];
            postPatch = ''
              sed -i "s/-std=c++11/-std=c++17/" CMakeLists.txt
              echo "set(CMAKE_CXX_STANDARD 17)" >> CMakeLists.txt
            '';
          }
        )
        ).override {
          protobuf = protobufTF;
        };
        protobuf-pythonTF = self.protobuf.override {
          protobuf = protobufTF;
        };
        grpcioTF = self.grpcio.override {
          protobuf = protobufTF;
          grpc = grpcTF;
        };
        tensorboard-plugin-profileTF = self.tensorboard-plugin-profile.override {
          protobuf = protobuf-pythonTF;
        };
        tensorboardTF = self.tensorboard.override {
          grpcio = grpcioTF;
          protobuf = protobuf-pythonTF;
          tensorboard-plugin-profile = tensorboard-plugin-profileTF;
        };
      };
      torch = self.torch.override {
        tensorboard = compat.tensorboardTF;
        protobuf = compat.protobuf-pythonTF;
      };
    in
    {
      devShell.${system} = pkgs.mkShell {
        packages = [
          (pkgs.python311.withPackages (ps: [
            ps.tensorflow
            torch
          ]))
        ];
      };
    };
}

A more sophisticated approach uses pkgs.python311Packages.overrideScope to create an alternative package set with the tensorflow overrides applied (this is what I do to package gpjax in maipkgs, for example). I reverted the modifications to google-auth to maintain exactly the same behavior as the previous flake: google-auth only uses grpcio in nativeCheckInputs, but Nix derivations are input-addressed rather than content-addressed by default, so the changed input changes the hash and therefore causes rebuilds. The advantage of this flake is that tensorflow's overrides are propagated automatically to every Python package while staying local to python3Packages', i.e., it is much more fine-grained than a nixpkgs-wide override.

{
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
      # see tensorflow-build in pkgs/top-level/python-packages.nix
      compat = rec {
        protobuf = pkgs.protobuf_21.override {
          abseil-cpp = pkgs.abseil-cpp_202301;
        };
        grpc = (pkgs.grpc.overrideAttrs (
          oldAttrs: rec {
            version = "1.27.3";
            src = pkgs.fetchFromGitHub {
              owner = "grpc";
              repo = "grpc";
              rev = "v${version}";
              hash = "sha256-PpiOT4ZJe1uMp5j+ReQulC9jpT0xoR2sAl6vRYKA0AA=";
              fetchSubmodules = true;
            };
            patches = [ ];
            postPatch = ''
              sed -i "s/-std=c++11/-std=c++17/" CMakeLists.txt
              echo "set(CMAKE_CXX_STANDARD 17)" >> CMakeLists.txt
            '';
          }
        )
        ).override {
          inherit protobuf;
        };
      };
      python3Packages' = pkgs.python311Packages.overrideScope (final: prev: {
        protobuf = prev.protobuf.override {
          inherit (compat) protobuf;
        };
        grpcio = prev.grpcio.override {
          inherit (compat) protobuf grpc;
        };
        # probably unnecessary
        google-auth = prev.google-auth.override {
          grpcio = prev.grpcio.override {
            inherit (prev) protobuf;
          };
        };
      });
      python' = python3Packages'.python.override {
        packageOverrides = final: prev: python3Packages';
        self = python';
      };
    in
    {
      devShell.${system} = pkgs.mkShell {
        packages = [
          (python'.withPackages (ps: with ps; [
            tensorflow
            torch
          ]))
        ];
      };
    };
}

Unfortunately, both flakes require recompiling torch, which is quite expensive. In terms of a nixpkgs PR, we should probably export tensorboardTF as what is currently called tensorboard (right now tensorflow and tensorboard conflict, which is quite surprising). That alone is not sufficient, as e.g. orbax-checkpoint conflicts through protobuf. The only option I can think of is exposing the *TF packages from compat at the top level and manually overriding the machine learning packages that are commonly used with tensorflow, but this is obviously not ideal.

> Is there some way to let the colliding packages coexist within the environment?

Not as far as I know; apparently this is an inherent limitation of Python. The manual says:

> We also cannot package multiple versions of the same package since this may cause conflicts in PYTHONPATH.