NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.06k stars 14.1k forks source link

rocmPackages: drop symlinks & split outputs #276846

Open SomeoneSerge opened 10 months ago

SomeoneSerge commented 10 months ago

...if splittable. CC https://github.com/NixOS/nixpkgs/issues/197885

Issue description

❯ nix eval nixpkgs#rocmPackages.rocblas.outputs
[ "out" ]
❯ nix build --refresh github:philiptaron/llama.cpp/nix#rocm --print-out-paths | xargs nix path-info --human-readable --closure-size --recursive | grep G'$'
/nix/store/8fpf7jmnmyf96bjabx4r8rj3vki8z7pj-rocm-llvm-clang-unwrapped-5.7.1        1.7G
/nix/store/8xa2qq8xkd4qjmlary2j0wmblmkfb5q0-rocm-llvm-llvm-5.7.1                   1.5G
/nix/store/mslhw90n1yi8xwmvz60zh408g15hmica-rocm-llvm-clang-5.7.1                  3.8G
/nix/store/yc0v7l0bka0hcyy82vkqhaxmj5c91lz8-rocm-runtime-5.7.1                     3.8G
/nix/store/2bhwj31y2fa4kqc3sp4bs1b1nd4d8qic-rocminfo-5.7.1                         3.9G
/nix/store/wcmixj44ihf7mjpf5wn77y9gr1p9zzxv-rocm-llvm-binutils-5.7.1               2.0G
/nix/store/ivdnfrb7ymzbcv12a52jnz9sdf6fvpdp-rocm-llvm-binutils-wrapper-5.7.1       2.0G
/nix/store/6xj537dwafp618f084hkl4gvlf5sfqqa-rocm-llvm-clang-wrapper-5.7.1          4.0G
/nix/store/68xb3gyp792qyhpg6wwpamh9w08iwdkr-clr-5.7.1                              4.4G
/nix/store/pk2gwl2bqr3yqwb8kb524ixbdbkms7dp-rocblas-tensile-gfx90-5.7.1            1.8G
/nix/store/9fm8cfjwd9nhpc06bz4fdgmc4gwdi2dq-rocblas-5.7.1                          7.9G
/nix/store/gblj9yaj27f50h5fh2vyr774ygxvd3qf-rocsparse-5.7.1                        5.4G
/nix/store/8q2vbagp8j602490l2v3rwn435l71k03-rocsolver-5.7.1                       10.4G
/nix/store/fdbz6r7h9kxzypyfz9h1mq93qwsrq9jx-hipblas-5.7.1                         10.4G
/nix/store/1b6qcr7l9p0yx4mji8c6pbjcybz4fp88-llama-cpp-rocm-0.0.0                  10.6G
❯ # Baseline:
❯ nix build --refresh github:philiptaron/llama.cpp/nix#{cuda,default} --print-out-paths | xargs nix path-info --human-readable --closure-size 
/nix/store/61xjz6l35r1c6xp4v5py8iaia18bs5xy-llama-cpp-cuda-0.0.0         985.6M
/nix/store/ndx1xc79in3kmf65r4h1a28j3i72yv8q-llama-cpp-blas-0.0.0         340.1M

image

@NixOS/rocm-maintainers (and @NixOS/cuda-maintainers as potentially interested)

Technical details

❯ nix flake metadata github:philiptaron/llama.cpp/nix
Resolved URL:  github:philiptaron/llama.cpp/nix
Locked URL:    github:philiptaron/llama.cpp/4522c47a2282a595344ba8dcb85222910b2ffc4f
Description:   Port of Facebook's LLaMA model in C/C++
Path:          /nix/store/rylr1mlncn5jrziws9wm641fp9mc5dsf-source
Revision:      4522c47a2282a595344ba8dcb85222910b2ffc4f
Last modified: 2023-12-26 04:07:02
Inputs:
├───flake-parts: github:hercules-ci/flake-parts/34fed993f1674c8d06d58b37ce1e0fe5eebcb9f5 (2023-12-01 23:39:28)
│   └───nixpkgs-lib: github:NixOS/nixpkgs/e92039b55bcd58469325ded85d4f58dd5a4eaf58?dir=lib (2023-11-29 10:33:01)
└───nixpkgs: github:NixOS/nixpkgs/75dd68c36f458c6593c5bbb48abfd3e59bfed380 (2023-12-26 03:05:57)
SomeoneSerge commented 10 months ago

image

(I haven't looked up what those are yet)

SomeoneSerge commented 10 months ago

Symlinks are the main reasons the runtime closure is so huge (CC https://github.com/NixOS/nixpkgs/pull/260299): https://github.com/NixOS/nixpkgs/blob/4848569afdf2bcfeba902a25c799522d2905f73d/pkgs/development/rocm-modules/5/rocblas/default.nix#L152-L154

We should work out some propagatedBuildInputs-based solution and ensure that tools like CMake handle the splayed layouts well.

I wonder if we could start by just moving the symlinks into a separate, say, dev output and hoping that the dependency isn't retained at runtime

Madouura commented 10 months ago

Pretty sure we need everything that's in the (split) rocblas closure. I haven't really found a better solution than what we have now.

Madouura commented 10 months ago

Symlinks are the main reasons the runtime closure is so huge (CC #260299):

https://github.com/NixOS/nixpkgs/blob/4848569afdf2bcfeba902a25c799522d2905f73d/pkgs/development/rocm-modules/5/rocblas/default.nix#L152-L154

We should work out some propagatedBuildInputs-based solution and ensure that tools like CMake handle the splayed layouts well.

I wonder if we could start by just moving the symlinks into a separate, say, dev output and hoping that the dependency isn't retained at runtime

Ah I see now. Hmm, we could do each separately behind an optionalString guard.

SomeoneSerge commented 10 months ago

Hmm, we could do each separately behind an optionalString guard.

That doesn't really sound like much of a solution... but are these actually ever accessed at runtime?

Madouura commented 10 months ago

Hmm, we could do each separately behind an optionalString guard.

That doesn't really sound like much of a solution... but are these actually ever accessed at runtime?

After thinking about it in the shower, it would still be problematic, since even if we got it to only do say, "gfx90", the user would still have to recompile rocblas itself (which shouldn't be too bad, since it's only using one gpu target!), or use a non-nixos cache.

SomeoneSerge commented 10 months ago

Add the Tensile library to your application's CMake target. The Tensile library will be written, compiled and linked to your application at application-compile-time. https://github.com/ROCm/Tensile/wiki

This sounds more plausible (that we don't need these (e.g. in pytorch) after the build).

Notably, rocblas/cmake doesn't reference them neither by the extensions nor by the relative paths:

❯ ag hsaco result/lib/cmake/rocblas/ --follow
❯ ag library result/lib/cmake/rocblas/ --follow
<target names and variables but no paths>

I tried looking at rocm-merged from torchWithRocm in search of references for these files and didn't find (so far) any that'd look relevant either:

XX in 🌐 YYYYY in /nix/store/a7fwwnhppk6h97hb65dl3rwd4iqxs61p-rocm-merged
❯ ag --follow --search-binary --max-count=4 '\.hsaco'
Binary file rocblas/lib/librocblas.so matches.

Binary file lib/librocblas.so.3 matches.

Binary file lib/librocblas.so matches.
ERR: Too many matches in ./lib/rocblas/library/TensileManifest.txt. Skipping the rest of this file.

lib/rocblas/library/TensileManifest.txt
625:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx803.hsaco
626:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx900.hsaco
627:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx906-xnack-.hsaco
628:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx908-xnack-.hsaco

Binary file lib/librocblas.so.3.1.0 matches.
❯ strings rocblas/lib/librocblas.so | rg 'hsaco$'
.hsaco
❯ ag --max-count 4 --follow library/
include/CL/cl_platform.h
494:    /* http://msdn.microsoft.com/en-us/library/373ak2y1%28VS.71%29.aspx                                                 */
ERR: Too many matches in ./lib/rocblas/library/TensileManifest.txt. Skipping the rest of this file.

lib/rocblas/library/TensileManifest.txt
1:/build/source/build/Tensile/library/TensileLibrary_Type_HS_HPA_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx906.dat

The /build/source/build/ clearly is just cruft coming from the sandbox.

So the question stands: who's using them?

mschwaig commented 8 months ago

Fedora has added tensile to their rocblas builds for the 6.0.2 release.

They write that MIOpen needs it.

https://src.fedoraproject.org/rpms/rocblas/c/918d514378861b900624c73b91fb75059c96dbf0?branch=rawhide

mschwaig commented 8 months ago

TLDR: I would like to try to add a rocblasWithTensile package that is built on top of rocblas(without tensile), and consumers who don't need it should pick the one without.

I don't know which other consumers besides MIOpen need/benefit from tensile.

Looking at https://github.com/ROCm/rocBLAS/blob/adb8567f1bdad56b3b688a0b6dec1f79bf438ab4/CMakeLists.txt i wonder if it is possible with a few changes to split the build into derivations like this

  1. rocblas: first build rocblas without buildTensile
  2. gfx[...device..]: put together a nice working directory to build ONLY the tensile parts per gpuTarget with only $path/lib/rocblas/library as the output
  3. rocblasWithTensile: create rocblas with Tensile by symlinking from 1 and 2 as required

Consumers who don´t need tensile could get that as rocblasWithTensile.override { buildTensile = false; }, even from the cache. That flag already exists. We could decide if we also want to expose it as a package and use that in most downstream consumers. For changes to gpuTargets, caching should also work with this setup. Provided the grouping by GPU arch is not too important.