Open SomeoneSerge opened 10 months ago
(I haven't looked up what those are yet)
Symlinks are the main reason the runtime closure is so huge (CC https://github.com/NixOS/nixpkgs/pull/260299): https://github.com/NixOS/nixpkgs/blob/4848569afdf2bcfeba902a25c799522d2905f73d/pkgs/development/rocm-modules/5/rocblas/default.nix#L152-L154
We should work out some propagatedBuildInputs-based solution and ensure that tools like CMake handle the splayed layouts well.
I wonder if we could start by just moving the symlinks into a separate, say, dev output and hoping that the dependency isn't retained at runtime
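To see which symlinks would have to move, it may help to list the ones that resolve outside the output itself, since those are the ones that can drag other store paths into the closure. A rough sketch, assuming GNU coreutils (realpath -m); the function name external_symlinks is made up for illustration and is not something nixpkgs provides:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every symlink under a directory whose resolved
# target lies outside that directory.
external_symlinks() {
  local root link target
  root=$(realpath "$1") || return 1
  find "$root" -type l -print0 |
    while IFS= read -r -d '' link; do
      # -m resolves the path even if the target is missing (GNU coreutils).
      target=$(realpath -m "$link")
      case "$target" in
        "$root"|"$root"/*) ;;  # target stays inside the tree: not interesting
        *) printf '%s -> %s\n' "$link" "$target" ;;
      esac
    done
}
```

Something like external_symlinks ./result would then print one "link -> target" line per symlink that leaves the tree.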
Pretty sure we need everything that's in the (split) rocblas closure. I haven't really found a better solution than what we have now.
Ah I see now.
Hmm, we could do each separately behind an optionalString guard.
That doesn't really sound like much of a solution... but are these actually ever accessed at runtime?
After thinking about it in the shower: it would still be problematic. Even if we got it to build only, say, "gfx90", the user would still have to recompile rocblas itself (which shouldn't be too bad, since it would only target one GPU), or use a non-nixos cache.
Add the Tensile library to your application's CMake target. The Tensile library will be written, compiled and linked to your application at application-compile-time. https://github.com/ROCm/Tensile/wiki
This sounds more plausible (that we don't need these (e.g. in pytorch) after the build).
Notably, rocblas/cmake references them neither by extension nor by relative path:
❯ ag hsaco result/lib/cmake/rocblas/ --follow
❯ ag library result/lib/cmake/rocblas/ --follow
<target names and variables but no paths>
I also searched rocm-merged from torchWithRocm for references to these files and (so far) haven't found any that look relevant either:
XX in 🌐 YYYYY in /nix/store/a7fwwnhppk6h97hb65dl3rwd4iqxs61p-rocm-merged
❯ ag --follow --search-binary --max-count=4 '\.hsaco'
Binary file rocblas/lib/librocblas.so matches.
Binary file lib/librocblas.so.3 matches.
Binary file lib/librocblas.so matches.
ERR: Too many matches in ./lib/rocblas/library/TensileManifest.txt. Skipping the rest of this file.
lib/rocblas/library/TensileManifest.txt
625:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx803.hsaco
626:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx900.hsaco
627:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx906-xnack-.hsaco
628:/build/source/build/Tensile/library/TensileLibrary_Type_DD_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback_gfx908-xnack-.hsaco
Binary file lib/librocblas.so.3.1.0 matches.
❯ strings rocblas/lib/librocblas.so | rg 'hsaco$'
.hsaco
❯ ag --max-count 4 --follow library/
include/CL/cl_platform.h
494: /* http://msdn.microsoft.com/en-us/library/373ak2y1%28VS.71%29.aspx */
ERR: Too many matches in ./lib/rocblas/library/TensileManifest.txt. Skipping the rest of this file.
lib/rocblas/library/TensileManifest.txt
1:/build/source/build/Tensile/library/TensileLibrary_Type_HS_HPA_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx906.dat
The /build/source/build/ prefix is clearly just cruft coming from the sandbox.
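If the sandbox prefix really is just cruft, then each manifest entry should at least have a file of the same name sitting in lib/rocblas/library. Here is a small sketch to check that, assuming the manifest holds one path per line (the "625:" style prefixes above are ag's line numbers, not file content). The name check_manifest is hypothetical, and whether rocblas actually resolves entries by file name is a guess on my part:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each path listed in a manifest, drop the directory
# part and report whether a file of that name exists in the given directory.
# Returns 1 if any entry is missing, 0 otherwise.
check_manifest() {
  local manifest=$1 libdir=$2 entry name missing=0
  while IFS= read -r entry; do
    [ -n "$entry" ] || continue   # skip blank lines
    name=${entry##*/}             # strip e.g. /build/source/build/Tensile/library/
    if [ -e "$libdir/$name" ]; then
      printf 'ok       %s\n' "$name"
    else
      printf 'missing  %s\n' "$name"
      missing=1
    fi
  done < "$manifest"
  return "$missing"
}
```

Usage would be along the lines of check_manifest result/lib/rocblas/library/TensileManifest.txt result/lib/rocblas/library.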
So the question stands: who's using them?
Fedora has added tensile to their rocblas builds for the 6.0.2 release.
They write that MIOpen needs it.
https://src.fedoraproject.org/rpms/rocblas/c/918d514378861b900624c73b91fb75059c96dbf0?branch=rawhide
TLDR: I would like to try to add a rocblasWithTensile package that is built on top of rocblas (without tensile), and consumers who don't need it should pick the one without.
I don't know which other consumers besides MIOpen need/benefit from tensile.
Looking at https://github.com/ROCm/rocBLAS/blob/adb8567f1bdad56b3b688a0b6dec1f79bf438ab4/CMakeLists.txt I wonder if it is possible, with a few changes, to split the build into derivations like this:
1. rocblas: first build rocblas without buildTensile
2. gfx[...device..]: put together a nice working directory to build ONLY the tensile parts per gpuTarget, with only $path/lib/rocblas/library as the output
3. rocblasWithTensile: create rocblas with Tensile by symlinking from 1 and 2 as required
Consumers who don't need tensile could get that as rocblasWithTensile.override { buildTensile = false; }, even from the cache. That flag already exists. We could decide if we also want to expose it as a package and use that in most downstream consumers. For changes to gpuTargets, caching should also work with this setup, provided the grouping by GPU arch is not too important.
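The symlinking in step 3 could look roughly like the sketch below, written as plain shell to make the idea concrete. The helper name link_tree and the directory layout are assumptions for illustration only; in nixpkgs this would more likely be done with something like symlinkJoin than with a hand-rolled loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: populate a destination directory with symlinks to every
# file found in the given source trees, keeping relative paths. Later sources
# override earlier ones when they provide the same relative path.
link_tree() {
  local dest=$1 src rel
  shift
  for src in "$@"; do
    src=$(realpath "$src") || return 1
    ( cd "$src" && find . \( -type f -o -type l \) -print0 ) |
      while IFS= read -r -d '' rel; do
        rel=${rel#./}
        mkdir -p "$dest/$(dirname "$rel")"
        ln -sfn "$src/$rel" "$dest/$rel"
      done
  done
}
```

For example, link_tree "$out" "$rocblas" "$tensile_gfx900" "$tensile_gfx906" would give a tree with the base rocblas files plus lib/rocblas/library entries from each per-target build (the variable names here are placeholders).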
...if splittable. CC https://github.com/NixOS/nixpkgs/issues/197885
Issue description
@NixOS/rocm-maintainers (and @NixOS/cuda-maintainers as potentially interested)
Technical details