Open RuRo opened 4 months ago
I believe you've hit the nail on the head with those issues. I also struggle to answer those exact questions, which makes using the cache an exercise in frustration.
I have had some success asking around in the NixOS CUDA Matrix room. However, figuring out which package is successfully included in which nixpkgs-revision CI build seems basically impossible without some insider know-how. On a few occasions SomeoneSerge was kind enough to point me to the exact build that included what I needed, but I have yet to reverse-engineer how this can be done as an end user without bothering the maintainers.
Hi! Sorry about the frustration. I've been spending less and less time on this repo. My idea for the next steps is roughly this:
P.S. @RuRo Sorry for the delayed response, I actually didn't get a notification for this issue o_0
This sounds like a great development!
I still have one question, though: if/when this repo gets archived, what would be the appropriate place to discuss / report issues with the new nix-community CUDA cache/builders? For example, a lot of the questions in my original post would also apply to the nix-community cache:
[...] `cudaCapabilities = [ "8.6" ]`?
[...] `nixos-unstable`, but only using the subset of commits that had successfully built and cached the CUDA packages?

Thanks.
One venue would be #nix-community:nixos.org on Matrix, paralleled by https://github.com/nix-community/infra/issues on GitHub. The nix-community Hydra follows the nixos-unstable branch and builds its `pkgs/top-level/release-cuda.nix` file. That's where the list of packages and capabilities is controlled, and currently it only features the "all caps" variant for x86_64 and the "all caps" sbsa (not jetson) variant for aarch64. This can be adjusted by opening a PR against Nixpkgs, but in coordination with the nix-community team, because these changes might dramatically increase the load on the community Hydra's build servers, which are shared with projects other than the CUDA cache.
Follow the links in https://github.com/NixOS/nixpkgs/pull/324379
Also to answer the original questions, even though that's less relevant now:
> `nix/overlays.nix` seems to also be optionally enabling MKL versions of LAPACK/BLAS.
Two ideas wrt the overlays were 1) to test non-default instances of packages (e.g. mpi or mkl support that was otherwise disabled), and 2) to provide executable instructions on how to get a cache hit/a matching hash when enabling these optional features, since it's kind of like looking for a needle in a haystack...
Most parts of the overlays were merged into nixpkgs over time (some guarded behind `config.cudaSupport`), so the overlays became less relevant.
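To make the two ideas above concrete, here is a minimal sketch of what "enabling an optional feature" looks like from the user's side. It is not the repository's actual `nix/overlays.nix`; the overlay name `mklOverlay` and the choice of `numpy` as the package to build are assumptions for illustration. It uses the `blasProvider`/`lapackProvider` override arguments that nixpkgs provides for switching BLAS/LAPACK implementations, together with the `cudaSupport` config flag mentioned above.

```nix
# Illustrative sketch only; NOT the contents of this repo's nix/overlays.nix.
let
  # Hypothetical overlay: swap the default BLAS/LAPACK providers for MKL.
  mklOverlay = final: prev: {
    blas = prev.blas.override { blasProvider = final.mkl; };
    lapack = prev.lapack.override { lapackProvider = final.mkl; };
  };

  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true; # CUDA and MKL are both unfree
      cudaSupport = true; # flag guarding the CUDA-enabled package variants
    };
    overlays = [ mklOverlay ];
  };
in
# Example package chosen for illustration; anything depending on BLAS/LAPACK
# gets a different store path once the overlay is applied.
pkgs.python3Packages.numpy
```

Any such override changes the store-path hashes of everything downstream, so a cache hit is only possible if the cache built exactly the same variant from the same nixpkgs revision.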
> What are those "different frequencies" exactly?
That used to be specified like so: https://github.com/SomeoneSerge/nixpkgs-cuda-ci/pull/14/files#diff-206b9ce276ab5971a2489d75eb1b12999d4bf3843b7988cbe8d687cfde61dea0L170
But then the `onSchedule` jobs were disabled because Hercules kept accumulating pending effects without ever running any, requiring that the queue be reset. Currently there's just a GitHub Action that updates the lock file from time to time and triggers the default job...
The README seems to suggest that adding `cuda-maintainers.cachix.org` as a substituter and setting `allowUnfree = true` and `cudaSupport = true` is sufficient to get the prebuilt packages. However, quite often I end up rebuilding some of the CUDA-enabled packages after updating.

I have a few questions:
1) I currently have `nixpkgs` following `github:nixos/nixpkgs/nixos-unstable` in my flake and I run `nix flake update nixpkgs` every once in a while, but this seems like a bad strategy, because the CI might be lagging behind upstream and not every commit may be successfully built.

2) The README mentions that [...]
3) `nix/overlays.nix` seems to also be optionally enabling MKL versions of LAPACK/BLAS.

So, for example, if I set `cudaCapabilities = [ "8.6" ]` and enable the MKL the same way as your `nix/overlays.nix`, how can I determine the latest `nixos-unstable` commit that is already available in `cuda-maintainers.cachix.org`?
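
For reference, the setup described in this post can be sketched roughly as the flake below. This is a sketch under assumptions, not the README's exact instructions: the `x86_64-linux` system and the `torch` package are illustrative choices, and the cache's public key is deliberately left out and must be copied from the cache's own page.

```nix
{
  # Follows nixos-unstable, as described in question 1.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  nixConfig = {
    extra-substituters = [ "https://cuda-maintainers.cachix.org" ];
    # extra-trusted-public-keys must also list the cache's public key;
    # it is omitted here on purpose rather than guessed.
  };

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux"; # assumption for illustration
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true;
          cudaSupport = true;
          # Narrowing the capabilities changes the store-path hashes, so this
          # only hits a cache that built the very same capability set.
          cudaCapabilities = [ "8.6" ];
        };
      };
    in
    {
      # Example package chosen for illustration.
      packages.${system}.default = pkgs.python3Packages.torch;
    };
}
```

With this layout, `nix flake update nixpkgs` moves the input to the newest `nixos-unstable` commit regardless of whether any cache has built it, which is exactly the mismatch question 1 describes.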