Open Ma27 opened 4 months ago
I'm still sorta confused how it ever worked before / are the performance ramifications of looking up in the binary cache when we previously looked up in the local store (presumably sometimes successfully?) OK.
I'm still sorta confused how it ever worked before
Given how the Perl bindings used to look like, my best guess is that /nix/store
was always used (IIRC there was no real way around that in pre Nix 2.0 unless you change some configure flags) and store_uri
was added later on, but a lot of the Perl code was never updated accordingly.
Perhaps @edolstra knows a little more?
are the performance ramifications of looking up in the binary cache when we previously looked up in the local store (presumably sometimes successfully
It should be OK for file://
, but I'm absolutely not sure about e.g. s3://
(on hydra.nixos.org), that's part of the reason why I put that change into its own commit (also, I don't have an environment to reasonably test this).
Perhaps we want to leave that part out for now or split it up once more even?
@Ma27 Right. Yeah the fact that Hydra presumably supported binary caches long before Nix had the store abstraction that included binary caches, is what makes things confusing --- why does it seems like the old way of working with the binary cache is wrong even though we've had it for quite a while?
why does it seems like the old way of working with the binary cache is wrong even though we've had it for quite a while?
Yeah, I tried to use that when I first set up a Hydra and I think I never managed to set up binary caching back then.
Considering that binary_cache_secret_key_file
raises a deprecation anyways, I'm wondering if we should rip that out instead and suggest to serve a flat-file binary cache via nginx. See also https://github.com/NixOS/hydra/issues/1331.
I wouldn't expect local stores to be much of an impact, but you know the implementations better than me. Also, the most used thing is probably the build overview which is basically a few isValidPath()
calls and I can't remember derivations with >10 outputs.
OTOH I'm not sure doing that for remote stores is sensible because of potential latency and because S3 is pay by requester now: https://github.com/NixOS/infra/pull/299 cc @NixOS/infra for opinions as well.
So perhaps we want to exclude 44c98e130b01ae512285497747b75006e7c377cc from this PR @Ericson2314 ?
OTOH I'm not sure doing that for remote stores is sensible because of potential latency and because S3 is pay by requester now: https://github.com/NixOS/infra/pull/299
Pay by requester shouldn't be relevant here - I doubt anyone other than us is directly using s3://nix-cache for their Hydra instance, and when the requester is us (NixOS Hydra) we'd be paying either way even if it wasn't pay by requester. Plus, the operations here are all super cheap, it's bandwidth that's the expensive part.
Latency could indeed be a problem: this is switching from an operation that completes in milliseconds (checking the existence of a local path) to an operation that completes in hundreds of milliseconds (cross-atlantic roundtrip), and there's no parallelism or batching anywhere to try and alleviate this. Of course, I'm not saying that it is a problem - I don't know nearly enough about the Hydra codebase to even understand what's being changed here. But if we were before checking 100+ paths in the local store, and we switch to checking 100+ paths in a remote store, we're going from something near-instant (sub-second) to something that needs roughly a minute to run.
Yep, agreed about the latency part.
queryPathFromHashPart
in Controller/Root.pm
is only for Hydra's builtin binary cache, so irrelevant for hno. In fact, if infra-only Hydra becomes a thing, that's something I'd suggest to rip out.NixChannel
change, but somebody should probably double-check that.build_GET
in Controller/Build.pm
is used to check if a product was GCed. This only affects the outputs of a single derivation, i.e. O(n)
with n
being number of outputs. Can't recall an instance where this was >10. This will also be helpful for hno I think because then you'll see on Hydra's build overview whether a certain build was part of the upcoming GC.output
sub in Coontroller/Build.pm
also works on a single derivation only.These are the changes that I'd consider good to go.
While revisiting build_deps
& runtime_deps
in Controller/Build.pm
, I'm not sure anymore if that's something we want, especially on hno which is already rather loaded AFAIK. Perhaps we want to split this off for now @delroth @Ericson2314 ?
Regarding the latency, I think this should be fine because a lot of the queries are going to be repeats and the narinfo cache prevents repeated checks? Not 100% sure but I'm in favour of merging it without the split and seeing how it impacts performance.
Fine by me, but that call should be made by the operators of hno
Based on #1359. Trivial now that we have Perl bindings that accept a store URI.
This changes the following operations to not use the local store, but the binary cache store instead:
Also switched from a global variable to
sub
s to keep the include test working that needs to read config (for thestore_uri
) before the files to be included are created.Closes #1352 as it solves the same issue.
cc @Ericson2314