NixOS / hydra

Hydra, the Nix-based continuous build system
http://nixos.org/hydra
GNU General Public License v3.0
1.1k stars 291 forks source link

Use binary cache at `store_uri` for certain operations #1360

Open Ma27 opened 4 months ago

Ma27 commented 4 months ago

Based on #1359. Trivial now that we have Perl bindings that accept a store URI.

This changes the following operations to not use the local store, but the binary cache store instead:

Also switched from a global variable to subs to keep the include test working that needs to read config (for the store_uri) before the files to be included are created.

Closes #1352 as it solves the same issue.

cc @Ericson2314

Ericson2314 commented 4 months ago

I'm still sorta confused how it ever worked before / are the performance ramifications of looking up in the binary cache when we previously looked up in the local store (presumably sometimes successfully?) OK.

Ma27 commented 4 months ago

I'm still sorta confused how it ever worked before

Given how the Perl bindings used to look like, my best guess is that /nix/store was always used (IIRC there was no real way around that in pre Nix 2.0 unless you change some configure flags) and store_uri was added later on, but a lot of the Perl code was never updated accordingly.

Perhaps @edolstra knows a little more?

are the performance ramifications of looking up in the binary cache when we previously looked up in the local store (presumably sometimes successfully

It should be OK for file://, but I'm absolutely not sure about e.g. s3:// (on hydra.nixos.org), that's part of the reason why I put that change into its own commit (also, I don't have an environment to reasonably test this).

Perhaps we want to leave that part out for now or split it up once more even?

Ericson2314 commented 4 months ago

@Ma27 Right. Yeah the fact that Hydra presumably supported binary caches long before Nix had the store abstraction that included binary caches, is what makes things confusing --- why does it seems like the old way of working with the binary cache is wrong even though we've had it for quite a while?

Ma27 commented 4 months ago

why does it seems like the old way of working with the binary cache is wrong even though we've had it for quite a while?

Yeah, I tried to use that when I first set up a Hydra and I think I never managed to set up binary caching back then.

Considering that binary_cache_secret_key_file raises a deprecation anyways, I'm wondering if we should rip that out instead and suggest to serve a flat-file binary cache via nginx. See also https://github.com/NixOS/hydra/issues/1331.

I wouldn't expect local stores to be much of an impact, but you know the implementations better than me. Also, the most used thing is probably the build overview which is basically a few isValidPath() calls and I can't remember derivations with >10 outputs.

OTOH I'm not sure doing that for remote stores is sensible because of potential latency and because S3 is pay by requester now: https://github.com/NixOS/infra/pull/299 cc @NixOS/infra for opinions as well.

So perhaps we want to exclude 44c98e130b01ae512285497747b75006e7c377cc from this PR @Ericson2314 ?

delroth commented 4 months ago

OTOH I'm not sure doing that for remote stores is sensible because of potential latency and because S3 is pay by requester now: https://github.com/NixOS/infra/pull/299

Pay by requester shouldn't be relevant here - I doubt anyone other than us is directly using s3://nix-cache for their Hydra instance, and when the requester is us (NixOS Hydra) we'd be paying either way even if it wasn't pay by requester. Plus, the operations here are all super cheap, it's bandwidth that's the expensive part.

Latency could indeed be a problem: this is switching from an operation that completes in milliseconds (checking the existence of a local path) to an operation that completes in hundreds of milliseconds (cross-atlantic roundtrip), and there's no parallelism or batching anywhere to try and alleviate this. Of course, I'm not saying that it is a problem - I don't know nearly enough about the Hydra codebase to even understand what's being changed here. But if we were before checking 100+ paths in the local store, and we switch to checking 100+ paths in a remote store, we're going from something near-instant (sub-second) to something that needs roughly a minute to run.

Ma27 commented 4 months ago

Yep, agreed about the latency part.

These are the changes that I'd consider good to go.

While revisiting build_deps & runtime_deps in Controller/Build.pm, I'm not sure anymore if that's something we want, especially on hno which is already rather loaded AFAIK. Perhaps we want to split this off for now @delroth @Ericson2314 ?

lheckemann commented 3 months ago

Regarding the latency, I think this should be fine because a lot of the queries are going to be repeats and the narinfo cache prevents repeated checks? Not 100% sure but I'm in favour of merging it without the split and seeing how it impacts performance.

Ma27 commented 3 months ago

Fine by me, but that call should be made by the operators of hno