NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Substitute builtin url fetchers #4313

Open: roberth opened this issue 3 years ago

roberth commented 3 years ago

Is your feature request related to a problem? Please describe.

Unlike derivation-based fetchers, the evaluation-time fetchers builtins.fetchurl and builtins.fetchTarball do not try to substitute. I need to use these fetchers because they are the ones that respect the user's netrc, which lets them download authenticated resources. We'd like to use the binary cache to give developers and deployments access to the data without handing them credentials for a system they shouldn't have full access to.

Describe the solution you'd like

When fetching data in builtins.fetchurl and builtins.fetchTarball, query the substituters too.
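A minimal sketch of the requested lookup order for a fetch whose hash is known, written as a Bash function. Everything in it is hypothetical: `fetch_pinned`, `in_local_store`, `substitute`, `download` and the `LOCAL_STORE`, `SUBSTITUTER`, `ORIGIN_OK` variables are stand-ins that model the local store, the binary cache and the authenticated URL. They are not Nix commands or Nix internals. The point it illustrates is that when the substituter has the data, the credentialed download is never attempted.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the requested order for a fetch with a known
# hash: local store -> substituters -> original URL. None of these
# functions or variables exist in Nix; they only model its behavior.

LOCAL_STORE=""          # space-separated hashes already in the local store
SUBSTITUTER=""          # hashes the binary cache can provide
ORIGIN_OK=0             # 1 if the URL can be downloaded (credentials present)
SUBSTITUTER_QUERIES=0   # how many remote cache queries were made
FETCHED_FROM=""         # where the last fetch was satisfied from

in_local_store() { [[ " $LOCAL_STORE " == *" $1 "* ]]; }

substitute() {
  SUBSTITUTER_QUERIES=$((SUBSTITUTER_QUERIES + 1))
  [[ " $SUBSTITUTER " == *" $1 "* ]]
}

download() { [[ "$ORIGIN_OK" -eq 1 ]]; }   # stub: the URL in $1 is not used

# fetch_pinned URL HASH
# Sets FETCHED_FROM to local, substituter or download; returns 1 on failure.
fetch_pinned() {
  local url=$1 hash=$2
  FETCHED_FROM=""
  if in_local_store "$hash"; then
    FETCHED_FROM=local
    return 0
  fi
  if substitute "$hash"; then
    FETCHED_FROM=substituter      # no credentials needed on this path
  elif download "$url"; then
    FETCHED_FROM=download
  else
    echo "error: unable to fetch '$url'" >&2
    return 1
  fi
  LOCAL_STORE+=" $hash"           # the path is now present locally
}
```

Call `fetch_pinned` directly rather than in `$(...)`, so the updates to `LOCAL_STORE` and the counter persist. The counter makes the trade-off visible: with this order, every fetch that misses the local store pays one substituter query, even when the data then has to be downloaded from the original URL anyway.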

Describe alternatives you've considered

Duplicate credentials everywhere and make the setup less secure.

Additional context

Here's an example of trying to use the data on a machine without credentials. The data is available on the binary cache, but it wasn't queried.

nix-instantiate -vvv --eval --expr 'builtins.fetchTarball { url = "https://private-data-repo.example.com/machine-learning-model.dat"; sha256 = "de2d3f3892ecf7000fe8cddb2ce0c801217ec0f31079ba56beff62f20a0b982f"; }'
did not find cache entry for '{"name":"machine-learning-model.dat","type":"file","url":"https://private-data-repo.example.com/machine-learning-model.dat"}'
downloading 'https://private-data-repo.example.com/machine-learning-model.dat'...
starting download of https://private-data-repo.example.com/machine-learning-model.dat
verify TLS: Nix CA file = '/etc/ssl/certs/ca-certificates.crt'
finished download of 'https://private-data-repo.example.com/machine-learning-model.dat'; curl status = 0, HTTP status = 401, body = 12 bytes
error: --- FileTransferError ------------------------------------------------ nix-instantiate
unable to download 'https://private-data-repo.example.com/machine-learning-model.dat': HTTP error 401 ('')

response body:

Unauthorized
download thread shutting down

Tested with approximately Nix revision 05d9442f6, i.e. nixUnstable from around Nov 26, 2020. This was also brought up in #2114 and again in #3543, and was assumed to be resolved before #4149, but it currently doesn't work.

bjornfor commented 3 years ago

Also needed for builtins.fetchGit (I presume, only if narHash is given).

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

AleXoundOS commented 2 years ago

builtins.fetchGit seems to be affected as well, but I'm not sure.

roberth commented 2 years ago

From https://github.com/NixOS/nix/pull/6174#discussion_r816076478

Remote path querying has a significant latency, that I don't think we want to incur on every fetch, many of which will not be substitutable anyway. Ideally we could perform the remote path info query while the fetcher runs, and make some clever decisions regarding cancellation. That's not something I want to dive into, especially considering the non-zero risk of corrupting local files when cancelling the fetcher.

Instead, I figured the next step could be to try substitution when fetching fails, which also solves the problem without any performance impact.

roberth commented 1 year ago

At minimum, query the local store before doing anything else.
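The ordering proposed in the last two comments (local store first, then the original fetcher, and the substituters only if the fetch fails) can be sketched as a small Bash function. All function and variable names here (`fetch_pinned_fallback`, `in_local_store`, `download`, `substitute`, `LOCAL_STORE`, `SUBSTITUTER`, `ORIGIN_OK`) are hypothetical stand-ins that model the local store, the binary cache and the original source; they are not Nix commands or Nix internals.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the fallback order: local store -> original
# fetcher -> substituters (only if the fetch failed). None of these
# functions or variables exist in Nix; they only model its behavior.

LOCAL_STORE=""          # space-separated hashes already in the local store
SUBSTITUTER=""          # hashes the binary cache can provide
ORIGIN_OK=0             # 1 if the URL can be downloaded (e.g. netrc present)
SUBSTITUTER_QUERIES=0   # how many times the remote query latency was paid
FETCHED_FROM=""         # where the last fetch was satisfied from

in_local_store() { [[ " $LOCAL_STORE " == *" $1 "* ]]; }

download() { [[ "$ORIGIN_OK" -eq 1 ]]; }   # stub: the URL in $1 is not used

substitute() {
  SUBSTITUTER_QUERIES=$((SUBSTITUTER_QUERIES + 1))
  [[ " $SUBSTITUTER " == *" $1 "* ]]
}

# fetch_pinned_fallback URL HASH
# Sets FETCHED_FROM to local, download or substituter; returns 1 on failure.
fetch_pinned_fallback() {
  local url=$1 hash=$2
  FETCHED_FROM=""
  if in_local_store "$hash"; then
    FETCHED_FROM=local            # cheapest check, always done first
    return 0
  fi
  if download "$url"; then
    FETCHED_FROM=download         # common path: substituter never queried
  elif substitute "$hash"; then
    FETCHED_FROM=substituter      # e.g. the download failed with HTTP 401
  else
    echo "error: unable to download '$url' and no substituter has it" >&2
    return 1
  fi
  LOCAL_STORE+=" $hash"           # the path is now present locally
}
```

With this ordering a successful download never touches the substituter, so the common case adds no remote query latency; the substituter is only consulted when the download fails, as with the HTTP 401 in the log above. Once a fetch succeeds by either route, later fetches with the same hash are answered from the local store.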

sheldonneuberger-sc commented 1 year ago

This would be a useful feature for me. I put everything I can into my own binary cache to reduce dependencies on random services, but I have to use fetchGit for auth reasons. This means that when my fleet of machines evaluates this closure, they have to fetch from Git several times, which is much less reliable than my binary cache.

> Remote path querying has a significant latency

By "remote path querying" do you mean the GET request to the binary cache? If so, that's about 200 ms for my cloud binary cache. If I understand correctly, this price would only be paid the first time you evaluate the closure on a machine, because future evaluations on that same machine will just find the package in the local Nix store. For my use case this is a great trade-off, but are there other major use cases where this would be a much worse experience?

roberth commented 1 year ago

> By "remote path querying" do you mean the GET request to the binary cache?

Yeah, can also be S3 or something. Querying the substituter(s).

> If I understand right, this price would only be paid the first time you evaluate the closure on a machine

Yes. It would query the local store and be done, or else query the local binary-caches.sqlite file and download the NAR file immediately.

> For my use-case, this is a great trade-off, but are there other major use-cases where this would be a very degraded experience?

Cases where the substituter is significantly slower than the original source, I guess. Using such a substituter seems like a bad idea, though, so I'm not sure this is a significant scenario.

sheldonneuberger-sc commented 1 year ago

I created a PR to do this for fetchGit: https://github.com/NixOS/nix/pull/8246. Not sure if it's possible to do this more generically, i.e. could we just enable substitution in fetchTree.cc for any derivation that provides a sha256/narHash?

jsoo1 commented 11 months ago

This matters to us a lot. We run Nix in network-restricted environments where public internet access is firewalled, so we can only use fetchers that try to substitute from our blessed cache. Am I missing some context in https://github.com/NixOS/nix/pull/8184?