Open pwaller opened 10 months ago
Attached a debug-verbose log: debug-verbose.log
There are two nixpkgs in the store:
/nix/store/akd7khgf3bxk6ribvcigwq1adi9g8zi4-source (this is used for runCommandNoCC; I expect it is required to have those sources present).
/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source (this is inputs.arbitrarySources).
In the debug log, if the gzip is in the cache, we see, e.g.
using cache entry '{"name":"source","type":"file","url":"https://github.com/nixos/nixpkgs/archive/49e5e473182a44fd0cd9048e4a3a99ba1d47da37.tar.gz"}' -> '{"etag":"W/\"4c8866242df8aefa9e4d2f8c5cf54a7572504d57978f35d43b63554d6520e402\"","url":"https://codeload.github.com/NixOS/nixpkgs/tar.gz/49e5e473182a44fd0cd9048e4a3a99ba1d47da37"}', '/nix/store/wkwfbih24zqq6w853wfqz1xf6r1isvy6-source'
performing daemon worker op: 7
locking path '/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source'
lock acquired on '/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source.lock'
lock released on '/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source.lock'
performing daemon worker op: 26
checking access to '/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source/flake.nix'
evaluating file '/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source/flake.nix'
This indicates that something wants to evaluate the flake.nix for arbitrarySources.
Introducing inputs.arbitrarySources.flake = false; does not appear to help: it still says evaluating file '/nix/store/ijny9v749dpicbcgvx6iwxk317dzsybs-source/flake.nix' in the debug log.
(In my real scenario, I use inputs.*.flake = false for the majority of my inputs as well.)
I've tested #6530; unfortunately, that appears to make the situation worse, taking 14s instead of 4s to pull in the sources (where it shouldn't need them).
Key elements from the log show that flake.nix is still being evaluated from arbitrarySources even though I've set .flake = false;.
Key log output from that branch (575902bcbf57b8208ee1ed6544fed3887f3860e6).
evaluating file '«github:nixos/nixpkgs/49e5e473182a44fd0cd9048e4a3a99ba1d47da37»/flake.nix'
copying '«github:nixos/nixpkgs/49e5e473182a44fd0cd9048e4a3a99ba1d47da37»/' to the store...
The reproducer in this ticket still reproduces the problem on that branch.
I've found a mistake in the above analysis for the source tree abstraction: it turns out call-flake.nix uses the flake lock file to determine if node.flake is false. I had set arbitrarySources.flake = false, but this had not propagated to the lock file. Deleting the lock file and recreating it had the desired effect. This doesn't appear to help flakes though: those still try to evaluate flake.nix here, where I believe what is needed is merely a path.
I see that "${arbitrarySource}" is evaluated by rendering arbitrarySource.outPath into a string, where the outPath comes from here:
The missing primitive is one that would enable evaluating this outPath for the purposes of string interpolation without importing the flake.nix (via outputs -> flake.outputs -> import (outPath + "/flake.nix")). I'm not aware of whether such a primitive currently exists in the language. This would be needed so that the flake can behave both as flakes currently do (with all of their outputs defined on them), but also string interpolate without requiring that they get fetched.
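To make the chain above concrete, here is a toy model in plain Nix. It is not the real call-flake.nix: the attribute names, the store path, and the way the two sets are merged are all stand-ins chosen for illustration. It only shows that once the outputs are merged into the same attribute set as outPath, interpolating that set forces the outputs as well.

```nix
let
  # Stand-in for the fetched source; the store path is hypothetical.
  sourceInfo = { outPath = "/nix/store/hypothetical-source"; };

  # Stand-in for `flake.outputs (...)`. The trace marks the point where the
  # real code would import (outPath + "/flake.nix").
  outputs = builtins.trace "importing flake.nix" { packages = { }; };

  # `//` needs both operands as attribute sets, so forcing `result`
  # also forces `outputs`.
  result = outputs // sourceInfo;
in
# Interpolating an attribute set uses its outPath, but only after
# `result` itself has been forced.
"${result}"
```

Evaluating this (for example with nix-instantiate --eval) emits the trace message even though only outPath ends up in the string, which is the coupling a new primitive would have to break.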
~I'm able to get the behaviour I want with the following flake, using fetchTree directly on the flake.lock JSON.~
{
  # Needed for runCommandNoCC.
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
  # Arbitrary old nixpkgs commit you're unlikely to have the sources for in your /nix/store directory.
  inputs.arbitrarySources.url = "github:nixos/nixpkgs/49e5e473182a44fd0cd9048e4a3a99ba1d47da37";
  outputs = { nixpkgs, ... }: let
    flakeLock = builtins.fromJSON (builtins.readFile ./flake.lock);
    arbitrarySourcesOutPath = fetchTree flakeLock.nodes.arbitrarySources.locked;
  in {
    packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.runCommandNoCC "test" {} ''
      echo ${arbitrarySourcesOutPath}
      touch $out
    '';
  };
}
~This allows me to get the arbitrarySourcesOutPath; building this for a second time does not require the sources present.~
Edit: After further experimentation with the above, I'm confused. This does still appear to unpack the sources, so I must have been mistaken when I wrote this previously; at least, I can't reproduce this result now. If the sources are missing but the build is present, I still see the sources being fetched.
However, I have noticed that the branch for #6530 (Source tree abstraction) still copies the sources to the store with the above flake:
copying '«github:nixos/nixpkgs/49e5e473182a44fd0cd9048e4a3a99ba1d47da37»/' to the store
The performance is quite a bit worse than the master branch. It takes 14 seconds to copy the nixpkgs to the store, where the pre-#6530 code took 4 seconds.
perf record shows that 76% of the time (10s) is spent in nix::SourceAccessor::dumpPath -> nix::SourceAccessor::lstat -> nix::GitInputAccessor::lookup -> git_tree_entry_bypath.
For the above case I would hope that it wouldn't materialise the tree at all when the default package has been built.
I think maybe I misunderstood something about how fetchTree works. I see from the recent documentation (#9258) that fetchTree fetches the requested tree when it's called. I had thought/hoped/assumed it would work as fetchFromGitHub and friends do: the sources there aren't required until they're used.
Thinking aloud: Presumably the mechanism behind fetchFromGitHub lazily fetching packages works because the fetching of a source lives in a separate derivation; and something using the path of the fetched source then gets a dependency on the separate derivation. And only if it's necessary to run the build does the source get fetched. These concepts aren't applicable for the fetchTree builtin, which fetches during evaluation, not while derivations get built?
So I take it then that I need to use a non-builtin fetchTree primitive if I want to get the effect I'm after.
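A small sketch of the build-time side of that distinction, under some assumptions: it uses <nixpkgs> from NIX_PATH rather than a flake input, and the hash is a deliberate placeholder (lib.fakeHash), not the real hash of that commit. Because fetchFromGitHub produces a fixed-output derivation, its store path is computed from the declared name and hash, so asking for the path should not need the sources themselves.

```nix
let
  # Assumption: <nixpkgs> resolves via NIX_PATH; any recent nixpkgs should do.
  pkgs = import <nixpkgs> { };

  src = pkgs.fetchFromGitHub {
    owner = "nixos";
    repo = "nixpkgs";
    rev = "49e5e473182a44fd0cd9048e4a3a99ba1d47da37";
    # Placeholder only. Actually building `src` would fail with a hash
    # mismatch until this is replaced by the real hash.
    hash = pkgs.lib.fakeHash;
  };
in
# Forcing the path only instantiates the derivation; the tarball is
# downloaded when (and if) the derivation is built.
src.outPath
```

By contrast, builtins.fetchTree has no derivation to defer to, so the tree has to be available in the store for evaluation to proceed.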
Likely related:
Thanks for the link @tomberek, that sounds like exactly what I want, assuming the credentials issue can be sorted out.
I've actually been able to work around this for my use case, but I have to invoke 'nix flake archive' in order to populate the store (/substituters) with the sources.
After that, I can switch to the nixpkgs fetchers and feed them with the information from the lockfile, read as JSON (builtins.fromJSON (builtins.readFile ./flake.lock)). This gives a much much much improved experience.
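A minimal sketch of what that workaround might look like, assuming a github-type input whose locked node carries owner, repo, rev and narHash. It also assumes the lock file's narHash can be used as the fetchFromGitHub hash; that usually holds for plain GitHub tarballs, but I have not verified it for every repository. The input is deliberately not referenced in outputs, so to my understanding it is never forced during evaluation. runCommand is used here in place of runCommandNoCC, which is an alias for it.

```nix
{
  inputs.nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
  # Still declared as an input so that flake.lock pins it, but marked
  # as a non-flake.
  inputs.arbitrarySources = {
    url = "github:nixos/nixpkgs/49e5e473182a44fd0cd9048e4a3a99ba1d47da37";
    flake = false;
  };

  outputs = { nixpkgs, ... }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
      flakeLock = builtins.fromJSON (builtins.readFile ./flake.lock);
      locked = flakeLock.nodes.arbitrarySources.locked;

      # Fixed-output derivation: fetched at build time, and only if
      # something that depends on it actually has to be built.
      src = pkgs.fetchFromGitHub {
        inherit (locked) owner repo rev;
        # Assumption: narHash matches the hash fetchFromGitHub expects.
        hash = locked.narHash;
      };
    in {
      packages.x86_64-linux.default = pkgs.runCommand "test" { } ''
        echo ${src}
        touch $out
      '';
    };
}
```

Since the fetch now runs inside the build sandbox, private repositories need credentials to be arranged separately, which is the caveat mentioned earlier in the thread.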
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/how-to-disable-automatic-unpacking-of-nix-flake-inputs/36911/2
Hi. The current correct way to do this in the case of build dependencies is to do it the same as nixpkgs and use a fixed output derivation.
Realistically you probably want to move those particular build time only sources out of flakes altogether and use something like niv, npins, or gridlock for a second lock file for those specific things (if using niv or npins, throw out the nix code they give you because it uses built in fetchers and then just use their json). See https://jade.fyi/blog/flakes-arent-real/
Is your feature request related to a problem? Please describe.
Flake input sources are fetched and unpacked even if they are unneeded. If you have lots of large flake input sources, this becomes a big bottleneck and resource consumer (wall time, CPU time, disk I/O, disk storage, network bandwidth and GitHub API calls) when fetching from a cache.
Describe the solution you'd like
Building a flake output which is a cache hit should not require fetching the input sources.
Describe alternatives you've considered
Additional context
Consider the following flake:
Note: arbitrarySources uses nixos/nixpkgs as an input, but if the default package is built or available via a substituter, the sources are no longer required (arbitrarySources is not a runtime dependency).
Expectation:
If the default package has already been built, nix build should be a cheap no-op, even if the sources are not present.
Problem:
nix build of this flake fetches/unpacks the input source for arbitrarySources, even if the output is already built. These sources are subsequently unnecessary; fetching them consumes network bandwidth, disk bandwidth, CPU time, wall clock time and disk space (to store the flake source).
If default is available via a substituter, fetching the sources (and putting them in the store) is unnecessary.
Reproduction (whole block can be pasted including parentheses, runs in subshell with tracing switched on):
Reproduction output
What I expect to see
The above shows the following times to run nix build:
I expect in the latter case that (3) should take as long as (2), not (1); the lost time in (3) is spent fetching and unpacking the sources.
nix build path:. --debug --verbose shows that the whole of nixpkgs is being unpacked in this scenario. It's not being downloaded again because the gzipped sources are also a cache hit (and not deleted by nix store --delete); but in my real world scenario where the package can be fetched from a substituter, I see eval hang while all the flake input sources are fetched and unpacked, which means waiting multiple minutes and consuming substantial resources.
I note that I've used nixpkgs as a stand-in here; I do not expect fixing the issue I've described to improve typical uses of nixpkgs very much, because those would actually involve eval'ing nixpkgs, whereas the scenario I describe only uses the flake inputs as a src attribute to mkDerivation; in this case, the sources are only necessary if the derivation needs to be built.
Priorities
Add :+1: to issues you find important.