Open cleverca22 opened 6 years ago
Word of caution: This is my first dive into the hydra & nix sources. I might be completly on the wrong track..
I have the following diff that I suspect might be a hack that "fixes" this:
diff --git a/src/libstore/remote-store.cc b/src/libstore/remote-store.cc
index ea86ef05..3d0b55ca 100644
--- a/src/libstore/remote-store.cc
+++ b/src/libstore/remote-store.cc
@@ -476,6 +476,7 @@ Path RemoteStore::addTextToStore(const string & name, const string & s,
void RemoteStore::buildPaths(const PathSet & drvPaths, BuildMode buildMode)
{
auto conn(connections->get());
+ setOptions(*conn);
conn->to << wopBuildPaths;
if (GET_PROTOCOL_MINOR(conn->daemonVersion) >= 13) {
conn->to << drvPaths;
@@ -503,6 +504,7 @@ BuildResult RemoteStore::buildDerivation(const Path & drvPath, const BasicDeriva
BuildMode buildMode)
{
auto conn(connections->get());
+ setOptions(*conn);
conn->to << wopBuildDerivation << drvPath << drv << buildMode;
conn->processStderr();
BuildResult res;
I haven't actually setup the whole systems to test this but I have a very strong suspicion this would work.
The problem (as far as I could see it) is that the nix-store --serve --write
process that is spawned on the build slaves opens the store right after parsing the command line arguments. When opening the store it actually seems to connect to the local nix-daemon. After connect it calls setOptions
. At the point where the build host receives the derivation (and the buildTimeout
& maxSilent
parameters) the store is already configured and only the derivation will be built. Whenever the nix-store process receives a new commands it seems to reconfigure the internal settings
but never calls setOptions
.
https://github.com/cleverca22/nix-misc/blob/master/default.nix#L21-L31 is how i reproduced the issue on my hydra
I think I see a problem in step vs. job timeout. Example: we have high chromium.meta.timeout
, but if that's triggered as dependency of some other job, it apparently times out after the default ten hours already. Of course, it might be argued that it's a feature and not a bug, though I'd intuitively expect timeout overrides to be per-step.
@vcunat thats where i can see a toposort being useful if job1 wants to directly build chrome and job2 depends on chrome then it should build chrome under job1, not job2
Yes, but I expect that wouldn't solve cases where there's no job for a build step, e.g. we only have a job for wrapped package and not for unwrapped one.
https://github.com/NixOS/nix/blob/2825e05d21ecabc8b8524836baf0b9b05da993c6/src/nix-store/nix-store.cc#L782-L788
if the build-slave is using its local nix-daemon, then the timeout and maxSilent set in meta have no effect on the builds