NixOS / hydra

Hydra, the Nix-based continuous build system
http://nixos.org/hydra
GNU General Public License v3.0
1.19k stars 301 forks source link

meta.timeout does not always work #591

Open cleverca22 opened 6 years ago

cleverca22 commented 6 years ago

https://github.com/NixOS/nix/blob/2825e05d21ecabc8b8524836baf0b9b05da993c6/src/nix-store/nix-store.cc#L782-L788

if the build-slave is using its local nix-daemon, then the timeout and maxSilent set in meta have no effect on the builds

andir commented 6 years ago

Word of caution: This is my first dive into the hydra & nix sources. I might be completly on the wrong track..

I have the following diff that I suspect might be a hack that "fixes" this:

diff --git a/src/libstore/remote-store.cc b/src/libstore/remote-store.cc
index ea86ef05..3d0b55ca 100644
--- a/src/libstore/remote-store.cc
+++ b/src/libstore/remote-store.cc
@@ -476,6 +476,7 @@ Path RemoteStore::addTextToStore(const string & name, const string & s,
 void RemoteStore::buildPaths(const PathSet & drvPaths, BuildMode buildMode)
 {
     auto conn(connections->get());
+    setOptions(*conn);
     conn->to << wopBuildPaths;
     if (GET_PROTOCOL_MINOR(conn->daemonVersion) >= 13) {
         conn->to << drvPaths;
@@ -503,6 +504,7 @@ BuildResult RemoteStore::buildDerivation(const Path & drvPath, const BasicDeriva
     BuildMode buildMode)
 {
     auto conn(connections->get());
+    setOptions(*conn);
     conn->to << wopBuildDerivation << drvPath << drv << buildMode;
     conn->processStderr();
     BuildResult res;

I haven't actually setup the whole systems to test this but I have a very strong suspicion this would work.

The problem (as far as I could see it) is that the nix-store --serve --write process that is spawned on the build slaves opens the store right after parsing the command line arguments. When opening the store it actually seems to connect to the local nix-daemon. After connect it calls setOptions. At the point where the build host receives the derivation (and the buildTimeout & maxSilent parameters) the store is already configured and only the derivation will be built. Whenever the nix-store process receives a new commands it seems to reconfigure the internal settings but never calls setOptions.

cleverca22 commented 6 years ago

https://github.com/cleverca22/nix-misc/blob/master/default.nix#L21-L31 is how i reproduced the issue on my hydra

vcunat commented 5 years ago

I think I see a problem in step vs. job timeout. Example: we have high chromium.meta.timeout, but if that's triggered as dependency of some other job, it apparently times out after the default ten hours already. Of course, it might be argued that it's a feature and not a bug, though I'd intuitively expect timeout overrides to be per-step.

cleverca22 commented 5 years ago

@vcunat thats where i can see a toposort being useful if job1 wants to directly build chrome and job2 depends on chrome then it should build chrome under job1, not job2

vcunat commented 5 years ago

Yes, but I expect that wouldn't solve cases where there's no job for a build step, e.g. we only have a job for wrapped package and not for unwrapped one.