NixOS / hydra

Hydra, the Nix-based continuous build system
http://nixos.org/hydra
GNU General Public License v3.0
1.17k stars 300 forks source link

hydra-evaluator won't add a build to the queue if a derivation with the same output already exists #190

Open domenkozar opened 9 years ago

domenkozar commented 9 years ago

Restarting a build with changes like https://github.com/NixOS/nixpkgs/commit/6e726e1348a4ab130b7c531b784b62b6dfa3dbb6 is a noop.

domenkozar commented 9 years ago

Maybe restarting the build should also delete previous result?

lucabrunox commented 9 years ago

I don't think this is an issue, it's expected behavior and how should work. Can we close?

domenkozar commented 9 years ago

It's not an issue because it follows the design, but it's an usability problem.

rbvermaa commented 9 years ago

I think 2 issues are being confused here. A works-like-designed evaluation issue, which has nothing to do with not being able to restart a build that already is built.

rbvermaa commented 9 years ago

Please create a new issue for the latter.

edolstra commented 9 years ago

Well, "works as designed" can still be an issue if the design is wrong :-)

In this case, the design did not take fixed-output derivations into account.

rbvermaa commented 9 years ago

Feel free to reopen it if you think this should be fixed.

rbvermaa commented 9 years ago

Btw, I still don't see why evaluator would create a new build if the result is already there. The restart issue is genuine, that's why a new issue should be created for that.

edolstra commented 9 years ago

It's not about restarting: you need a new derivation to be instantiated if for instance the URL in a fetchurl call changed.

rbvermaa commented 9 years ago

Ah, then we can just add an extra check on drvpath right, in stead of just checking the out path?

edolstra commented 9 years ago

That's the easy fix, but the reason why we don't do that is to prevent the creation of tens of thousands of new builds every time somebody changes the fetchurl mirrors list (for example).

domenkozar commented 9 years ago

Fixed by ixpkgs/commit/13dcb3523309c318e8dce59fae168ea39f98065c?

edolstra commented 9 years ago

No, that's about the difficulty in restarting builds that failed with output. This issue is about builds whose derivation changed but not the output path.

vcunat commented 9 years ago

This really confused me a while ago – seems like a wrong fix to this issue was used.

Hydra.nixos.org re-queued ~30k builds in nixpkgs/trunk even though it has successfully built the same results already a few days ago in nixpkgs/staging. They have equal output store paths, though the derivations differ due to a change in mirror list https://github.com/NixOS/nixpkgs/commit/b80a497b85 (I pushed that one, incidentally). Example pair:

After that it deals with the duplicate builds relatively quickly, on the order of a hundred per minute, but I don't think it should queue them at all.

vcunat commented 9 years ago

Now it even re-queues builds of the same derivations (equal *.drv paths) from a different jobset. Example corresponding to the above: http://hydra.nixos.org/build/23844548#tabs-details Note e.g. that the re-queuing in this case happened ~6 minutes after a build of the same derivation finished successfully.

That seems rather wasteful in the long term, given by us having many duplicities among jobsets (nixpkgs, nixos, etc.).

edolstra commented 9 years ago

Why is it wasteful?

Previously the evaluator had a check for cached builds, but now the queue runner does that.

vcunat commented 9 years ago

Why is it wasteful?

I see, so evaluation is faster now. It was one mirror change and Hydra was clearing this duplicate ~30k builds for about an hour, doing nothing else, and then the some work again for the other jobset (not sure if another hour or faster, derivations being equal now) – that didn't seem efficient for such a powerful set of machines, but maybe the work is just more visible now.

So is the core of work in this case be done by nix-store --realize in the queue-runner, i.e. mainly sqlite queries on the validity of derivation outputs? I expect there's a difference that evaluation-time work did in no way block the real building work, assuming there was something else in the queue and evaluation was done on a separate machine. There might be overhead due to adding into queue also results into DB insertion (not just reading), at least I suppose so from the front-end presentation, but I know too little of hydra workings to be a good judge.

edolstra commented 9 years ago

The new queue runner is multithreaded. There is one thread that processes changes to the queue and another that starts build steps. The first thread is the one that checks whether builds have already been done and if so marks them as cached builds in the database. This does not block the dispatcher thread from starting build steps.

The one downside is that the cached builds in a derivation are no longer marked as finished in a single transaction, as was previously the case. OTOH, it means that the queue runner doesn't have to wait a long time for uncached builds to become visible/buildable.

vcunat commented 9 years ago

Hmm, yes, with this superficial information I know now, it doesn't look wasteful anymore.