NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
12.94k stars 1.53k forks source link

Test everything before merging with merge queue + hercules-ci #7674

Open roberth opened 1 year ago

roberth commented 1 year ago

Is your feature request related to a problem? Please describe.

Prevent issues like #7669 by building the whole flake before merge.

Describe the solution you'd like

Maybe Theophane was right. I could add a bors.toml after an admin installs Hercules CI and bors on this repo. Instead of "enable auto-merge", we'd comment bors r+ to indicate a positive review and let bors merge it as appropriate. This way we build a larger set and we get the guarantee that the merge builds. (informally: the no rocket science rule) Bors also allows the delegation of review (merge) rights to contributors, and it does not suffer from the GitHub Actions limitation on builds triggered by actions.

Describe alternatives you've considered

Extend the actions, but they seem slow. Doesn't help with NixOS tests either. I don't know a ton about actions, whereas the above is easy and you'll get amazing support.

Additional context Add any other context or screenshots about the feature request here.

Priorities

Add :+1: to issues you find important.

Ericson2314 commented 1 year ago

This sounds good. I have been repeatedly bitten by GitHub Actions' poor coverage and they are too slow anyways. I would also like to add more cross-version daemon and ssh:// integration testing, and as discussed many months ago that was blocked by github actions capacity limits.

Ericson2314 commented 1 year ago

We have had a number of instances of things being broken and that found after the fact. This is very stressful! (At least I feel under pressure when it turns out I broke something by accident.)

It would be much nicer if PRs were auto-merged only after all jobs passed, so yes please let's do this.

edolstra commented 1 year ago

I don't want to make the Nix CI / release process dependent on yet another CI system. We already have GitHub actions and Hydra. Ideally we would build PRs on Hydra (which also has the advantage that it's not a proprietary tool/infrastructure).

Ericson2314 commented 1 year ago

I don't care which CI it is, as long as we build PRs complete, so I would be fine if hydra.nixos.org did it. I would say we can even drop GitHub actions at that point.

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-02-10-nix-team-meeting-minutes-31/25438/1

Ericson2314 commented 1 year ago

https://github.com/NixOS/nix/pull/7748#issuecomment-1438420716 More issues caught by Hydra but not GitHub Actions. We really need completely PR CI.

fricklerhandwerk commented 1 year ago

Discussed in the Nix team meeting:

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-03-27-nix-team-meeting-minutes-44/26759/1

roberth commented 1 year ago

I need to make some adjustments to smoothen the transition, but we can get the repo admin operations out of the way and start using merge queues.

Start using merge queues

  1. A repo admin follows the steps in this paragraph https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#managing-a-merge-queue

  2. For the required statuses, we can just use the actions for now.

Install Hercules CI

But not enable it yet.

  1. A repo admin installs Hercules CI on the repo: https://github.com/apps/hercules-ci/installations/new
  2. Anyone with write access disables the project in https://hercules-ci.com/github/NixOS/nix, because configuration is necessary for it to succeed. (right under Settings)

@roberth tasks

When the admin tasks are done, my goal is to enable it without causing any interference and address anything that comes up with high priority.

Meanwhile I have the following tasks

Known performance issue:

The first build may be slower than subsequent builds. This is something I plan to improve in the coming weeks.

alyssais commented 1 year ago

I think adding Hercules should have more justification than "Hercules is run by Robert, and Hydra is run by Graham". Especially since, as I understand it, Hercules is proprietary. If we want something changed in Hercules, we're entirely dependent on Robert to do that in a way that we're not with Hydra — anybody can do the implementation work, and in the worst case scenario anybody can run Hydra.

Edit: to expand further after some more research (infra team correct me if I got anything wrong), Hydra has an infrastructure team behind it. These are the people with full access (and there are more with partial access). These are the people who can commit to the Hydra repo.

roberth commented 1 year ago

@alyssais This isn't proposing a policy; just adding a service, that can be complemented or replaced by something better in a few clicks. There's no buy-in like with Jenkins vs Travis before; it's all just Nix underneath. It's up to the team to decide which tools are useful at any given time, and my "vote" carries no weight on this. You're free to help out in any way you want, and I hope you spend your time well, regardless of what you choose to do.

alyssais commented 1 year ago

It's up to the team to decide which tools are useful at any given time, and my "vote" carries no weight on this.

I agree, I just want to help the team come to an informed decision, as the minutes of the meeting suggested that the team might be information that I understand to be inaccurate. (Specifically, that Hydra could mean being blocked waiting for Graham.)

thufschmitt commented 1 year ago

@alyssais yes, that log line is a bit of a simplification of the overall discussion (although I think I quite literally said it at some point as a tongue-in-cheek summary).

The thing is that getting Hydra to do this would be a bit of work (because we'd have to spawn a dedicated instance), and I don't see anyone ready to do it. I talked about this in the past (in #Hydra IIRC), and although people liked the idea, it looks like no one had the time to do anything about it. On the other hand, we have @roberth here who's offering to experiment with Hercules and take the infra part on himself. This is why we're moving in that direction at the moment.

alyssais commented 1 year ago

The thing is that getting Hydra to do this would be a bit of work (because we'd have to spawn a dedicated instance), and I don't see anyone ready to do it. I talked about this in the past (in #Hydra IIRC), and although people liked the idea, it looks like no one had the time to do anything about it. On the other hand, we have @roberth here who's offering to experiment with Hercules and take the infra part on himself. This is why we're moving in that direction at the moment.

Thanks for the background — with that perspective it makes a lot more sense indeed than what the meeting notes said.

alyssais commented 1 year ago

we'd have to spawn a dedicated instance

Wait, I missed this — why?

Ericson2314 commented 1 year ago

@alyssais Basically concerns about unreviewed PRs getting CI runs. (c.f. ofborg vs hydra.nixos.org.)

Ericson2314 commented 1 year ago

In both cases, I would like the solve the problem of having separate build farms without resorting to completely separate technology, but that will also take a bunch of volunteer effort that hasn't yet materialized.

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-05-05-nix-team-meeting-minutes-52/27893/1

alyssais commented 1 year ago

Bors is now deprecated. (although I'm guessing using the GitHub feature instead of Bors wouldn't change much about this.)

DavHau commented 11 months ago

For the mean time, could we just throw some money at the problem and upgrade to larger github runners?

Even nixos VM tests can be enabled. They got KVM support recently.

It would be great if this issue could be prioritized. Quick high quality feedback stands at the core of an efficient development workflow and right now the feedback is neither complete nor quick.

Mic92 commented 11 months ago

I use mergify to replace bors i.e. in nixos-hardware: https://github.com/NixOS/nixos-hardware/pull/820

@zowoq also added that github's merge queues can now also be enabled for public organizations.

jkarni commented 4 months ago

I hope this isn't stepping on anyone's toes, but I kind of accidentally ran garnix on a commit on my fork of this repo (I have it enabled globally) and without further configuration already half the outputs worked (and many of the failures seem related to ccache). The load on our system was pretty minimal - we can definitely support enabling it in this repo if that'd be helpful.

(Note: I started garnix - not sure if that presents a conflict of interests here.)

fricklerhandwerk commented 4 months ago

@jkarni that would be great! We recently discussed this on the team and would gladly run multiple setups side by side to see what works best. @Ericson2314 seemed to be particularly interested.

roberth commented 4 months ago

I kind of accidentally

Let's be honest, of course a small voice in me wants your competing product not to be here, and probably you've had a small voice tell you that this could be helpful for yours. That's fine; of course it's mutually beneficial between Nix and our respective products.

As you can probably tell from the age of this issue I haven't been able to push forward on the few blockers that would let Hercules CI do the whole thing, as its focus has so far been on CI/CD for trustworthy contributions, and not PRs. (Specifically because the HCI architecture has one set of machines and one cache per organization.) I suppose that's where garnix could help out, by building PRs, including those from forks?

Mic92 commented 4 months ago

I also want to throw my horse into the CI race here :)

I also used both hercules-ci and garnix before and they work great. I wanted to support some more architectures using my own hardware and more importantly be able to test pull requests (limitation of hercules-ci), which is why I started buildbot-nix instead. buildbot-nix is an opensource library/configuration on top of buildbot. Buildbot is a mature 20+ year old CI framework used in many big opensource projects: Python,[6] WebKit,[7] LLVM,[8] Blender,[9] ReactOS,[10

Buildbot-nix extends buildbot to build nix projects with zero configuration. The project itself is small (3K lines). It currently supports native GitHub/Gitea integration. You can see what this looks like here on my nix fork: https://github.com/Mic92/nix-1/pull/2

If you want to test buildbot, I already want have the github app installed into nixos-wiki-infra. All it needs is setting the build-with-buildbot github topic and pinging me on matrix, so I start the project re-scan.

Of course I can understand if you prefer hercules-ci as this is what Robert develops, especially if it helps Robert to justify his countless hours of work on Nix.

Mic92 commented 4 months ago

Will hercules-ci supports windows/*BSD in the future btw? This would be nice to test our upcoming platforms....

Ericson2314 commented 4 months ago

I very much dislike GitHub actions :) if we need to test Nix on "vanilla distros" (e.g. test installers) we should use non-NixOS VM tests.

Mic92 commented 4 months ago

Not sure about windows but cloud-config works well with the nixos test framework. In system-manager we use it with ubuntu: https://github.com/numtide/system-manager/blob/4d33bfa43cc067dd6ea6a0e41115f27b50cd5a35/test/nix/modules/default.nix#L123 https://github.com/numtide/nix-vm-test