Open roberth opened 1 year ago
This sounds good. I have been repeatedly bitten by GitHub Actions' poor coverage, and they are too slow anyway. I would also like to add more cross-version daemon and ssh:// integration testing; as discussed many months ago, that was blocked by GitHub Actions capacity limits.
We have had a number of instances of things being broken and that only being found after the fact. This is very stressful! (At least I feel under pressure when it turns out I broke something by accident.)
It would be much nicer if PRs were auto-merged only after all jobs passed, so yes please let's do this.
I don't want to make the Nix CI / release process dependent on yet another CI system. We already have GitHub actions and Hydra. Ideally we would build PRs on Hydra (which also has the advantage that it's not a proprietary tool/infrastructure).
I don't care which CI it is, as long as we build PRs completely, so I would be fine if hydra.nixos.org did it. I would say we can even drop GitHub Actions at that point.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-02-10-nix-team-meeting-minutes-31/25438/1
https://github.com/NixOS/nix/pull/7748#issuecomment-1438420716 More issues caught by Hydra but not GitHub Actions. We really need complete PR CI.
Discussed in the Nix team meeting:
aarch64-linux
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-03-27-nix-team-meeting-minutes-44/26759/1
I need to make some adjustments to smoothen the transition, but we can get the repo admin operations out of the way and start using merge queues.
Start using merge queues
A repo admin follows the steps in this paragraph https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#managing-a-merge-queue
For the required statuses, we can just use the actions for now.
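For reference, a minimal sketch of the workflow side of this. GitHub only runs Actions for merge queue entries if the workflow also listens for the `merge_group` event; the workflow name, job name, and step below are hypothetical placeholders, not this repo's actual configuration.

```yaml
# Sketch only: names are placeholders, assuming the existing checks live in one workflow.
name: CI
on:
  pull_request:
  merge_group:        # fires when the merge queue builds a candidate merge commit
jobs:
  tests:              # hypothetical job name; this is what would be listed as a required status
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the existing checks here"
```

Without the `merge_group` trigger, the required status is never reported for queued entries and they cannot be merged.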
Install Hercules CI
But not enable it yet.
@roberth tasks
When the admin tasks are done, my goal is to enable it without causing any interference and address anything that comes up with high priority.
Meanwhile I have the following tasks
Known performance issue:
The first build may be slower than subsequent builds. This is something I plan to improve in the coming weeks.
I think adding Hercules should have more justification than "Hercules is run by Robert, and Hydra is run by Graham". Especially since, as I understand it, Hercules is proprietary. If we want something changed in Hercules, we're entirely dependent on Robert to do that in a way that we're not with Hydra — anybody can do the implementation work, and in the worst case scenario anybody can run Hydra.
Edit: to expand further after some more research (infra team correct me if I got anything wrong), Hydra has an infrastructure team behind it. These are the people with full access (and there are more with partial access). These are the people who can commit to the Hydra repo.
@alyssais This isn't proposing a policy; just adding a service, that can be complemented or replaced by something better in a few clicks. There's no buy-in like with Jenkins vs Travis before; it's all just Nix underneath. It's up to the team to decide which tools are useful at any given time, and my "vote" carries no weight on this. You're free to help out in any way you want, and I hope you spend your time well, regardless of what you choose to do.
> It's up to the team to decide which tools are useful at any given time, and my "vote" carries no weight on this.
I agree, I just want to help the team come to an informed decision, as the minutes of the meeting suggested that the team might be working from information that I understand to be inaccurate. (Specifically, that Hydra could mean being blocked waiting for Graham.)
@alyssais yes, that log line is a bit of a simplification of the overall discussion (although I think I quite literally said it at some point as a tongue-in-cheek summary).
The thing is that getting Hydra to do this would be a bit of work (because we'd have to spawn a dedicated instance), and I don't see anyone ready to do it. I talked about this in the past (in #Hydra IIRC), and although people liked the idea, it looks like no one had the time to do anything about it. On the other hand, we have @roberth here who's offering to experiment with Hercules and take the infra part on himself. This is why we're moving in that direction at the moment.
> The thing is that getting Hydra to do this would be a bit of work (because we'd have to spawn a dedicated instance), and I don't see anyone ready to do it. I talked about this in the past (in #Hydra IIRC), and although people liked the idea, it looks like no one had the time to do anything about it. On the other hand, we have @roberth here who's offering to experiment with Hercules and take the infra part on himself. This is why we're moving in that direction at the moment.
Thanks for the background — with that perspective it makes a lot more sense indeed than what the meeting notes said.
> we'd have to spawn a dedicated instance
Wait, I missed this — why?
@alyssais Basically concerns about unreviewed PRs getting CI runs. (cf. ofborg vs hydra.nixos.org.)
In both cases, I would like to solve the problem of having separate build farms without resorting to completely separate technology, but that will also take a bunch of volunteer effort that hasn't yet materialized.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-05-05-nix-team-meeting-minutes-52/27893/1
Bors is now deprecated (although I'm guessing using the GitHub feature instead of Bors wouldn't change much about this).
In the meantime, could we just throw some money at the problem and upgrade to larger GitHub runners?
Even NixOS VM tests can be enabled. The runners got KVM support recently.
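As a rough sketch of what enabling KVM on a hosted Linux runner could look like: the udev rule is the one GitHub has published for hardware acceleration on its runners, while the job name, the action versions, and the `nix flake check` invocation are assumptions for illustration, not this repo's actual setup.

```yaml
jobs:
  vm_tests:                      # hypothetical job name
    runs-on: ubuntu-latest       # a larger runner would use an org-defined label instead
    steps:
      - uses: actions/checkout@v4
      - name: Allow the runner user to access /dev/kvm
        run: |
          echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' \
            | sudo tee /etc/udev/rules.d/99-kvm4all.rules
          sudo udevadm control --reload-rules
          sudo udevadm trigger --name-match=kvm
      - uses: cachix/install-nix-action@v27
      - run: nix flake check -L  # assumption: the VM tests are exposed as flake checks
```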
It would be great if this issue could be prioritized. Quick high quality feedback stands at the core of an efficient development workflow and right now the feedback is neither complete nor quick.
I use Mergify to replace bors, e.g. in nixos-hardware: https://github.com/NixOS/nixos-hardware/pull/820
@zowoq also added that github's merge queues can now also be enabled for public organizations.
I hope this isn't stepping on anyone's toes, but I kind of accidentally ran garnix on a commit on my fork of this repo (I have it enabled globally) and without further configuration already half the outputs worked (and many of the failures seem related to ccache). The load on our system was pretty minimal - we can definitely support enabling it in this repo if that'd be helpful.
(Note: I started garnix - not sure if that presents a conflict of interest here.)
@jkarni that would be great! We recently discussed this on the team and would gladly run multiple setups side by side to see what works best. @Ericson2314 seemed to be particularly interested.
> I kind of accidentally
Let's be honest, of course a small voice in me wants your competing product not to be here, and probably you've had a small voice tell you that this could be helpful for yours. That's fine; of course it's mutually beneficial between Nix and our respective products.
As you can probably tell from the age of this issue I haven't been able to push forward on the few blockers that would let Hercules CI do the whole thing, as its focus has so far been on CI/CD for trustworthy contributions, and not PRs. (Specifically because the HCI architecture has one set of machines and one cache per organization.) I suppose that's where garnix could help out, by building PRs, including those from forks?
I also want to throw my horse into the CI race here :)
I also used both hercules-ci and garnix before and they work great. I wanted to support some more architectures using my own hardware and, more importantly, be able to test pull requests (a limitation of hercules-ci), which is why I started buildbot-nix instead. buildbot-nix is an open source library/configuration on top of Buildbot. Buildbot is a mature, 20+ year old CI framework used in many big open source projects: Python, WebKit, LLVM, Blender, ReactOS.
Buildbot-nix extends buildbot to build nix projects with zero configuration. The project itself is small (3K lines). It currently supports native GitHub/Gitea integration. You can see what this looks like here on my nix fork: https://github.com/Mic92/nix-1/pull/2
If you want to test buildbot, I already have the GitHub app installed into nixos-wiki-infra. All it needs is setting the build-with-buildbot GitHub topic and pinging me on Matrix, so I can start the project re-scan.
Of course I can understand if you prefer hercules-ci as this is what Robert develops, especially if it helps Robert to justify his countless hours of work on Nix.
Will hercules-ci support Windows/*BSD in the future, btw? This would be nice to test our upcoming platforms...
I very much dislike GitHub actions :) if we need to test Nix on "vanilla distros" (e.g. test installers) we should use non-NixOS VM tests.
Not sure about windows but cloud-config works well with the nixos test framework. In system-manager we use it with ubuntu: https://github.com/numtide/system-manager/blob/4d33bfa43cc067dd6ea6a0e41115f27b50cd5a35/test/nix/modules/default.nix#L123 https://github.com/numtide/nix-vm-test
Is your feature request related to a problem? Please describe.
Prevent issues like #7669 by building the whole flake before merge.
Describe the solution you'd like
Maybe Theophane was right. I could add a bors.toml after an admin installs Hercules CI and bors on this repo. Instead of "enable auto-merge", we'd comment
bors r+
to indicate a positive review and let bors merge it as appropriate. This way we build a larger set and we get the guarantee that the merge builds. (informally: the no rocket science rule) Bors also allows the delegation of review (merge) rights to contributors, and it does not suffer from the GitHub Actions limitation on builds triggered by actions.
Describe alternatives you've considered
Extend the actions, but they seem slow. Doesn't help with NixOS tests either. I don't know a ton about actions, whereas the above is easy and you'll get amazing support.