MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0

[nix-EPIC] Nix-Based CI #10941

Closed (balsoft closed 1 week ago)

balsoft commented 2 years ago

Nix-based build, test and deployment system for Mina.

yorickvP commented 2 years ago

Here are my preliminary notes on our desired CI system. @mkaito, could you massage this into some human-readable text?

Goals

Functional requirements

Artifacts

Substrate

CI runs on 180 Buildkite agents on GKE, split over 215 c2-standard-16 VMs on Google Cloud. Autoscaling is enabled but not working properly. There are two build queues, default and integration. The integration tests run on beefier machines.

CI jobs

CI runs a lot of builds, mostly using different Docker images. Deployment jobs are problematic because of VM pre-emptions.

Scaling

Coherence is maintained with shared artifact and deployment targets.

Properties

Security

Currently, all builds can access all credentials, and the credentials are very powerful. This is mitigated with the 'ci-build-me' label on GitHub, which is necessary to run the CI. However, token leaks are still an issue.

Differential jobs

Some tests, like "merges cleanly into develop", depend on the current state of 'unrelated' branches. This is a problem for CI predictability and caching.

Proposed

Repo

Require commit signatures. Disallow PRs into anything but develop/compatible/master/release.

CI Substrate

The minimum viable CI runs 4-6 Buildkite nodes on a non-preemptible c2-standard-30 VM (30 cores, 120 GB memory). We can expand this later. There should be two build queues, a default one and a deployment one. The deployment build queue should have a single agent (to prevent interference) and the security credentials needed to do deployments.

CI Jobs

Mostly Nix builds, which are deterministic and offer caching. We can also create Docker images this way. Deployment jobs should run on a different build queue with a single agent and security credentials, so they don't interfere with each other. Differential jobs (i.e. branching constraints) should specify a clear dependency and depend only on the current state of the branch. (The corresponding develop/compatible revisions should be found using `git merge-base`, etc.)
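
As a rough illustration of the merge-base idea, here is a minimal Python sketch. It pins a differential job to the commit where the PR branch diverged from its target, rather than to the moving tip of that target. The function names and the `origin/develop` ref are assumptions for illustration, not part of the existing CI scripts; the only real tool used is `git merge-base`.

```python
import subprocess
from typing import Callable, List


def merge_base_command(target_ref: str, head_ref: str = "HEAD") -> List[str]:
    """Build the git invocation that finds the common ancestor of two refs."""
    return ["git", "merge-base", target_ref, head_ref]


def _run_git(argv: List[str]) -> str:
    """Run git in the current working directory and return its stdout."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def find_merge_base(
    target_ref: str,
    head_ref: str = "HEAD",
    run: Callable[[List[str]], str] = _run_git,
) -> str:
    """Return the commit hash the branch diverged from.

    `run` is injectable so the logic can be exercised without a repository.
    """
    return run(merge_base_command(target_ref, head_ref)).strip()


if __name__ == "__main__":
    # Hypothetical usage: "origin/develop" is an assumed remote ref.
    print(find_merge_base("origin/develop"))
```

Because the result depends only on the PR branch's own history, the same commit always maps to the same comparison point, which keeps the job cacheable.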

To make it fail faster, don't run integration tests if anything else fails.
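
One way to sketch this fail-fast ordering is to generate the Buildkite pipeline as JSON and place a `wait` step between the ordinary builds and the integration tests. By default, a `wait` step only lets later steps start once all earlier steps have passed. The function name, the example commands, and the `deploy` queue name are assumptions for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def build_pipeline(
    build_commands: List[str],
    integration_commands: List[str],
    deploy_command: Optional[str] = None,
    deploy_queue: str = "deploy",  # assumed queue name
) -> Dict[str, Any]:
    """Order steps so integration tests only start after every build passed."""
    steps: List[Any] = [{"label": c, "command": c} for c in build_commands]
    if integration_commands:
        # "wait" blocks the following steps until all previous steps succeed.
        steps.append("wait")
        steps += [{"label": c, "command": c} for c in integration_commands]
    if deploy_command is not None:
        steps.append("wait")
        # Route the deployment to the dedicated queue that holds credentials.
        steps.append(
            {
                "label": deploy_command,
                "command": deploy_command,
                "agents": {"queue": deploy_queue},
            }
        )
    return {"steps": steps}


if __name__ == "__main__":
    # Hypothetical commands; the real job names would come from the repo.
    pipeline = build_pipeline(["nix flake check"], ["./run-integration-tests.sh"])
    print(json.dumps(pipeline, indent=2))
```

The printed JSON could then be piped to `buildkite-agent pipeline upload`, which accepts pipelines in JSON as well as YAML.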

Scaling

We can use a shared Nix cache, but scaling beyond a single machine may involve build orchestration: transforming the Nix dependency graph into a Buildkite pipeline on demand. See https://github.com/serokell/common-infra
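
To make the graph-to-pipeline idea concrete, here is a minimal sketch, not taken from the linked repository. It assumes the dependency graph has already been extracted into a plain mapping from derivation path to the derivation paths it depends on. Each derivation becomes one Buildkite step, and `depends_on` mirrors the Nix edges so that independent derivations can build on different agents. The store paths in the usage example are made up.

```python
import re
import shlex
from typing import Any, Dict, List


def step_key(drv_path: str) -> str:
    """Turn a derivation path into a key made of letters, digits, '_' and '-'."""
    return re.sub(r"[^A-Za-z0-9_-]", "-", drv_path).strip("-")


def graph_to_pipeline(deps: Dict[str, List[str]]) -> Dict[str, Any]:
    """Emit one step per derivation, with depends_on following the Nix edges."""
    steps: List[Dict[str, Any]] = []
    for drv in sorted(deps):
        for dep in deps[drv]:
            if dep not in deps:
                raise ValueError(f"{drv} depends on unknown derivation {dep}")
        step: Dict[str, Any] = {
            "key": step_key(drv),
            "label": drv,
            # Realising the .drv builds it, or fetches it from a shared cache.
            "command": "nix-store --realise " + shlex.quote(drv),
        }
        needed = sorted(step_key(dep) for dep in deps[drv])
        if needed:
            step["depends_on"] = needed
        steps.append(step)
    return {"steps": steps}


if __name__ == "__main__":
    # Hypothetical two-node graph; real paths would come from the Nix store.
    graph = {
        "/nix/store/aaa-lib.drv": [],
        "/nix/store/bbb-app.drv": ["/nix/store/aaa-lib.drv"],
    }
    print(graph_to_pipeline(graph))
```

A real implementation would also need to decide how coarse the steps should be, since one step per derivation may be far too fine-grained for a large closure.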

Properties

robinbb commented 2 years ago

@yorickvP It's great to have these notes here to prompt some thought by others. Thank you. I propose that the next action is for the team (those who would work on things related to Nix CI) to gather and construct a set of GitHub issues that break this Epic down into efforts which, taken together, deliver the vision that you collectively develop.

robinbb commented 1 year ago

@michal0mina, the vision shared in this issue need not necessarily be shared by you. It may be that this issue is not necessary to complete the "Nix-Enabled CI" project. At your discretion.