Create a release process for the Aptos light client

huitseeker commented 5 months ago

We want to nail down a first MVP of our release tooling for delivery.

Rationale

Because our releases come with a claim to performance, we need to pin down the complete suite of software tied to the particular light client we are releasing.

That must mean precise versioning, not a tracking branch.
To let the cargo resolver know about that notion of precise versioning, given that we are currently tracking branches, we need to edit the .toml files that link our software (Aptos PFN, Sphinx, Aptos LC).
As long as we are doing that, we might as well use Rust crate versioning to our advantage.
Once we have released, we want to go back to being able to develop relative to a branch.
A release branch has an advantage over a release tag: it allows us to make bug fixes and/or backports to the previously released software.

This points to a release process. We do not have to automate it this round, but considering that we will probably do ~20 releases for this customer, any automation cuts down on future pain.

In detail (worked out example)

Before the Release

LC Repo

dev branch
A---B---C---D---E---F---G (HEAD, dev)
                 \
                  H---I (feature-branch)

Prover (Sphinx) Repo

dev branch
A---B---C---D---E---F---G (HEAD, dev)
                 \
                  H---I (feature-branch)

Aptos PFN Repo

dev branch
A---B---C---D---E---F---G (HEAD, dev)
                 \
                  H---I (feature-branch)

Release Process

Identify commits of LC and corresponding prover:

LC commit: G
Prover commit: G
Aptos PFN commit: G

Tag the commits for our reference:
```
git tag -a v1.2.0 -m "Release v1.2.0" G
```

Create Rust version bumps and update dependencies in .toml files:

Update LC's Cargo.toml to:
[dependencies]
sphinx = { version = "1.2.0", git = "https://repo.url/sphinx", tag = "v1.2.0" }
aptos-pfn = { git = "https://repo.url/aptos-pfn", rev = "G" }

Update Prover's Cargo.toml to:

[package]
version = "1.2.0"

Create release branches/tags:
```
git branch release/v1.2.0 G
```
Reset dev branch to new version:

Update version numbers in dev branch to next version (e.g., 1.3.0-pre)
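The tag, branch, and version-reset steps above can be sketched as a small shell helper. This is only an illustration: the function names `run`, `next_dev_version`, and `cut_release`, the `DRY_RUN` switch, and the rule that dev moves to the next minor version with a `-pre` suffix (inferred from the 1.2.0 to 1.3.0-pre example) are assumptions, not existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: names and the DRY_RUN switch are assumptions.

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the next development version for a released MAJOR.MINOR.PATCH,
# assuming dev moves to the next minor version with a -pre suffix.
next_dev_version() {
  local major rest minor
  major="${1%%.*}"      # text before the first dot
  rest="${1#*.}"        # text after the first dot
  minor="${rest%%.*}"   # text between the first and second dot
  echo "${major}.$((minor + 1)).0-pre"
}

# Tag a commit and create the matching release branch in one repository.
# Usage: cut_release <repo_dir> <version> <commit>
cut_release() {
  local repo="$1" version="$2" commit="$3"
  run git -C "$repo" tag -a "v${version}" -m "Release v${version}" "$commit"
  run git -C "$repo" branch "release/v${version}" "$commit"
}
```

Running `cut_release` with `DRY_RUN=1` only prints the git commands, which makes it easy to review the plan for each of the three repositories before executing it.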

After the Release

LC Repo

dev branch (updated to 1.3.0-pre)
A---B---C---D---E---F---G---J (HEAD, dev)
                 \       \
                  H---I   \
                           K (release/v1.2.0, tagged v1.2.0)

Prover (Sphinx) Repo

dev branch (updated to 1.3.0-pre)
A---B---C---D---E---F---G---J (HEAD, dev)
                 \       \
                  H---I   \
                           K (release/v1.2.0, tagged v1.2.0)

Aptos PFN Repo

dev branch (unchanged)
A---B---C---D---E---F---G---J (HEAD, dev)
                 \       \
                  H---I   \
                           K (tagged v1.2.0)

tchataigner commented 5 months ago

@samuelburnham and I have tackled the Aptos PFN part of this issue. A release/aptos-light-client branch and a dev branch have been created, along with a workflow file that pulls the relevant tag from the Aptos repository and opens a PR on a successful rebase, or opens an issue on failure.

tchataigner commented 5 months ago

After some discussion, I will change the workflow file to only create an issue on the Aptos PFN repository. We then rebase dev manually, and a release workflow removes the out-of-date release branch and creates a newer one.

tchataigner commented 5 months ago

I finalized the updates on lurk-lab/aptos-core. Here is the current workflow for release and bugfix:

Release

aptos-light-client-tag-comparison.yml: Cron job running every day at midnight to warn us when there is a relevant Mainnet tag to rebase on.
Rebase the dev branch on the new tag
aptos-light-client-patch-release-tag.yml: Manual workflow to make a release official. It takes one input, the name of the Aptos tag that was patched, and creates a release branch of the form `release/<tag>-patched` (for example, release/aptos-nodev1.14.0-patched).

This allows us to keep older releases around and still know which release branch is the latest, based on the Aptos release tag embedded in the branch name.
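As a rough sketch of how the branch naming can be used to find the latest release branch: the function names below are hypothetical, and the version ordering relies on `sort -V`, which assumes GNU coreutils (or a compatible `sort`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: function names are assumptions.

# Build the release branch name for a patched Aptos tag.
release_branch_for_tag() {
  echo "release/$1-patched"
}

# Read branch names on stdin (one per line) and print the patched release
# branch with the highest embedded version. Other branches are ignored.
latest_release_branch() {
  grep '^release/.*-patched$' | sort -V | tail -n 1
}

# Possible usage inside a clone (assumes the remote is named "origin"):
#   git branch -r --format='%(refname:short)' | sed 's|^origin/||' | latest_release_branch
```

Version sort matters here: a plain lexical sort would place a 1.9.0 branch after a 1.14.0 branch.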

Bugfix

There is no real automation to be done for a bugfix, as the process will most likely necessitate some manual intervention for conflict resolution.

Detect a bug on the release/aptos-nodev1.14.0-patched branch

Patch release/*

Checkout from the release/aptos-nodev1.14.0-patched branch a new hotfix/aptos-nodev1.14.0-patched, commit and push necessary fixes
Open a PR from hotfix/aptos-nodev1.14.0-patched -> release/aptos-nodev1.14.0-patched
Iterate until ready, squash merge

Propagate patch to dev

Checkout from the dev branch a hotfix/dev branch
Cherry-pick the squashed commit from the release/aptos-nodev1.14.0-patched branch onto hotfix/dev, resolving conflicts if any
Open a PR from hotfix/dev -> dev, iterate and squash merge.
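The git side of the two bugfix phases above can be sketched as follows. The function names, the `DRY_RUN` switch, and the remote name `origin` are assumptions; committing the fix, opening the PRs, and squash merging remain manual.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: names, DRY_RUN, and the "origin" remote are assumptions.

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Phase 1: create the hotfix branch from a release branch.
# Usage: start_release_hotfix release/aptos-nodev1.14.0-patched
start_release_hotfix() {
  local release="$1"
  local hotfix="hotfix/${release#release/}"   # strip the release/ prefix
  run git fetch origin
  run git checkout -b "$hotfix" "origin/$release"
  # Next (manual): commit fixes, push, open a PR into $release, squash merge.
}

# Phase 2: carry the squashed commit over to dev.
# Usage: propagate_to_dev <squashed_commit_sha>
propagate_to_dev() {
  local sha="$1"
  run git fetch origin
  run git checkout -b hotfix/dev origin/dev
  # On conflict, cherry-pick exits non-zero: stop and resolve by hand.
  run git cherry-pick "$sha" || return 1
  run git push -u origin hotfix/dev
  # Next (manual): open a PR from hotfix/dev into dev, squash merge.
}
```

Setting `DRY_RUN=1` prints the commands without touching the repository, which is a cheap way to check the branch names before running the real thing.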

I could not think of a better way of handling those without leveraging git merge. Please let me know if there is any improvement you deem necessary.

huitseeker commented 5 months ago

All of this sounds great, with a few details:

let's assume we're working on Aptos PFN v1.12, which corresponds to our branch v1.12-patched,
let's assume we see the release of Aptos PFN v1.13, rebase our patches, and create branch v1.13-patched,
if in any way we "merge" or "remove the out-of-date release branch", we are invalidating the prior release of the light client, which cannot build correctly any more (as it depended on the Aptos PFN v1.12 patched).

IOW: we must absolutely ensure correct building of prior released versions, and as a dependency thereof our patched Aptos PFN must have historical branches live forever.

One way to do this is to make sure that while for development, dev branches depend on other dev branches, any release does not contain any sort of a dev branch anywhere (transitively): those release branches should be named after the release itself.
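One way to enforce the "no dev branch anywhere" rule on a release branch is a simple textual check, sketched below. The function name is hypothetical and the match is a heuristic: it flags any `branch =` key in a Cargo.toml, and also the `?branch=` marker that cargo records in Cargo.lock source URLs for branch-tracked git dependencies, which is how transitive cases would show up once lock files are checked in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the function name is an assumption.

# Print every Cargo.toml / Cargo.lock line under <dir> that refers to a git
# branch. Returns 0 if the tree is clean, 1 if branch dependencies are found.
check_no_branch_deps() {
  local dir="${1:-.}"
  local hits
  # "|| true" keeps a no-match grep from failing the pipeline under "set -e".
  hits="$(find "$dir" \( -name Cargo.toml -o -name Cargo.lock \) \
            ! -path '*/target/*' \
            -exec grep -Hn 'branch[[:space:]]*=' {} + || true)"
  if [ -n "$hits" ]; then
    echo "$hits"
    return 1
  fi
}

# Possible usage in a release workflow:
#   check_no_branch_deps . || { echo "release must not track branches" >&2; exit 1; }
```

Dependencies pinned with `tag`, `rev`, or a plain `version` pass the check, so the same script can run unchanged on every release branch.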

wwared commented 5 months ago

Another thing, we might want to disable the automatic program rebuilding in build.rs for the release builds (or do something equivalent).

Thought being, if those are enabled, then because we're not storing the Cargo.lock files for the LC programs in the programs/ folder, every time the build runs it will fetch the latest version of all dependencies including sphinx and bls12_381, which won't have branches pinned to the LC releases. This would happen when building any part of the LC, replacing the programs in the artifacts directory and technically invalidating the release.

We might want to solve this in some other way (I'm very open to suggestions), but what I'm thinking is that the issues are:

We do not want the programs to be accidentally re-built in a released tarball: the RISC-V binaries in the tarball should be used as-is, and the dependencies should be pinned
It should be possible to recreate the specific binary from the time of release at a later date. This might require us to ship Cargo.lock files or do some other kind of version pinning in the Cargo.toml files to ensure the right version of everything is used when building in a release branch, for both the RISC-V programs and the LC components

I'm not really sure of the best solution, but my initial thought is:

Disable build.rs by default unless the code is in a git repository and the branch is not a release branch (basic string comparison?), or some env var is set, or just always explicitly/manually disable it in the release tarballs/branches
Add Cargo.lock files to all the crates in the repo (and ensure that any build instructions/scripts pass --locked where necessary)
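The rebuild rule in the first point can be written down as a small decision function. In practice this logic would live in build.rs; the shell version below is only a sketch of the decision, and the function name and the `LC_FORCE_PROGRAM_REBUILD` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the function and environment variable names are assumptions.

# Decide whether the RISC-V programs may be rebuilt for the given branch.
# Returns 0 (rebuild allowed) or 1 (use the shipped artifacts as-is).
# An empty branch name means "not inside a git checkout", e.g. a tarball.
should_rebuild_programs() {
  local branch="$1"
  # An explicit override always wins.
  if [ "${LC_FORCE_PROGRAM_REBUILD:-0}" = "1" ]; then
    return 0
  fi
  case "$branch" in
    release/*|"") return 1 ;;  # release branch or tarball: never rebuild
    *)            return 0 ;;  # development branch: rebuild as usual
  esac
}

# Possible usage:
#   branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null || true)"
#   should_rebuild_programs "$branch" || echo "using prebuilt programs"
#   cargo build --release --locked   # --locked fails if Cargo.lock would change
```

Passing `--locked` to cargo complements the check: with Cargo.lock files checked in, a build on a release branch fails rather than silently resolving newer dependency versions.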

huitseeker commented 5 months ago

I think the best approach is to check in the Cargo.lock files.

storojs72 commented 5 months ago

There is an upcoming PR (https://github.com/lurk-lab/zk-light-clients/pull/28) that adds a demonstration of Solidity verification of the LC programs using the Plonk contracts from sphinx. SP1 has a separate repository for distributing and releasing contracts, but in our case I think we can, for simplicity, keep the sphinx contracts as part of the zk-light-client repository. The fixture-generator and contracts-generator programs use sphinx dependencies from the workspace, so they will be updated in the context of releasing the zk-light-client.

Once https://github.com/lurk-lab/zk-light-clients/pull/28 is merged, the release process will require some extension. More specifically:

Build new Plonk parameters using sphinx;
Create a tar.gz archive and upload it to the AWS bucket (s3://sphinx-plonk-params) using the specific tag from sphinx;
Check that new parameters are accessible (via contracts-generator program).

tchataigner commented 5 months ago

The release process works: we now have our first release, created through #54.
