frequency-chain / frequency

Frequency: A Polkadot Parachain
https://www.frequency.xyz
Apache License 2.0

Re-architect CI Workflow including Release Process #282

Closed: shannonwells closed this issue 2 years ago.

shannonwells commented 2 years ago

We need to document how we are doing releases. Both the process of doing it, and the documentation that accompanies each release.

Ideas

Things the Release Should Have/Contain

Maybe in the future we will do an arm64 binary, but the weights did not come out good enough. #367

Notes:

Per @saraswatpuneet:

Per @wilwade:

Notes on the Baseline Process as of 10/11/22:

  1. Releases aren't atomic, as a partial set of artifacts can be published. Do they need to be atomic (all or nothing)? If not, we need a process for rolling back a failed release.
  2. The current workflow does not support multiple release candidate branches; it assumes that releases will be cut by tagging commits in main. Should we start working on a multiple-release-branch Git workflow next?
  3. Whatever we annotate the release tag with is what currently gets displayed on the GitHub release page. See example here. Consider generating a release notes file as part of the release workflow and using that for content on the GitHub release page.
  4. What exactly would we like to borrow from the Polkadot Releases release notes?
  5. Consider switching to a fast-forward-only PR merging policy in GitHub, so the same commit we have verified in the PR ends up on the main branch after merge. Developers will need to rebase their PRs onto origin/main before a PR can be merged.
  6. We currently publish binaries for amd64 and arm64 platforms, but the docker images are amd64 only. Not sure if this is an issue.
  7. JS API Augment is currently versioned as latest: 0.1.3 and next: 0.0.0-[short-sha]. Do we want to change the latter?
  8. Rust Dev Docs don't get properly versioned.
  9. The NPM release version is not recorded in the package.json
  10. A release with the same version tag can only be executed successfully once. NPM doesn't allow publishing under the same release version more than once.

Tasks:

demisx commented 2 years ago

I have set up various release processes with different Git branching strategies in the past (GitHub flow, GitLab flow, trunk-based development, etc.) based on Commitizen and Conventional Commits. This may be helpful here.

demisx commented 2 years ago

Here is my approach here at high level. Please let me know if I am missing anything:

  1. I would like to slightly restructure our current GitHub workflows. There are some questionable jobs and triggers (ex. why are we publishing images on merge to main but not when the main is tagged, why so many ifs, etc.), so I'd like to map it all out in a diagram first and offer some improvements.
  2. Afterwards, I am going to review Polkadot release process closer and figure out how much of it we can borrow and what are the parts we want to do differently, if any.
  3. Figure out how runtime upgrades are different from other code changes
  4. Create new release process flow
  5. Get team feedback and update
  6. Implement the new release process
demisx commented 2 years ago

I am removing "documentation" label because this story is not going to be documentation only. I will be proposing a new release process with some updated GitHub workflows.

demisx commented 2 years ago

@wilwade @saraswatpuneet Here is the diagram depicting the current status quo in the Frequency CI. I have intentionally skipped some minor steps (like cache action) for brevity. Please take a look and let me know if you see anything off.

Working on the new CI, I eventually need to know the following:

  1. Is there anything in the current process that we do not want to carry over to the new one?
  2. Is there anything in the current process that needs to be done differently?
  3. What OSs and Rust toolchains do we need to support? Currently I only see ubuntu-latest and nightly-2022-07-23
wilwade commented 2 years ago

@demisx

Is there anything in the current process that we do not want to carry over to the new one?

Is there anything in the current process that needs to be done differently?

  1. Something we are not currently doing, but should: upload the WASM to IPFS. We should likely pin it as well, but I'm not sure what we will use to do that.
  2. Context: We should switch to a better versioning system for releases and Docker images.
  3. Context: We will eventually have some js integration tests to run: #371
  4. Context: The JS API Augment should be rewritten to do all of the code generation at build time. That will require either using the docker instant seal image (and thus making sure it can retrieve the most recent version tag) or running instant seal some other way. Its version numbers should also be aligned with Frequency releases somehow. Hmm... Likely it could be folded into the Frequency releases then: #382
wilwade commented 2 years ago

@demisx

What OSs and Rust toolchains do we need to support? Currently I only see ubuntu-latest and nightly-2022-07-23

wilwade commented 2 years ago

@demisx One other thing.

We should move mdbook deployments to the release tagging instead of main likely. Or deploy to two different locations.

demisx commented 2 years ago

@wilwade Thank you for the feedback. I'm keeping the wishlist in mind and there will be places in CI workflow for the new jobs to be plugged into.

In general, here are the high level principles I am sticking to:

  1. All necessary vetting (checks, lints, etc.) needs to happen when a commit is pushed to a PR branch. If it passes, then the build is mergeable and deployable anywhere. If we find issues with the build in later stages, then the checks must be adjusted here, not after something has already been merged to main.
  2. The build will be done via matrix, so additional variants can be introduced later without major code changes.
  3. The release version will follow semantic versioning
  4. The release version will be auto-generated from the developer commit messages
  5. The commit messages will follow the conventional commits spec, perhaps using Angular commits format as a starting point
  6. All produced CI artifacts will be version tagged at once for consistency
  7. All artifacts will be built once and then moved through environments unchanged via tags/names (this works nicely with docker images; I still need to find a similar way to do the same with the binaries)
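Principles 3 to 5 imply a mechanical mapping from commit messages to the next semantic version. Below is a minimal Python sketch of that mapping, assuming the standard Conventional Commits rules (a `!` after the type or a `BREAKING CHANGE:` footer means major, `feat` means minor, `fix` means patch). It is not Commitizen's implementation, and it skips the special handling some tools apply to 0.x versions.

```python
import re

# Conventional Commits header: type(optional-scope)optional-!: description
HEADER = re.compile(r"(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?: .+")


def bump_level(messages):
    """Return "major", "minor", "patch" or None for a batch of commit messages."""
    level = None
    for msg in messages:
        lines = msg.splitlines()
        match = HEADER.fullmatch(lines[0]) if lines else None
        if match is None:
            continue  # non-conforming commits do not affect the version
        # Simplification: look for the breaking-change footer anywhere in the message.
        if match.group("bang") or "BREAKING CHANGE:" in msg or "BREAKING-CHANGE:" in msg:
            return "major"
        if match.group("type") == "feat":
            level = "minor"
        elif match.group("type") == "fix" and level is None:
            level = "patch"
    return level


def next_version(current, messages):
    """Bump a vMAJOR.MINOR.PATCH tag according to the commit messages."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    level = bump_level(messages)
    if level == "major":
        major, minor, patch = major + 1, 0, 0
    elif level == "minor":
        minor, patch = minor + 1, 0
    elif level == "patch":
        patch += 1
    return f"v{major}.{minor}.{patch}"
```

For example, `next_version("v0.1.3", ["feat: new pallet"])` returns `"v0.2.0"`. A real release tool would also collect the commit range since the last release tag.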

I will share the new diagram for feedback once it's ready.

demisx commented 2 years ago

One other thing. We should move mdbook deployments to the release tagging instead of main likely. Or deploy to two different locations.

@wilwade ❓ Can you please clarify what you mean here? I am not seeing any mdbook deployment in the current workflows. Unless I'm missing something.

demisx commented 2 years ago

Here is the proposed CI workflow for "commit pushed to PR branch" event. The goal is to run all necessary checks required to vet the changes. In other words, if this workflow succeeds, then we should be 100% confident that the code can be merged to main. Please let me know if I missed anything or if you have any questions.

https://docs.google.com/drawings/d/1IcAzZbulRcntYQ5ub-zDyIkqJY1NE0iqClhkg2PE5lo/edit

wilwade commented 2 years ago

One other thing. We should move mdbook deployments to the release tagging instead of main likely. Or deploy to two different locations.

@wilwade ❓ Can you please clarify what you mean here? I am not seeing any mdbook deployment in the current workflows. Unless I'm missing something.

That's on me. The Frequency generated documentation not mdbook. Sorry about that.

demisx commented 2 years ago

@wilwade Oh, I see. Yes, all artifacts will be published and uploaded at once, meaning the Frequency docs will be published during the release phase.

wilwade commented 2 years ago

Re: proposed CI workflow for "commit pushed to PR branch" event

demisx commented 2 years ago

@wilwade I've updated the diagram with appropriate notes per your feedback. As for #376, I've reserved a job for it for future implementation. I won't know what the steps are until there are examples of running runtime checks.

Moving forward with the next workflow.

demisx commented 2 years ago

@wilwade @saraswatpuneet So far, I have identified the following artifacts:

  1. Frequency Docs for GitHub Pages
  2. Local collator docker image in instant seal mode
  3. Local collator docker image for local relay chain (aka rococo-local)
  4. Binary for Rococo
  5. WASM for Rococo
  6. Deterministic WASM for Rococo
  7. Parachain docker image for Rococo (full node)
  8. Binary for Mainnet
  9. WASM for Mainnet
  10. Deterministic WASM for Mainnet
  11. Parachain docker image for Mainnet (full node)
  12. WASM for IPFS (Coming soon...)
  13. JS API Augment NPM Package published on NPM as @next
  14. JS API Augment NPM Package published on NPM as latest

There are different options on how we can release:

  1. Anything merged to main is auto-released (similar to trunk based development)
  2. Introduce a dedicated branch (i.e. 'release' or 'prod') and whatever is merged to this branch is released
  3. Introduce a special tag that would trigger a release (do not confuse this tag with the auto-generated release version tag). This is useful if we want to cut releases manually from certain commits on main.

A couple of questions:

  1. Do you have any strong preference which approach to use from the options above?
  2. If not option 1, is there anything in particular we want to release when a PR is successfully merged to main. I can't think of anything except maybe "JS API Augment NPM Package published on NPM as @next" since the @next represents an upcoming version?
saraswatpuneet commented 2 years ago

@demisx I think option 2 is followed by many projects in this space

demisx commented 2 years ago

@saraswatpuneet Agreed. That's the most common option especially for new teams. Thank you for your feedback. Let's see if @wilwade has any other preference.

saraswatpuneet commented 2 years ago
demisx commented 2 years ago

Here is a short comparison on release handling between these networks:

| Network | Release Branch Name | Strategy | Release Tag | Ties to Polkadot or Substrate Version |
|---|---|---|---|---|
| Polkadot | release-v[0-9]+.[0-9]+.[0-9]+* | Separate branch per release. A release is published after each commit to each branch. | v[0-9]+.[0-9]+.[0-9]+* | Polkadot |
| Acala | release-acala-[0-9]+.[0-9]+.[0-9]+ | Separate branch per release. A release is triggered by manually running the "Publish Release" GitHub workflow. | [0-9]+.[0-9]+.[0-9]+ | No |
| Moonbeam | None | Binary and runtime are released separately. Releases are triggered by manually running the "Publish Binary" or "Publish Runtime" GitHub workflows. | v[0-9]+.[0-9]+.[0-9]+, runtime-[0-9]+ | No |

So, assuming we also create a separate release branch and tie it to the Polkadot version with modifier (i.e. release-v0.9.27-1), the next question would be – do we want to ❓ :

  1. Cut a new release with each commit to the release branch; OR
  2. Allow authorized team members to cut a new release by manually running a designated GitHub action (something like "Publish New Release")
saraswatpuneet commented 2 years ago

Ideally, we should cut a release branch, with a tag and a release, once we are comfortable with the changes for the current Polkadot version and expect no new releases from that branch.

Reason: we are in a decentralized world, and we can't dictate to all our possible collators every time we make a huge change to an already released version.

demisx commented 2 years ago

Based on my earlier Slack conversation with @saraswatpuneet and @wilwade, so far the release process is expected to be as such (high level). This most likely will change, but is a good starting point:

  1. Polkadot releases a new version vX.X.X
  2. We upgrade main to the new Polkadot version vX.X.X
  3. We create a new release branch (long-lived) called release-vX.X.X-1 from main. Perhaps this can be viewed as a release candidate at this point
  4. Commits can be cherry-picked from main and merged to the release branch
  5. At some point, when we are ready to release, we tag the latest commit on the release-vX.X.X-1 branch with release tag vX.X.X-1 and push
  6. GitHub Actions kick in and publish new Frequency release with vX.X.X-1 in the release name.
  7. The release branch release-vX.X.X-1 is locked, meaning once released, there will be no more changes on this release branch
  8. If we need to release for the same Polkadot version again, we create a new release-vX.X.X-2 branch (with incremented modifier 2) and the process repeats from step 4
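Step 8 boils down to a small naming rule. The sketch below computes the next release branch for a Polkadot version from the branches that already exist. It assumes every release branch carries the -N modifier (release-vX.X.X-N) and ignores anything that doesn't match, such as main; the function name is hypothetical.

```python
import re

# Release branch pattern from the process above: release-vX.X.X-N
BRANCH = re.compile(r"release-v(\d+\.\d+\.\d+)-(\d+)")


def next_release_branch(polkadot_version, existing_branches):
    """Return the next release-vX.X.X-N branch for a Polkadot version.

    Starts at modifier 1; otherwise increments the highest modifier already
    used for that Polkadot version. Branches that don't match are ignored.
    """
    version = polkadot_version.lstrip("v")
    used = []
    for branch in existing_branches:
        match = BRANCH.fullmatch(branch)
        if match and match.group(1) == version:
            used.append(int(match.group(2)))
    return f"release-v{version}-{max(used, default=0) + 1}"
```

Taking the highest used modifier rather than counting branches keeps the rule correct even if an old release branch is ever deleted.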

Something to think about:

demisx commented 2 years ago

Here is the first iteration (V2.0) of the new CI workflows. There are 3 triggering events:

  1. Commit push to PR branch
  2. PR merge
  3. Release tag push to remote

Please review and let me know if you have any feedback. Especially item 2 and item 3 as they are brand new additions.

https://docs.google.com/drawings/d/1IcAzZbulRcntYQ5ub-zDyIkqJY1NE0iqClhkg2PE5lo

wilwade commented 2 years ago

@demisx A few notes to think about, but I think this is 100% shippable.

  1. Just to be clear, this assumes that we are generating benchmarks in each PR that needs it before merge. I can imagine a different flow that runs benchmarks in the release branch, but I'd like to avoid that as it hides the weight changes that occur in each PR.
  2. We don't have the "push WASM to IPFS" step, but I don't feel we have to do that.
  3. Merge PR and Release JS API Augment Release candidate flows (once we switch to a more automated flow) will also need frequency running in instant seal.
  4. Was thinking about sanity checks. I wonder if it would be worth running to the point of being able to peer with mainnet. That should happen quickly, but would verify not only that it could run, but also that it peers. That said it is a much larger sanity check and something perhaps we need to do manually via a release checklist (which I am sure we will also have)
demisx commented 2 years ago

@wilwade Thank you for your feedback. Highly appreciate it. 🙏🏻 Sorry, I should've emphasized that the first iteration of redesign is focused mainly on porting existing functionality to the new process. I want to get us back to status quo first with the new CI, resolve issues and polish things out. Once we feel comfortable the new CI runs smoothly, we are ready to start adding new features. Of course, if there is something of a high priority that needs to be added right away we'll make this happen.

As far as #4 goes, the sanity checks can be anything that helps us efficiently and promptly catch any code issues. I am sure we'd be adding a bunch along the way as we learn new lessons. I am also for automating peering and keeping that manual checklist as short as possible. Please feel free to create a new story and assign it to me. Or I can do it myself with the details you gave me above. Whatever helps.

wilwade commented 2 years ago

@demisx Sounds great! Let's make it happen :)

demisx commented 2 years ago

I am working on implementing the first "Push PR Commit" trigger. I'll keep an eye on this issue if additional feedback is provided. I want to introduce the following naming convention that helps us quickly map any given workflow file to the corresponding process in the diagram. Since GitHub workflows don't support nested folders yet, I'd have to rely solely on the file name:

<trigger-event>.<workflow-name>.yml
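With this convention, the trigger event is everything before the first dot, so a workflow file can be mapped back to its process in the diagram mechanically. A small sketch; the function names and the file names in the usage example are hypothetical.

```python
from pathlib import PurePosixPath


def parse_workflow_filename(path):
    """Split <trigger-event>.<workflow-name>.yml into (trigger_event, workflow_name)."""
    p = PurePosixPath(path)
    if p.suffix != ".yml":
        raise ValueError(f"not a .yml workflow file: {path}")
    # The trigger event is everything before the first dot of the stem.
    trigger, sep, name = p.stem.partition(".")
    if not (sep and trigger and name):
        raise ValueError(f"expected <trigger-event>.<workflow-name>.yml: {path}")
    return trigger, name


def group_by_trigger(paths):
    """Group workflow names by the trigger event encoded in each file name."""
    groups = {}
    for path in paths:
        trigger, name = parse_workflow_filename(path)
        groups.setdefault(trigger, []).append(name)
    return groups
```

For instance, `group_by_trigger(["push-pr-commit.verify.yml", "push-pr-commit.lint.yml", "merge-pr.publish-docs.yml"])` returns `{"push-pr-commit": ["verify", "lint"], "merge-pr": ["publish-docs"]}`.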

Examples:

demisx commented 2 years ago

What was the justification for including this step in the current CI workflows? Trying to determine if we really need to carry it over to the new CI workflows.

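```yaml
- name: Free space on Ubuntu
  # Keep the whole comparison inside ${{ }}; `${{ matrix.os }} == 'ubuntu-latest'`
  # evaluates to a non-empty string, which is always truthy.
  if: ${{ matrix.os == 'ubuntu-latest' }}
  run: |
    echo "Pre cleanup"
    df -h
    sudo rm -rf "/usr/local/share/boost"
    sudo rm -rf "$AGENT_TOOLSDIRECTORY"
    echo "Post cleanup"
    df -h
```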
saraswatpuneet commented 2 years ago

@demisx It was for something I don't remember. We can remove it if we keep the same OS type in the matrix, since it is just an Ubuntu cleanup.

demisx commented 2 years ago

@saraswatpuneet No problem. I left it out for now. We can always add it later if such need arises.

demisx commented 2 years ago

So, I watched the Cachepot presentation and did some research on this sccache fork online. It's an interesting project, but I don't see an easy way to integrate it with GitHub Actions CI right now. At least, I couldn't find an existing GitHub action ready to be used. I am going to try other solutions and optimization techniques first.

demisx commented 2 years ago

One of the recommended optimizations is to disable incremental compilation in CI, so I am going to try running cargo build like this. Let me know if you guys see any issues with it.

```shell
CARGO_INCREMENTAL=0 cargo build --locked --release --features frequency
```
demisx commented 2 years ago

These are the current execution times for future reference. GitHub Actions is experiencing degraded performance right now, but at least we should not see it get any worse. So far, there is a 40% improvement in vetting a PR commit.

[Screenshot (2022-10-05): CI execution times]

demisx commented 2 years ago

This is the new file naming convention for the released binaries. Let me know if you'd like to change it in any way.

[Screenshot (2022-10-07): file naming convention for released binaries]

demisx commented 2 years ago

The Directory Structure Wiki has been updated with the new CI workflows info:

[Screenshot (2022-10-10): Directory Structure Wiki with CI workflows info]

demisx commented 2 years ago

@wilwade @saraswatpuneet So I did various test runs and adjustments today. I captured some good points that I will put together in a presentable format and share in the morning.

For now, I think I can leave v0.1.3 release as the baseline release under new CI which has all functionality ported from the previous workflows. Please take a look at the released artifacts and let me know if you see any issues.

Here is what's being released each time we push a release tag to remote:

| Number | Artifact | Network | Location |
|---|---|---|---|
| 1 | Frequency Binary (amd64) | Mainnet | GitHub Releases |
| 2 | Frequency Binary (arm64) | Mainnet | GitHub Releases |
| 3 | Frequency Binary (amd64) | Rococo | GitHub Releases |
| 4 | Frequency Binary (arm64) | Rococo | GitHub Releases |
| 5 | Frequency WASM | Mainnet | GitHub Releases |
| 6 | Frequency WASM | Rococo | GitHub Releases |
| 7 | Collator Node Dev Image | Local | DockerHub |
| 8 | Collator Node in Instant Seal Mode | Local | DockerHub |
| 9 | Parachain Node | Rococo | DockerHub |
| 10 | Parachain Node | Mainnet | DockerHub |
| 11 | Rust Developer Docs | N/A | https://libertydsnp.github.io/frequency/ |
| 12 | JS API Augment | N/A | NPM Registry |
demisx commented 2 years ago

The CI diagram (v2) has been adjusted to reflect the latest and greatest at the time of writing.

demisx commented 2 years ago

@wilwade Here are some notes and findings from implementation of the new CI. Would you like to prioritize any of these for me to work on next? Some long-terms can be placed into icebox.

  1. Releases aren't atomic, as a partial set of artifacts can be published. Do they need to be atomic (all or nothing)? If not, we need a process for rolling back a failed release.
  2. The current workflow does not support multiple release candidate branches; it assumes that releases will be cut by tagging commits in main. Should we start working on a multiple-release-branch Git workflow next?
  3. Whatever we annotate the release tag with is what currently gets displayed on the GitHub release page. See example here. Consider generating a release notes file as part of the release workflow and using that for content on the GitHub release page.
  4. What exactly would we like to borrow from the Polkadot Releases release notes?
  5. Consider switching to a fast-forward-only PR merging policy in GitHub, so the same commit we have verified in the PR ends up on the main branch after merge. Developers will need to rebase their PRs onto origin/main before a PR can be merged.
  6. We currently publish binaries for amd64 and arm64 platforms, but the docker images are amd64 only. Not sure if this is an issue.
  7. JS API Augment is currently versioned as latest: 0.1.3 and next: 0.0.0-[short-sha]. Do we want to change the latter?
  8. Rust Dev Docs don't get properly versioned.
  9. The NPM release version is not recorded in the package.json
  10. A release with the same version tag can only be executed successfully once. NPM doesn't allow publishing under the same release version more than once.
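Items 7, 9 and 10 are small enough to sketch directly. The helpers below (hypothetical names) derive both npm versions from a release tag and a commit SHA, write the chosen version into package.json, and refuse to re-publish an existing version. The 7-character short SHA is an assumption based on Git's usual abbreviation length.

```python
import json


def npm_versions(release_tag, commit_sha, short_len=7):
    """Derive the npm versions from item 7: latest from the tag, next from the commit."""
    return {
        "latest": release_tag[1:] if release_tag.startswith("v") else release_tag,
        "next": f"0.0.0-{commit_sha[:short_len]}",
    }


def stamp_package_json(text, version):
    """Return package.json contents with "version" set to the published version (item 9)."""
    package = json.loads(text)
    package["version"] = version
    return json.dumps(package, indent=2) + "\n"


def ensure_not_published(version, published_versions):
    """Fail early, since npm refuses to publish the same version twice (item 10)."""
    if version in published_versions:
        raise RuntimeError(f"version {version} is already published")
```

Running the duplicate check before any upload starts turns item 10 from a mid-release failure into an early, clean abort.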
demisx commented 2 years ago

I've been testing this approach with the release workflow doing all the builds first then releasing artifacts in parallel right after all builds have finished successfully. However, there are still 2 issues keeping it away from ideal:

  1. Passing that many files through cache doesn't seem to be very stable. I had an intermittent failure where the job couldn't restore the previously saved cache and failed. A workaround is to check for cache-hit and trigger a build if it's false, but I want to avoid building multiple times.
  2. It's still possible that one of the release jobs will fail uploading an artifact and release is no longer atomic.

I am going to run more tests today and look for a better solution out there to make it more fail proof.

[Screenshot (2022-10-12): release workflow with builds followed by parallel release jobs]
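The build-first, publish-after approach described here can be summarized in a few lines. This is only a sketch: `builds`, `publish` and `unpublish` are hypothetical hooks standing in for the real build jobs and the GitHub Releases, Docker Hub and npm uploads, and the rollback step assumes each target can actually remove what was published, which is not true of every registry.

```python
from typing import Callable, Dict, List


def release(builds: Dict[str, Callable[[], str]],
            publish: Callable[[str, str], None],
            unpublish: Callable[[str], None]) -> List[str]:
    """Build every artifact first; publish only if all builds succeed.

    If a publish step fails, roll back the artifacts already published,
    newest first, then re-raise so the workflow run is marked as failed.
    """
    # Any build exception propagates here, before anything is published.
    built = {name: build() for name, build in builds.items()}
    published: List[str] = []
    try:
        for name, path in built.items():
            publish(name, path)
            published.append(name)
    except Exception:
        for name in reversed(published):
            unpublish(name)
        raise
    return published
```

This gets closer to all-or-nothing, but it is still not truly atomic: a rollback step can fail too, which is why a documented manual rollback process is still needed.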

demisx commented 2 years ago

Based on our Slack discussions, these are the assumptions I am using as a starting point for the new multi-branch workflow:

  1. There will be 3 long-lived branch types:
    1. Branch 1 – corresponds to the version the Mainnet is on (live)
    2. Branch 2 – corresponds to the version Rococo is on
    3. Branch 3 – the next version under development, e.g. origin/main
  2. The change management strategy will permit hot fixes where changes can be made via PR directly to each long-lived branch.
  3. The developer would be responsible for porting hot fix changes back to main and whatever other branch those changes may be applicable to.
  4. Rolling back of changes would require a new PR with the changes undone
  5. Once a long-lived branch is created, it can stay in the repo forever (no limitation on lifecycle)
  6. Release will be triggered once a vx.x.x* release tag is pushed to the remote, regardless of the branch
  7. Deployer will be responsible for creating proper release tags. No duplicates are allowed.
demisx commented 2 years ago

These are some initial thoughts on the branch naming. Most likely will change once feedback is collected from the team.

| Number | Branch Naming | Purpose |
|---|---|---|
| 1 | release-vx.x.x* (version before last) | Corresponds to the version the Mainnet is on (live) |
| 2 | release-vx.x.x* (last version) | Corresponds to the version Rococo is on |
| 3 | origin/main | Represents the next version of the chain under development |
demisx commented 2 years ago

Based on our Slack discussions, here are the refined assumptions:

  1. There will be 2 long-lived branch types:
    1. "Release Branch" – corresponds to the Polkadot version, e.g. v0.9.29, v0.9.30, etc.
    2. "Next Branch" – the next version under development, i.e. a release candidate
  2. A release branch will always be created from main
  3. Once a long-lived branch is created, it can stay in the repo forever (no limitation on lifecycle)
  4. Release will be triggered once a vx.x.x* release tag is pushed to the remote, regardless of the branch
  5. The change management strategy will permit hot fixes, where changes can be made directly to a release branch via PR.
  6. The developer would be responsible for porting hot fixes back to main and whatever other release branch those changes may be applicable to.
  7. Rolling back changes would require a new PR with the changes undone.
  8. When a release is triggered on a given release branch, all artifacts will be built and published like we do this now. This gives the node owner a choice which binary to use from a particular published release version.
  9. The deployer will be responsible for creating a proper release tag. No duplicate tags are allowed.

| Number | Branch Naming | Purpose |
|---|---|---|
| 1 | release-vx.x.x* | Corresponds to a given released Polkadot version, e.g. release-v0.9.29, release-v0.9.30, release-v0.9.30-1, etc. |
| 2 | origin/main | Represents the release candidate, the next version of the chain under development |
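Assumptions 4 and 9 (any vx.x.x* tag triggers a release, and duplicate tags are not allowed) can be checked before the release job does any work. A sketch, assuming that the "*" in vx.x.x* means an optional hyphenated suffix such as -1; the function names are hypothetical.

```python
import re

# Assumption: the "*" in vx.x.x* is an optional "-suffix" such as -1 or -rc1.
TAG = re.compile(r"v\d+\.\d+\.\d+(?:-[0-9A-Za-z.]+)?")
RELEASE_BRANCH = re.compile(r"release-" + TAG.pattern)


def is_release_tag(ref):
    return TAG.fullmatch(ref) is not None


def is_release_branch(ref):
    return RELEASE_BRANCH.fullmatch(ref) is not None


def check_new_tag(tag, existing_tags):
    """Reject malformed or duplicate release tags before the release job starts."""
    if not is_release_tag(tag):
        raise ValueError(f"not a release tag: {tag}")
    if tag in existing_tags:
        raise ValueError(f"duplicate release tag: {tag}")
```

Because the release is triggered by the tag regardless of branch, the branch check is informational only; the tag check is the actual gate.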
demisx commented 2 years ago

Here is the visual representation of Frequency Git Workflow. Covers the following use cases:

  1. A change which involves a runtime upgrade
  2. A change which doesn't involve a runtime upgrade
  3. A hot fix implemented on a release branch
  4. A release tag is pushed to remote (triggers release)

Let me know if I missed any use cases. The next step would be to document this workflow in Wiki and describe each use case in more detail, so developers can easily follow it.

demisx commented 2 years ago

I've put together the initial Wiki page describing our Frequency Git Workflow. I tried to keep it concise and straight to the point, so developers are more likely to read the whole thing.

https://github.com/LibertyDSNP/frequency/wiki/Frequency-Git-Workflow

Please let me know if you have any feedback. Also, feel free to re-word sentences directly as needed.

demisx commented 2 years ago

Polkadot uses these sorts of templates to generate release notes. I like this templated approach as it's pretty flexible, and I'm looking into something similar. I don't want to introduce a new third-party dependency on Ruby like they do. Maybe we can achieve something similar with Handlebars.js.

demisx commented 2 years ago

So, I’ve evaluated Mustache, HandlebarsJS and a couple other templating engines with CLI support. Though not as mature as the first two, I think it makes sense to go with Tera templating engine and tera-cli for this project. It’s written in Rust and this is what Polkadot uses too.

My plan is to set up a release notes template in Tera and then pass JSON data context into it in release workflow. This should give us a good flexibility to adjust the format of release notes in the future.
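The JSON data context is the part of this plan that lives outside the template, so it is easy to sketch. This is written in Python for illustration; every key name is hypothetical, and neither the Tera template itself nor the tera-cli invocation that renders it is shown.

```python
import json


def release_notes_context(version, polkadot_version, changes, artifacts):
    """Build the data context for the release notes template.

    Key names are hypothetical. `changes` is a list of (type, summary) pairs,
    grouped by type so a template can render one section per type.
    """
    grouped = {}
    for kind, summary in changes:
        grouped.setdefault(kind, []).append(summary)
    return {
        "version": version,
        "polkadot_version": polkadot_version,
        "changes": grouped,
        "artifacts": sorted(artifacts),
    }


def to_json(context):
    """Serialize the context; the workflow would write this to a file for the template."""
    return json.dumps(context, indent=2, sort_keys=True)
```

Keeping the context a plain, grouped structure means the template can change the layout of the release notes without any change to the workflow code.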

demisx commented 2 years ago

Adding this Polkadot Release Checklist to the description for future reference.

demisx commented 2 years ago

Here is the process I am thinking of following for generating binary signatures. Please feel free to comment:

UPDATE: Moved this checklist to #594 where it belongs.

demisx commented 2 years ago

I need to know the latest release in order to generate a change log during the new release CI run. I am planning to introduce a "latest" tag that will always point to the latest release. This way it's consistent with GitHub and Docker Hub. Let me know if anyone has any objections or better ideas.

demisx commented 2 years ago

Per conversation with @wilwade, a release candidate will still generate a change log, but it will do it against the latest full release. A new full release will still compare against the previous full release, thus disregarding the release candidates in between.
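The change log baseline rule above reduces to "diff against the most recent full release". A sketch, assuming tags are supplied oldest first and that release candidates are recognizable by an "-rc" suffix (a hypothetical convention; the thread does not define the release candidate tag format):

```python
def is_candidate(tag):
    # Hypothetical convention: release candidates carry an "-rc" suffix.
    return "-rc" in tag


def changelog_base(existing_tags):
    """Return the most recent full release among already-pushed tags (oldest first)."""
    for tag in reversed(existing_tags):
        if not is_candidate(tag):
            return tag
    return None


def changelog_range(new_tag, existing_tags):
    """Git revision range for the change log of new_tag, e.g. for `git log`."""
    base = changelog_base(existing_tags)
    return f"{base}..{new_tag}" if base else new_tag
```

Because the new tag is not yet in `existing_tags`, a candidate is compared against the latest full release and a new full release against the previous full release, with the candidates in between skipped in both cases.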

Currently testing mainly these change log generators:

demisx commented 2 years ago

@wilwade If we update CHANGELOG.md during release I can easily commit it to the release branch. However, I would think we'd want those changes also backported to main. I see 3 ways to go about this:

  1. Forget about CHANGELOG.md and track changes in GitHub releases only. Looks like this is what Polkadot, Moonbeam, Astar, and Acala are doing.
  2. Create a new GitHub token with push rights to main
  3. Create a new PR with changelog updates during each release

The first is the easiest to implement, followed by the second, but the third may be safer. Do you have any preference here?