[RFC] Use Github Actions for StackStorm-Exchange CI and Maintenance

[RFC] Use Github Actions for StackStorm-Exchange CI and Maintenance #63

Open cognifloyd opened 3 years ago

cognifloyd commented 3 years ago

Current State

StackStorm-Exchange is using CircleCI for (1) pack testing and (2) updating the exchange index. Most of the CI logic is centralized in StackStorm-Exchange/ci to simplify exchange-wide CI updates. The CircleCI config (.circleci/config.yml) is a key piece of the CI infrastructure that cannot be centralized; a copy is kept in every pack on the exchange. CircleCI config can be edited by pack authors to add jobs, add docker images or steps to the standard jobs, etc. Also, the schedule for when weekly pack tests occur (Saturday/Sunday) is in the pack-local circle config.

All other administrative tasks across the exchange require manual intervention by one or more of the project maintainers. There are several versions of shared scripts that the maintainers pass around (in StackStorm-Exchange/ci, StackStorm-Exchange/exchange-tools, and StackStorm-Exchange/exchange-misc) to handle these administrative tasks. Other updates might be handled by ad-hoc scripting.

aside: this proposal has nothing to do with recent discussion around dropping python2.7 testing on the exchange. It is about optimizing our CI infrastructure for the Exchange.

Goals

Automate as many of the manual administrative tasks as possible. Wherever possible, prefer solutions that happen on-push, or on-merge, or other no-touch triggers.
Allow individual packs to add additional custom testing workflows.
Allow individual packs to customize the standard workflows/jobs (eg add docker images, add system deps, setup custom testing harnesses, download extra repos needed for tests, ...)
Given that customization, minimize the effort required to:
- adjust the weekly test schedule for multiple packs (Saturday/Sunday/etc).
- do exchange-wide pack updates for our standard pack CI workflows: pack tests, index update. An autonomous solution that updates pack CI config in all repos once a "master" copy of the config gets merged would be ideal.

Proposal

I recommend we move the Exchange's CI workflows from Circle CI to Github Actions. This includes writing a variety of new workflows to automate as many administrative tasks as possible.

For at least one of those workflows, use a tool that allows us to merge yaml files so that the standard CI jobs can be extended in ways that do not prevent automated updates of the file. Some form of centralized "variables" could be used to define, for example, the test schedule for various packs.

Other CI provider options

Many projects (including StackStorm) have been abandoning Travis CI for many reasons, so that is not a viable option for the exchange.

Circle CI has served us well. Circle CI uses a single config file for all of the CI workflows. However, a single config file for all CI makes it more difficult to both allow customization, and simplify exchange-wide CI updates.

The first bullet under Goal 4 provides another reason not to use CircleCI. The schedule is another potential merge conflict that makes doing exchange-wide pack updates difficult.

Why Github Actions (GHA)?

Looking at the goals listed above, writing workflows for goal 1 could probably be accomplished with any CI provider. But GHA might have a slight edge as it is much closer to the Github API, and many community-written actions already implement some of the workflows, or steps in those workflows, that we'll need.

Goal 2 is the clearest winner with moving to GHA. This comes free with GHA, as each workflow is a separate file in the repo. So, any custom workflow files will be safe from exchange-wide updates to the standard workflows.

The tension between goals 3 and 4 requires additional effort. But, with additional tooling we can define extension points in our standard CI pack test workflow(s) so that they can be updated by both pack authors and by exchange-wide CI updates. There are a few yaml templating/merging options available. At least one of them should be runnable within github workflows.
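
As a rough illustration of what merging at an "extension point" could look like, here is a minimal Python sketch that overlays pack-specific customizations on a standard job that has already been parsed into plain dicts and lists. This is not how gflows or modulesync work internally. `deep_merge`, the sample keys, and the merge policy (mappings merge recursively, lists are concatenated, scalars are overridden) are all assumptions made for the example.

```python
import copy


def deep_merge(standard, custom):
    """Return a new structure with `custom` overlaid on `standard`.

    Assumed policy: dicts merge key by key, lists are concatenated
    (standard entries first), anything else is replaced by the custom value.
    """
    if isinstance(standard, dict) and isinstance(custom, dict):
        merged = copy.deepcopy(standard)
        for key, value in custom.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    if isinstance(standard, list) and isinstance(custom, list):
        return copy.deepcopy(standard) + copy.deepcopy(custom)
    return copy.deepcopy(custom)


# Hypothetical data: a standard job plus one pack's extra service and step.
standard_job = {
    "runs-on": "ubuntu-latest",
    "services": {"mongo": {"image": "mongo:3.4"}},
    "steps": [{"name": "Run pack tests"}],
}
pack_custom = {
    "services": {"vault": {"image": "vault"}},
    "steps": [{"name": "Upload extra artifacts"}],
}
merged_job = deep_merge(standard_job, pack_custom)
```

Because the standard job is never mutated, the same template could be merged with each pack's customizations in turn during an exchange-wide update.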

Proposed new workflows

For the first bullet under goal 4, adjust weekly test schedule, we could have a centralized file that assigns packs to CI slots (currently Saturday or Sunday). Then, when that file gets updated (eg w/ a new pack), that can trigger a workflow that pushes out the assigned schedule to packs that need an update.

For the second bullet under goal 4, exchange-wide pack CI updates, we could have a central template file for the standard CI workflow config. When a PR that updates that template is merged, that would trigger a CI workflow that goes through all the exchange's packs and regenerates the affected standard workflows. That regeneration would take into account the schedule, and it would merge any customizations into the standard workflow.

Also, when the files that provide the customizations are updated in one of the packs, that should trigger regenerating the affected workflow file as well.
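
The centralized schedule file described above could be sketched roughly as follows in Python. The slot names, the pack assignments, the 1 am UTC run time, and the function names are all hypothetical. The sketch only shows the lookup from pack to slot to cron line, and how a workflow might decide which packs need their schedule pushed out again.

```python
# Hypothetical central schedule: cron lines per weekly slot (all times UTC).
# Fields: minute hour day-of-month month day-of-week; 6 = Saturday, 0 = Sunday.
SLOT_CRON = {
    "saturday": "0 1 * * 6",
    "sunday": "0 1 * * 0",
}

# Hypothetical assignment of packs to slots.
PACK_SLOTS = {
    "stackstorm-vault": "saturday",
    "stackstorm-test": "sunday",
}


def cron_for_pack(pack, pack_slots=PACK_SLOTS, slot_cron=SLOT_CRON):
    """Look up the cron expression for a pack's assigned slot."""
    return slot_cron[pack_slots[pack]]


def packs_needing_update(current_crons, pack_slots=PACK_SLOTS):
    """Return packs whose current cron line differs from the assigned one.

    `current_crons` maps pack name -> cron string found in that pack's config.
    Packs missing from `current_crons` are reported as needing an update.
    """
    return sorted(
        pack
        for pack in pack_slots
        if current_crons.get(pack) != cron_for_pack(pack, pack_slots)
    )
```

A workflow triggered by a change to the schedule file would then only need to open PRs (or push) for the packs returned by `packs_needing_update`.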

YAML merge tool options

Here are two options:

modulesync/pdk
gflows

I would prefer gflows, assuming it works well for our use cases.

`modulesync` and/or `pdk`

Written in ruby.

Quoting @nmaludy: https://stackstorm-community.slack.com/archives/CSBELJ78A/p1612183089198000

The way the Puppet community handles problems like this is to use a tool called modulesync and/or pdk (they perform the exact same).

Basically they work by having a "template" directory structure with a bunch of ERB (think Jinja) templates. Those templates get rendered based on a config file .sync.yml customizations that can be unique per-repo (pack in our case).

This way if one pack needs to modify the CircleCI config by adding jobs or service containers, steps, whatever... It is very easy to do so and those differences from the "normal" are documented and able to be maintained, even when the template config changes.

These tools also provide a way to mass-render and make PRs across a huge number of repos. Basically, for each repo, make branch, run modulesync, push branch, make PR.

The other difference here is that these tools push the CI scripts to each repo, rather than having a central CI repo that everyone has to clone. Bonus points on this is that each repo has its own Gemfile.lock (think requirements.txt ) that can be used for correctly hashing CI requirements for a repo. I know we've had the problem in the past where a dependency in some cloned repo was causing problems and even though we updated it, the CI cache wasn't changed because we weren't properly aggregating our requirements.txt.

https://github.com/voxpupuli/modulesync https://puppet.com/docs/pdk/1.x/pdk_reference.html

`gflows`

Written in go.

gflows is a tool (and a github action) that uses jsonnet or ytt to template github workflow files.

Being a golang static-binary, it can be simply downloaded/run both in CI, and locally on pack authors' machines. Thus, the pack author will not have to set up any special environment (ruby, python, or otherwise) to regenerate any customized standard workflow file locally when authoring the pack.

https://github.com/jbrunton/gflows

Alternatives

One approach to adding customization within the CircleCI config is to call one or more bash scripts at points throughout the standard workflows so pack authors can add their customization there. This is the approach taken by: https://github.com/StackStorm-Exchange/ci/pull/101

This satisfies only part of goal 3. Other changes, like adding docker images or adding workflows, would also need a yaml merge solution.

Request for Comment

Does anyone have any additional pros/cons or opinions to add about:

moving StackStorm-Exchange from CircleCI to Github Actions
automating exchange maintenance with new workflows
gflows vs modulesync
using gflows to allow explicit customization of the pack CI

arm4b commented 3 years ago

That's a very detailed proposal, thanks for putting it together! :+1:

My concern is that the occurrence and severity of the problem highlighted is disproportionately minor compared to the proposed major change, i.e. switching the entire CI platform.

On the other hand, we all see how GH Actions are gaining popularity; people are getting more fluent with them, and they might be a good de-facto standard CI for the Exchange in the future. One disadvantage @nmaludy mentioned once is that GH Actions don't have the SSH debugging that CircleCI has, which can be very helpful sometimes.

I'm thinking that we may find more advantages with time about moving to GH Actions. @nmaludy @blag What are the other pain points behind the Exchange with CircleCI, could GH Actions help there?

blag commented 3 years ago

Before we change over from CircleCI to GHA, let's take a minute and get some data on how many CircleCI configs have been customized, and how invasive those customizations have been. I would guess that not many packs have customized a great deal.

I think a larger issue is that we've had updates to the StackStorm-Exchange/ci sample CircleCI config that haven't been pushed out to all packs, so there's probably a few previous versions of StackStorm-Exchange/ci's CircleCI config in various Exchange packs. It would be nice if we had an automated process to push out changes to StackStorm-Exchange/ci's CircleCI config to all packs. And we can either do that with StackStorm itself, or we can do it with GitHub Actions in the StackStorm-Exchange/ci repository itself (eg: a deploy workflow that runs for each PR that touches .circle/circle.yml.sample, and grabs the diff of that PR for the .circle/circle.yml.sample file and tries to apply it to the .circleci/config.yml files of all packs, reporting any failures to Slack so a human can look at it).

I don't see the weekly test scheduling as a big deal, except for massive failures (like Python 2.7 tests failing across the board). And even then, it's just a matter of deleting a bunch of email. I don't really know what benefit we get from keeping/controlling that information in a single place, since in the years that I've been working with StackStorm, we haven't ever adjusted that schedule, and I don't really see a need to. The weekend pack CI runs are the exact same workflows that run for every opened PR and every PR merge. And the ideal is that the weekend tests are only an early warning system for StackStorm pack devs (eg: transitive dependency causing issues). In practice, 99% of the weekend pack CI failures are due to GitHub Personal Access Tokens expiring due to disuse.

There's a balance to be struck here. Some packs have a legitimate need to be able to customize the test environment, and I absolutely think that we should support whatever is needed. However, for the sake of keeping our cognitive and maintenance burdens as low as possible, we should strive to keep the test configs as consistent as we reasonably can. Letting pack authors go crazy with their test configs is going to end in frustration and tears for everybody.

To summarize my post here:

Investigate whether we can use GHA to apply changes to the sample CircleCI config across all packs to keep them up-to-date while still allowing a bit of customization.
It would be far more beneficial to remove the need for Personal Access Tokens during tests and deployments. I'm curious what happens if we just don't specify MACHINE_USERNAME and MACHINE_PASSWORD at all, since the PATs that expire all the time are only used to clone pack repositories that are already public.

Sorry to pour cold water on this idea. Having maintained StackStorm Exchange for a few years now, I'd like to think I have a pretty good perception of what our priorities should be.

arm4b commented 3 years ago

Some of the StackStorm-Exchange problems are highlighted in https://github.com/orgs/StackStorm-Exchange/projects/1

cognifloyd commented 3 years ago

I looked into what it would take to switch just the deploy step to GHA, leaving the tests in CircleCI (at least for the time being).

What the deploy step does

The deploy step does two related things. First it pushes new tags into the pack repo when the version changes in pack.yaml. Then it updates the index.

The first part would be very easy in GHA as select workflows can have automatic write access to the repo. By default PR workflows do not have write access or access to secrets. Schedule and other event triggers can however.

The second part pushes updated metadata to the index repo. The token github provides only provides access to the current repo, however. So, we would need to create a ssh key pair, put the public key in the ci repo, and keep the private key in a secret that is made available during the workflow through an action like: webfactory/ssh-agent

How to move `deploy` to GHA without moving tests.

Looking at GHA workflow triggers, we could use either check_suite or check_run to allow triggering deploy only after CircleCI has finished its work.

Such a workflow would need one of these triggers:

  on:
    check_suite:
      types: [completed]
    check_run:
      types: [completed]

But, we can only trigger once the check_suite/check_run completes, and only on the repo's default branch. The workflow that runs would need to investigate the payload to determine if it can proceed to actually run the rest of the deploy step. The interesting bits of each payload are:

check_suite payload:
  status: completed
  conclusion: success
  app:
    slug: circleci-checks
  head_branch: master
  head_sha: ...

check_run payload:
  status: completed
  conclusion: success
  name: build_test_deploy_on_push
  app:
    slug: circleci-checks
  head_sha: ...

It would need to make sure any relevant checks (like app.slug=circleci-checks) have conclusion=success before continuing with the workflow.
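
The payload check described above could look something like this minimal Python sketch. It assumes the "interesting bits" have already been pulled out of the event payload into a flat dict shaped like the listings above. The function name `circleci_passed` and the `default_branch` parameter are hypothetical.

```python
def circleci_passed(bits, default_branch="master"):
    """Decide whether a check_suite/check_run event should trigger deploy.

    `bits` is assumed to be a dict holding only the fields listed above
    (status, conclusion, app.slug, and head_branch when present).
    """
    if bits.get("status") != "completed":
        return False
    if bits.get("conclusion") != "success":
        return False
    if bits.get("app", {}).get("slug") != "circleci-checks":
        return False
    # The check_run fields listed above carry no head_branch of their own,
    # so the branch is only compared when it is present.
    branch = bits.get("head_branch")
    return branch is None or branch == default_branch


check_suite_bits = {
    "status": "completed",
    "conclusion": "success",
    "app": {"slug": "circleci-checks"},
    "head_branch": "master",
}
```

With the sample `check_suite_bits`, the function returns `True`; changing `conclusion` to `failure` or `head_branch` to a feature branch makes it return `False`, so the rest of the deploy step would be skipped.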

cognifloyd commented 3 years ago

Re customized CircleCI config:

Before last week's exchange-wide pack CI updates, some packs had reformatted the CircleCI config slightly. Now, the CI config has been standardized across all packs with these exceptions:

The weekly test run schedule differs between packs (some on Saturday UTC, others on Sunday UTC).
2 packs have customized the CircleCI config:
- Zabbix has additional integration test jobs.
- Vault has some custom test environment setup logic.

arm4b commented 3 years ago

Looking back, following the Security discussions in https://github.com/StackStorm/private-discussions/issues/5, SSH access to the CI system for Exchange Packs looks like a downside/risk these days. Nowadays GH Actions has stabilized and gained a trusted reputation as the de-facto CI standard in Github that works flawlessly.

+1 for migration from CircleCI to GH Actions for a native & seamless integration to fix the current pain points of StackStorm Exchange.

cognifloyd commented 3 years ago

I no longer believe a shared CircleCI + GHA is a possibility because so much of the exchange infra is broken with CircleCI changes that force using an ssh deploy key. That in turn breaks our deploy workflow because it conflicts with how we're using the PAT to clone and modify repos in CircleCI.

This is an overview of what I think we need to do to move the current CI to GHA. Some of my other comments above explain additional workflows and future improvements, but ignore the bits about combining CircleCI with GHA.

We have a common set of CI jobs for all StackStorm-Exchange packs:

build_and_test_python36
deploy

I would start with converting the build_and_test_python36 job to GHA before worrying about the deploy job. The deploy job will be more involved since it requires changes across repositories.

For CircleCI, we had to copy the .circleci/config.yml from a master copy to each pack repo. We use the same steps for so many repos, I would love a way to not have copies of the jobs config in each repo. I believe we can use Composite Actions to centralize (pieces of?) these jobs. https://docs.github.com/en/actions/creating-actions/creating-a-composite-action

The master copy of the exchange CircleCI config is here: https://github.com/StackStorm-Exchange/ci/blob/master/.circle/circle.yml.sample

That ci repo would probably be a good place to put the composite actions, assuming we can put more than one composite action in the same repo (edit: we can). Then, in each pack repo we would have a much lighter weight GHA workflow that specifies the cron schedule for weekly tests and uses those composite actions.

One gotcha in all of this, is there are a couple of repos that had to modify the main workflow: the vault and zabbix packs:

Here are the two changes in the vault pack. We might need a way to add a few pack-specific commands to allow packs to inject changes like these: https://github.com/StackStorm-Exchange/stackstorm-vault/blob/master/.circleci/config.yml#L33-L35 https://github.com/StackStorm-Exchange/stackstorm-vault/blob/master/.circleci/config.yml#L45-L46
The zabbix pack has added some jobs, but luckily the shared jobs are unmodified, so those new jobs can become a separate workflow after we switch the exchange-wide jobs to GHA.

arm4b commented 3 years ago

Good stuff. Nice idea with the GH composite actions, similar to CircleCI orbs :+1: And yeah, there should be a basic way for pack maintainers to do something outside of the default CI pipeline.

Perhaps having a new PoC stackstorm-exchange pack would be a good way to experiment with all the machinery & show it.

cognifloyd commented 3 years ago

OK I stubbed together some composite actions. They will not work yet, but hopefully they're a good starting point. https://github.com/StackStorm-Exchange/ci/tree/gha/gha

To use these, I imagine a workflow that looks something like this:

name: CI

on:
  # ...

jobs:
  build_and_test:
    runs-on: ubuntu-latest
    name: 'Build and Test - Python ${{ matrix.python-version-short }}'
    strategy:
      matrix:
        include:
          - python-version-short: 3.6
            python-version: 3.6.13
    steps:
      # eventually replace @gha with @master
      - name: Checkout Pack Repo and CI Repos
        uses: StackStorm-Exchange/ci/gha/checkout@gha

      - name: Install APT Dependencies
        uses: StackStorm-Exchange/ci/gha/apt-dependencies@gha
        with:
          cache-version: v0

      - name: Install Python Dependencies
        uses: StackStorm-Exchange/ci/gha/py-dependencies@gha
        with:
          cache-version: v0
          python-version: ${{ matrix.python-version }}

      # vault pack would add one or more custom test setup steps here

      - name: Run pack tests
        uses: StackStorm-Exchange/ci/gha/test@gha
        with:
          # This makes the tests use an alternate config that enables shared libs
          enable-common-libs: true

We will still need to manage the cron schedule in each pack repo. In circle, each of the job steps were shell scripts that were painful to update. Using the composite-actions should alleviate that because we won't have to synchronize any shell scripts across all of the pack repos. Instead each of the common steps is just a reference to our composite actions with input vars to allow per-pack customization if required.

cognifloyd commented 3 years ago

For the deploy workflow, we have a variety of problems:

serializing index updates via distributed workflows is problematic
credentials for cross-repo changes are hard to manage
- Github App and Oauth App both require some kind of persistent service
- PATs are far too coarse:
  - they are user-specific
  - if you've got repo access, then that gives you access to all repos that user can access
  - they can't be tied to just one repo

Serializing index updates

A persistent service would make serializing updates more natural, but then we have to deal with a persistent service (If we do go with a persistent service, GCP's free-tier offers 1 free e2-micro VM instance per month).

But, maybe we can get away with creating a semi-persistent "service" using a github actions workflow.

Based on the Github Usage Limits, workflows are limited as follows:

Job execution time - Each job in a workflow can run for up to 6 hours of execution time. If a job reaches this limit, the job is terminated and fails to complete.

Workflow run time - Each workflow run is limited to 72 hours. If a workflow run reaches this limit, the workflow run is cancelled.

If we had a workflow running for the max of 72-hours, splitting that into 6-hour jobs would mean a workflow with 12 jobs where the job concurrency is limited to 1. But I'm not sure how to start a workflow every 72-hours. Using a cron scheduled workflow, we could easily do a workflow that runs every day (eg 0 12 * * *). A 24-hour workflow would have 4 serial 6-hour jobs (maybe a matrix of 4 jobs, concurrency limited to 1).

I don't think there's a good way to receive webhook events from github within a github action workflow. But we can poll Github's events API for StackStorm-Exchange org events. The index is meant to be eventually consistent since we have to serialize index updates, so this Events API caveat should not be a problem:

We delay the public events feed by five minutes, which means the most recent event returned by the public events API actually occurred at least five minutes ago.

Index update workflow

So, the index update workflow/job would do something like this:

checkout index repo

# poll_github_for_events would handle any api rate limits / back off
for event in poll_github_for_events("StackStorm-Exchange"):
  if not is_pack_release_event(event):
    continue

  clone_pack_repo(event)  # or update the existing clone if we've already cloned since the job started

  # rebuild pack index directory
  convert_pack_resource_metadata_to_json()
  convert_pack_config_schema_to_json()
  copy_pack_yaml()
  add_resource_counts_to_pack_yaml()

  # rebuild the index itself
  rebuild_index_json()

  # finalize update for pack
  copy_and_optimize_pack_icon()
  git_commit_index()
  git_push_index()

Pack Deploy workflow

Each pack's deploy workflow, then would only have to:

update repo tags
create release
optional: update repo description to match description in pack.yaml

Sane Github credentials management

Doing index updates this way fits within how Github currently manages tokens for workflows (read/write for the current repo; read-only for everything else), so we would not need any PATs or the persistent credentials that come with a Github app or an oauth app.

cognifloyd commented 3 years ago

OK. I've spent a lot of time figuring this out. I don't know when I'll have time to pick it up again. If someone else can please pick this up and work on any of these pieces, I would appreciate the help.

arm4b commented 3 years ago

I like how the pack CI logic would be hidden behind the GHA composite actions in https://github.com/StackStorm/discussions/issues/63#issuecomment-966820410. The example is nice! :+1:
I don't think long-running always-polling CI is a good idea. Hard to manage, hard to debug, the logic would be surprising (bad) and it's not what GHA was designed for. Also, it's easy to miss an event there if the workflow was interrupted for whatever reason.
Using an external service and storage/state is not an option. There were ideas to rely on external st2, but the point is that users should be able to create their own pack index without external tools and services, like the old (current) Exchange was (is). Some notes in here: https://github.com/StackStorm/discussions/issues/29
I understand you want to rework the entire way we build an index. Avoid push from each pack to the Index at all and do only pull from the GHA Index repo? The advantage, as I understand is - we won't need PAT and similar secrets anymore for every repo which would improve the security model. @cognifloyd I definitely like the idea and the advantages behind the model :+1:, despite being cautious about the redesign.

With what you propose, we can avoid a long-running workflow and run the Index Update Workflow once every 5 mins by cron, which will store in git some state/checksum or similar to continue where we left off. Worst case, if events API won't work, can use https://docs.github.com/en/rest/reference/repos#list-organization-repositories list-org-repos API checking the updated_at or something similar API for each repo and doing the needful :)

Eventual consistency is fine, 5mins would be good enough for the index rebuild.
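
A minimal Python sketch of the "continue where we left off" idea, assuming the workflow stores the timestamp of its last run and receives the decoded JSON list from the list-org-repos endpoint. Only the `name` and `updated_at` fields of each repo object are used. The function names and sample data are hypothetical, and whether `updated_at` or another field is the right signal for a new pack release is left open.

```python
from datetime import datetime, timezone


def parse_github_time(value):
    """Parse a GitHub-style UTC timestamp such as 2021-11-12T01:00:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def repos_changed_since(repos, last_run):
    """Return names of repos whose `updated_at` is newer than `last_run`.

    `repos` is a list of dicts with `name` and `updated_at` keys;
    `last_run` is the stored timestamp string from the previous run.
    """
    cutoff = parse_github_time(last_run)
    return [
        repo["name"]
        for repo in repos
        if parse_github_time(repo["updated_at"]) > cutoff
    ]


# Hypothetical sample data for two pack repos.
sample_repos = [
    {"name": "stackstorm-vault", "updated_at": "2021-11-12T01:05:00Z"},
    {"name": "stackstorm-test", "updated_at": "2021-11-11T23:00:00Z"},
]
changed = repos_changed_since(sample_repos, "2021-11-12T01:00:00Z")
```

Only the repos returned here would need their index entries rebuilt, after which the stored timestamp would be advanced.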

cognifloyd commented 3 years ago

Thank you @lm-ydubler for helping to test/fix the build_and_test workflow. The gha branch now has a complete and functional build_and_test workflow.

For most packs, the workflow will consist of this (see https://github.com/StackStorm-Exchange/stackstorm-test/blob/gha/.github/workflows/build_and_test.yaml):

name: CI

on:
  push:
  pull_request:
  schedule:
    # NOTE: We run this weekly at 1 am UTC on every Saturday
    - cron:  '0 1 * * 6'

jobs:
  build_and_test:
    name: 'Build and Test'
    uses: StackStorm-Exchange/ci/.github/workflows/pack-build_and_test.yaml@gha
    with:
      enable-common-libs: true
      #apt-cache-version: v0
      #py-cache-version: v0

This relies on github's newly GA reusable workflows feature to call the shared workflow. Any pack (like vault) that needs to inject some logic would copy the shared workflow and make its modifications instead of directly reusing it like this (differences include on:, the job name, and the comment about where the vault pack would add its custom test setup steps):

name: CI - Build and Test

on:
  push:
  pull_request:
  schedule:
    # NOTE: We run this weekly at 1 am UTC on every Saturday
    - cron:  '0 1 * * 6'

jobs:
  build_and_test:
    runs-on: ubuntu-latest
    name: 'Build and Test - Python ${{ matrix.python-version-short }}'
    strategy:
      matrix:
        include:
          - python-version-short: 3.6
            python-version: 3.6.13
    steps:
      # eventually replace @gha with @master
      - name: Checkout Pack Repo and CI Repos
        uses: StackStorm-Exchange/ci/.github/actions/checkout@gha

      - name: Install APT Dependencies
        uses: StackStorm-Exchange/ci/.github/actions/apt-dependencies@gha
        with:
          cache-version: v0

      - name: Install Python Dependencies
        uses: StackStorm-Exchange/ci/.github/actions/py-dependencies@gha
        with:
          cache-version: v0
          python-version: ${{ matrix.python-version }}

      # The vault pack would add its custom test setup steps here

      - name: Run pack tests
        uses: StackStorm-Exchange/ci/.github/actions/test@gha
        with:
          # This makes the tests use an alternate config that enables shared libs
          enable-common-libs: true

    services:
      mongo:
        image: mongo:3.4
        ports:
          - 27017:27017
      rabbitmq:
        image: rabbitmq:3
        ports:
          - 5672:5672

Next step is to figure out the deploy stuff.

cognifloyd commented 3 years ago

You can see a successful test run here: https://github.com/StackStorm-Exchange/stackstorm-test/actions/runs/1509040299

cognifloyd commented 2 years ago

If we're running a frequent cron job to rebuild the index, every 5 minutes is probably too often as this involves cloning all pack repos. In my tests, it took 2-2.5 min just to clone all pack repos.

To test, I forked the index and made the gha branch the default branch. In the gha branch there is a simple workflow (triggered manually with workflow_dispatch) that reuses a workflow in the gha branch of the ci repo. After editing the workflow in the ci repo, I re-run or re-trigger the workflow in my index repo fork.

So far, the workflow clones ci/tooling repos and all the pack repos. Then it has a sample step to show how to loop through all the pack checkouts to do something simple (ls pack.yaml). That should be a good framework for converting the deployment script, which is designed to work with only one pack, into GitHub actions.
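
For illustration, the "loop through all the pack checkouts" step could be expressed like this in Python. This is a sketch rather than the actual workflow step: the directory layout (one checkout per subdirectory of a common root) and the function names are assumptions.

```python
from pathlib import Path


def find_pack_checkouts(packs_root):
    """Yield each directory under `packs_root` that contains a pack.yaml."""
    for child in sorted(Path(packs_root).iterdir()):
        if child.is_dir() and (child / "pack.yaml").is_file():
            yield child


def list_pack_yaml(packs_root):
    """Python stand-in for running `ls pack.yaml` in every pack checkout."""
    return [str(pack / "pack.yaml") for pack in find_pack_checkouts(packs_root)]
```

The per-pack deployment logic would then replace the body of the loop, running once for each directory that `find_pack_checkouts` yields.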

Here's my latest test run: https://github.com/cognifloyd/index/runs/4344158519

cognifloyd commented 2 years ago

To clarify why we would need to set the cron job to run less often than every 5 minutes: we need to ensure that pack updates are serialized (no parallel or concurrent edits).

cognifloyd commented 2 years ago

Quoting @arm4b:

With what you propose, we can avoid a long-running workflow and run the Index Update Workflow once every 5 mins by cron, which will store in git some state/checksum or similar to continue where we left off. Worst case, if events API won't work, can use https://docs.github.com/en/rest/reference/repos#list-organization-repositories list-org-repos API checking the updated_at or something similar API for each repo and doing the needful :)

OK. I think using cron is more of a possibility than I thought because github has concurrency primitives now to allow us to serialize a workflow. If workflows are consistently waiting for earlier workflows, then that means we need to either optimize something in the workflow to make it take less time, or increase the time between scheduled runs. https://github.blog/changelog/2021-04-19-github-actions-limit-workflow-run-or-job-concurrency/

Hopefully github won't be upset with re-cloning all repos in the StackStorm-Exchange org every 5 minutes. There are no rate limits on cloning repos, but they can add them on a case-by-case basis if they don't like the traffic pattern: https://github.community/t/git-clone-limits-using-git-commands-vs-the-api-what-are-they/14357/2

So, we don't have to muck with the events right now.

cognifloyd commented 2 years ago

I'm satisfied that the index update workflow does what it needs to. We'll just need to define a cron schedule before we merge it to master on the index repo.

Now, the final step: We need a process that creates tags on the pack repos. By process I mean, the steps someone needs to take when they want to cut a new release of a pack PLUS the Github Actions workflow(s) required to support those steps.

Current state: CircleCI

We should not blindly copy what the CircleCI workflow does, because that process is subtly broken.

https://github.com/StackStorm-Exchange/ci/issues/91

Basically, the CircleCI deploy step would:

scan through git history to find any commit that changed the version in pack.yaml
tag each commit with a version tag that matches the new version in pack.yaml

But, many people (myself included) update pack.yaml early in a PR because we know it will need a new version. Then there are almost always multiple bug or typo fixes (eg to docs) after we've adjusted the pack.yaml. So, all of those commits after the pack.yaml update are not included in the tag until the next time a PR updates pack.yaml. Also, if that commit happens to be on a branch that is slightly behind master (but it will merge cleanly on master), then the merge commit produced when merging the PR will also not be included, which means the tag won't include the newer commits on master either.

The future: on Github Actions

So, we need a different process that pack maintainers need to use to release new versions of a pack. The question is, what should that process look like? And what github workflow(s) do we need to support that process?

cognifloyd commented 2 years ago

Here are 2 possible workflows we could use:

a release workflow inspired by OpsDroid's process

I recently released a new version of OpsDroid, and I really liked their release workflow. Maybe we could do something similar for packs. Here is what their process looks like:

Every time a PR gets merged to master (ie on every push to master), update a draft release on github: https://github.com/opsdroid/opsdroid/blob/master/.github/workflows/release-drafter.yml
Then a maintainer goes to the draft release on Github and clicks the button to release it (could be done via API as well).
As soon as you create a release, github automatically tags the repo based on that release's metadata.

alternative tag-only process

But, that might make exchange-wide updates more difficult. So, we could also do something like:

Every time a PR gets merged to master (ie on every push to master), run a workflow that:
1. gets the latest git tag on the pack repo
2. checks to see if the latest tag matches the version in pack.yaml
3. if it doesn't match, creates a new tag and pushes it to the pack repo.

One thing this doesn't do is attempt to add tags for older versions. That is something that the CircleCI workflow tried to do, but it did not do it well (as detailed above). I don't think we need to do that.
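The three steps of the tag-only process could be sketched roughly like this. This is a minimal sketch under assumptions, not the actual workflow: the function names are hypothetical, the `v` tag prefix is assumed, and pack.yaml is read with a simple regex rather than a real YAML parser.

```python
import re
import subprocess


def read_pack_version(pack_yaml_text):
    """Return the top-level `version:` value from pack.yaml text, or None."""
    match = re.search(r"^version:\s*['\"]?([^'\"\s#]+)", pack_yaml_text, re.MULTILINE)
    return match.group(1) if match else None


def latest_git_tag(repo_dir="."):
    """Step 1: ask git for the most recent tag reachable from HEAD (None if no tags)."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        cwd=repo_dir, capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def tag_to_create(latest_tag, pack_version, prefix="v"):
    """Steps 2 and 3: return the tag name to push, or None when nothing is needed."""
    if pack_version is None:
        return None
    wanted = f"{prefix}{pack_version}"  # the prefix is an assumption
    return None if latest_tag == wanted else wanted


pack_yaml = "name: example\nversion: 1.1.0\n"
print(tag_to_create("v1.0.0", read_pack_version(pack_yaml)))  # v1.1.0
print(tag_to_create("v1.1.0", read_pack_version(pack_yaml)))  # None
```

Because the check runs on every push to master and always tags the current HEAD, the tag lands on the merged result rather than on whichever commit first touched pack.yaml.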

arm4b commented 2 years ago

For the Exchange, the fact that a contributor just bumps the version in the pack metadata and a new git tag is created automatically has helped us a lot with maintenance. Drafting Releases manually for every pack would be a step back for the Exchange and add friction, as well as another point of failure (relying on humans) to the workflow.

So yeah, automation with auto-tagging would be ideal, as before.

amanda11 commented 2 years ago

Doing the tagging on merge to master would be really good, and better than what we had before. I don't think it was well known that you shouldn't bump the pack version until you'd had the all clear on the review, and that you then needed one more change to update the pack version, to prevent the tag landing on the wrong commit.

cognifloyd commented 2 years ago

OK. This is ready for more eyes. Thanks go to @lm-ydubler for helping to test, invalidate a bunch of my assumptions, and push this forward! And thanks to LogicMonitor for dedicating resources to this issue!

We need to merge changes across multiple repos in this order:

https://github.com/StackStorm-Exchange/exchange-tools/pull/2
https://github.com/StackStorm-Exchange/ci/pull/121
- it will be helpful to review the branches/files listed below as well to see how these actions/workflows and the index/v1/exclude_packs.txt file are used.
https://github.com/StackStorm-Exchange/index/pull/24
- we need to decide on the index update cron schedule. For now it's set to every 5 minutes.

I already merged the gha branch on the test pack into its default branch (aaa instead of master). Here are the workflows we added:

These will need to be updated to replace @gha with @master once we've got the above changes merged.

After that I can start working on pushing these workflows to all the packs. I will need a senior maintainer to help disable CircleCI as we switch each pack over to GHA.

cognifloyd commented 2 years ago

So, just to clarify: the tag release workflow adds the tag on push to master (or whatever is the default branch) if the latest tag doesn't match the current version in pack.yaml.

lm-ydubler commented 2 years ago

Happy to have helped and glad to have worked with you for a solution.

cognifloyd commented 2 years ago

@winem, @lm-ydubler, and I just had a meeting. We talked about doing this to roll out the GHA updates:

@winem approves PRs
Merge https://github.com/StackStorm-Exchange/ci/pull/121
Merge https://github.com/StackStorm-Exchange/index/pull/24
Update all pack repos:
1. copy gha workflow files to each pack repo: https://github.com/StackStorm-Exchange/stackstorm-test/tree/aaa/.github/workflows
2. replace .circleci/config.yml in all repos with a simple workflow that has one step: echo CircleCI is disabled on StackStorm-Exchange
A senior maintainer deletes PATs/secrets from CircleCI for all pack repos

cognifloyd commented 2 years ago

Should we add the gha workflows to https://github.com/StackStorm-Exchange/exchange-template ?

arm4b commented 2 years ago

It doesn't contain any CI workflows, so probably good as is without adding another dependency to the repo.

cognifloyd commented 2 years ago

OK. afaict, we have excised CircleCI from packs on the exchange. The zabbix pack still has some custom integration test jobs that run on CircleCI in addition to our standard GHA-based tests.

Can a senior maintainer (an org admin) please:

delete the PATs (and other env vars) from CircleCI for each of the pack repos
hit "Follow All" on the StackStorm-Exchange CircleCI Dashboard to reset the CircleCI ssh deploy key so that packs can use CircleCI for custom tests if needed.
delete the pack PATs on Github from the stackstorm-neptr account

cognifloyd commented 2 years ago

I created some skeleton workflows to show the outline of GHA workflows to bootstrap a pack repo and add maintainers to it.

https://github.com/StackStorm-Exchange/exchange-incubator/pull/172

If you have some time, please pick one or more of the tasks in those workflows and implement them. It's on the gha branch of the exchange-incubator, so anyone on the TSC can push to that branch.

cognifloyd commented 2 years ago

Should we add the gha workflows to https://github.com/StackStorm-Exchange/exchange-template ?

It doesn't contain any CI workflows, so probably good as is without adding another dependency to the repo.

I would like to use that repo as a template repo to bootstrap new pack repos. https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-template-repository This would simplify maintenance for the misc things we find that need to go in new packs as mentioned in https://github.com/StackStorm-Exchange/exchange-incubator/issues/152#issuecomment-667198577 and https://github.com/StackStorm-Exchange/exchange-incubator/issues/153.

I think https://github.com/StackStorm-Exchange/exchange-template is designed to be used by pack authors. But does that repo have much utility for pack authors? I suspect something like https://github.com/EncoreTechnologies/cookiecutter-stackstorm would be more useful to pack authors. So, would it be OK to repurpose it for setting up new packs?

cognifloyd commented 2 years ago

Progress report on bootstrapping packs via GHA:

https://github.com/StackStorm-Exchange/ci/pull/133
On github, I just removed all of the stackstorm-neptr PATs that we used in CircleCI. I did not clean anything up in CircleCI itself.
I created a new PAT (just one! :smile: ) under stackstorm-neptr for use in the bootstrap workflow
I created the required secrets in the incubator repo settings.
https://github.com/StackStorm-Exchange/exchange-incubator/pull/172

So, once that last PR is merged, we can bootstrap packs from incubator PRs with a !bootstrap pack comment.
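For anyone picking up the bootstrap work, the comment trigger could be checked along these lines. This is only an illustration: the function name is hypothetical, and whether the real workflow requires the command on a line of its own is an assumption, not something taken from the PR.

```python
BOOTSTRAP_COMMAND = "!bootstrap pack"


def is_bootstrap_request(comment_body):
    """Return True when any line of the comment is exactly the bootstrap command."""
    return any(
        line.strip() == BOOTSTRAP_COMMAND
        for line in comment_body.splitlines()
    )


print(is_bootstrap_request("LGTM\n!bootstrap pack\n"))       # True
print(is_bootstrap_request("please do not !bootstrap pack"))  # False
```

Matching a whole line rather than a substring avoids triggering the workflow when someone merely mentions the command in a sentence.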

Possible future workflows:

Something that updates the repo description to match the description from pack.yaml
- will require a PAT as editing repo descriptions cannot be done with the standard GHA GITHUB_TOKEN
- Maybe on the ci repo that happens once a month for all repos.
Some other workflow that facilitates exchange-wide CI updates
- Manually triggered, probably on the CI repo
- will require a PAT
- we will probably want this when we switch from python 3.6 to python 3.8
- maybe use files from the StackStorm-Exchange/ci-pack-template repo
- make sure to not clobber any customizations
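
As a starting point for the repo description idea in the list above, the update itself is a single call to the GitHub REST API ("Update a repository", a PATCH on the repo with a `description` field). The sketch below only builds the request and does not send it; the helper name, the example description, and the `<PAT>` placeholder are hypothetical.

```python
import json
import urllib.request


def build_description_update(owner, repo, description, token):
    """Build (but do not send) a PATCH request that sets a repo's description."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            # The token would be the PAT stored as a secret, not GITHUB_TOKEN.
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


request = build_description_update(
    "StackStorm-Exchange", "stackstorm-test", "Example description", "<PAT>"
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose; urllib.request.urlopen(request) would perform it.
```

A monthly job on the ci repo could loop over the pack repos, read each description from pack.yaml, and send this request only when the two differ.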
