MeltanoLabs / tap-github

A Singer tap for extracting data from GitHub. Powered by the Meltano SDK for Singer Taps: https://sdk.meltano.com
Apache License 2.0

Need process to manage pinnable release versions #47

Closed aaronsteers closed 2 months ago

aaronsteers commented 2 years ago

@ericboucher @laurentS @edgarrmondragon

This is a fast-developing tap! We've expanded now to 9 streams and we've got several more on the way.

What can our plan be for efficient management of releases? Kicking off this issue for discussion.

I would love to introduce auto-generated changelog entries and auto-publishing to PyPI, if those are feasible. If the changelog can't be auto-generated, then we would need a process to efficiently manage the changelog and release processes.

aaronsteers commented 2 years ago

Strawman for discussion:

  1. Add a manually triggered CI job called create_release with one input release_version (passed as "X.Y.Z").
  2. The pipeline runs bumpversion and translates commit history into corresponding changelog entries.
  3. The pipeline commits the result to a new branch (release/vX.Y.Z) and kicks off a PR to merge that to main branch.
  4. Manual review and grooming of the changelog entries can be performed on the PR's branch, per usual.
  5. When ready, a repo administrator sends the /publish slash command in a comment on that PR.
  6. The CI pipeline listens for slash commands.
  7. When the /publish slash command occurs on a branch named like release/* and the role of the comment author is Administrator, then the CI pipeline triggers a poetry publish.
  8. Manually merging the PR after a successful publish will bring main branch up-to-date with the latest changelog and version bumps.
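
The gating logic in steps 1, 3, and 7 of the strawman can be sketched roughly as follows. This is only an illustration, not an existing workflow: the function names and the "admin" role string are hypothetical, and a real pipeline would read the comment body, branch, and author role from the CI event payload.

```python
import re
from fnmatch import fnmatchcase

# Matches the "X.Y.Z" input format described in step 1.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def release_branch(release_version: str) -> str:
    """Validate the X.Y.Z input and derive the release branch name (steps 1 and 3)."""
    if not VERSION_RE.fullmatch(release_version):
        raise ValueError(f"expected X.Y.Z, got {release_version!r}")
    return f"release/v{release_version}"


def should_publish(comment_body: str, branch: str, author_role: str) -> bool:
    """Return True only when every condition from step 7 holds.

    The "admin" role string is an assumption made for this sketch.
    """
    is_command = comment_body.strip() == "/publish"
    is_release_branch = fnmatchcase(branch, "release/*")
    is_admin = author_role == "admin"
    return is_command and is_release_branch and is_admin
```

Keeping the check in one pure function makes it easy to test separately from whatever step actually runs the poetry publish.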

edgarrmondragon commented 2 years ago

We could have an action similar to https://github.com/MeltanoLabs/tap-dbt/blob/main/.github/workflows/release.yml that only requires pushing a commit with the version bump and then manually creating a GitHub release. That way, a single PR can be used to bump the version and groom the changelog, and the release can then be created referencing the PR branch.

There's already a MeltanoLabs PyPI token in that repo, and it might make sense to make it an org-level secret if possible.

aaronsteers commented 2 years ago

@edgarrmondragon - Nice! I just had a look now at the tap-dbt workflow. Probably comes as no surprise that I'm also thinking of how to make this a part of the cookiecutter eventually... 😅

I think part of the hurdle for tap developers in getting to "stable" is building their own publish and versioning workflows. When those don't exist (like for my tap-athena WIP), it can create a bad experience for those users who might otherwise feel okay being early adopters.

edgarrmondragon commented 2 years ago

The more I read about GitFlow the more I like the idea. In particular the automation it allows:

When the source code in the develop branch reaches a stable point and is ready to be released, all of the changes should be merged back into master somehow and then tagged with a release number. How this is done in detail will be discussed further on.

Therefore, each time when changes are merged back into master, this is a new production release by definition. We tend to be very strict at this, so that theoretically, we could use a Git hook script to automatically build and roll-out our software to our production servers everytime there was a commit on master.

(from https://nvie.com/posts/a-successful-git-branching-model/)

aaronsteers commented 2 years ago

@edgarrmondragon - Strangely, this looks exactly like a flow I have built before, and still I did not know this was called "Git Flow". I previously used a "development" or "develop" branch but was not yet sure if we should advocate the same here. After reading that GitFlow is now a fairly well established pattern, I'm inclined to leverage this - or something similar - as our best practice for MeltanoLabs since there's plenty of material on this topic, and post-automation, this is very straightforward to maintain.

Here was the most concise summary I could find of the flow, <3 minutes: https://youtu.be/1SXpE08hvGs

Adding to the video above, my expectation is that all release flows will be automated with CI, perhaps incrementally, but in a way that all taps and targets here in MeltanoLabs would basically use the same CI workflow files.

Still up for discussion:

aaronsteers commented 2 years ago

Update: The hypermodern python template has some automation templates for GitHub release management: https://github.com/cjolowicz/hypermodern-python/tree/master/.github

Uses:

This would be the release model, as far as I understand this approach:

  1. PRs created as usual, with extra labels.
    1. PRs will be categorized in the change log so the "adds" / "changes" / "breaks" / "skips-changelog" labels are needed on the PR for guidance on how the changelog will be populated.
    2. Changelog updates are not needed. Instead, the PR descriptions themselves can be groomed to proactively tidy what will become the changelog entry.
  2. PRs are merged to main as usual.
    1. Instead of committing to a git artifact, changelog drafts are created as "Release Drafts" in GitHub using their Releases API.
    2. The GitHub CI flow auto-creates (or auto-updates) the latest unpublished changes on main as a "Release Draft".
  3. The "changelog" file is actually the release draft text. That text may optionally be groomed manually between releases.
  4. A "publish" action is simply to change the status on the "draft" release notes to "Published", triggering this flow: https://docs.github.com/en/actions/learn-github-actions/events-that-trigger-workflows#release
    1. Publishing a release draft kicks off another CI flow to actually handle the release (performing the PyPI publish, for instance).
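
To make the label-driven changelog idea concrete, here is a minimal sketch of how merged PRs might be grouped into release-draft text. It is not how the hypermodern-python automation is implemented; the section headings, the function name, and the PR dictionary shape are all assumptions for illustration. Only the four label names come from the list above.

```python
from collections import defaultdict

# Label -> section heading. The labels are the ones named above;
# the headings are assumptions made for this sketch.
SECTIONS = {"adds": "Added", "changes": "Changed", "breaks": "Breaking"}
SKIP_LABEL = "skips-changelog"


def draft_release_notes(merged_prs):
    """Group merged PRs by label into plain-text release notes.

    Each PR is a dict with "number", "title", and "labels" keys
    (a hypothetical shape). PRs carrying the skip label, or none of
    the known labels, are left out of the draft.
    """
    grouped = defaultdict(list)
    for pr in merged_prs:
        labels = set(pr["labels"])
        if SKIP_LABEL in labels:
            continue
        for label, heading in SECTIONS.items():
            if label in labels:
                grouped[heading].append(f"- {pr['title']} (#{pr['number']})")
                break  # file each PR under its first matching section only

    lines = []
    for heading in SECTIONS.values():
        if grouped[heading]:
            lines.append(f"{heading}:")
            lines.extend(grouped[heading])
            lines.append("")
    return "\n".join(lines).rstrip()
```

Because each entry is built from the PR title, grooming the PR itself is what tidies the eventual release notes, which matches the intent of the model described above.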

What I really like about this model is that we don't require extra commits to the git repo for grooming the changelog, and the release notes of the project then serve that purpose.

Still researching: where and how the version bump occurs and is committed back to the main branch.

pnadolny13 commented 2 years ago

This is great! I've also used gitflow in the past and it worked pretty well for us. I agree that for this use case it's common and well documented, so it's better than bringing our own workflow.

The only thing that I'm not clear on is how keeping main as the default branch would work. I think making develop the default branch is how gitflow was intended to be used. My understanding of how the release drafter would work in this context is that as we merge PRs into the default branch (develop for us), it keeps a draft release PR (to the main branch) up to date; then, when we're ready, we would merge that PR, which includes all develop features, the version bump, and the changelog. That merge to main would kick off the publish steps. I'm not totally clear on whether the release drafter can do develop/main though; it kind of seems like it wants to use only the default branch for everything, which conflicts with gitflow.

Does that make sense? Was that what you guys were thinking or am I misinterpreting anything?

pnadolny13 commented 2 years ago

There's an issue already for this but I couldn't find it. Do we have any thoughts on what we do if the PyPI namespace is already taken for a package? For example, the singer-io variant exists for tap-google-analytics, so we can't publish https://github.com/MeltanoLabs/tap-google-analytics. What about just GitHub tagged releases? I think it's less than ideal for dependency resolution in things like poetry (any other major issue with this?), but at least users can pin a version like git+https://github.com/MeltanoLabs/tap-google-analytics.git@v1.0.0. The other option is updating the package name with a MeltanoLabs prefix or suffix, such as tap-google-analytics-mlabs or meltano-labs-tap-google-analytics. They get kind of ugly/verbose but could still be the best option. What do you think?
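
For comparison, the two options could be written down as small helpers. This is purely illustrative: the function names are hypothetical, and the two naming patterns are just the ones floated above, not a decided convention.

```python
def git_pin(repo_url: str, tag: str) -> str:
    """Build a pip-style VCS requirement pinned to a Git tag."""
    if not repo_url.endswith(".git"):
        repo_url += ".git"
    return f"git+{repo_url}@{tag}"


def candidate_package_names(name: str) -> list:
    """Return the two alternative package names suggested in the comment above."""
    return [f"{name}-mlabs", f"meltano-labs-{name}"]
```

For example, `git_pin("https://github.com/MeltanoLabs/tap-google-analytics", "v1.0.0")` yields the same pin string quoted in the comment.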

edgarrmondragon commented 2 months ago

Closing in favor of https://github.com/MeltanoLabs/tap-github/issues/307. We will continue using GitHub release notes.