argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Document & improve process for release rotation #12592

Open terrytangyuan opened 7 months ago

terrytangyuan commented 7 months ago

Summary

Purpose: get more people involved in releases and improve the overall process.


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritize the proposals with the most 👍.

caelan-io commented 7 months ago

Adding teammates of mine for visibility @tico24 @Joibel @isubasinghe

We're happy to support on the release effort. 👍

terrytangyuan commented 6 months ago

Since we have two additional approvers, I would suggest one of you (@agilgur5 @isubasinghe) try following the instructions in https://github.com/argoproj/argo-workflows/blob/main/docs/releasing.md and see what is missing so we can improve the docs. I don't think we need any separate documentation for this. WDYT? Any volunteers?

terrytangyuan commented 6 months ago

For others without write access yet, you can still send PRs to the release branch to help resolve any conflicts, and then other approvers can review.

isubasinghe commented 6 months ago

Since we have two additional approvers, I would suggest one of you (@agilgur5 @isubasinghe) try following the instructions in https://github.com/argoproj/argo-workflows/blob/main/docs/releasing.md and see what is missing so we can improve the docs. I don't think we need any separate documentation for this. WDYT? Any volunteers?

Sure, I can get to this on Friday unless @agilgur5 beats me to it.

isubasinghe commented 6 months ago

Hmm probably should note in the docs that there was an implicit alias in the previous releases.

It would be nice if the "true" "false" options for the script itself were documented, I had to look into the script itself to figure out what it was doing.

If I am correct the new commits for v3.3.5 should be based upon the HEAD of 3.3 instead of 3.3.4 ? I find that a bit confusing.

This release process seems like quite a bit of work, wonder if we can automate some of this effort.

terrytangyuan commented 6 months ago

If I am correct the new commits for v3.3.5 should be based upon the HEAD of 3.3 instead of 3.3.4 ?

Yes, see the top of the document: "Please make sure that all patch releases (e.g. v3.3.5) should be released from their associated minor release branches (e.g. release-3.3) to work well with our versioned website."

It would be nice if the "true" "false" options for the script itself were documented, I had to look into the script itself to figure out what it was doing.

Well, the document already mentions "get a list of commits you may want to cherry-pick" and "to automatically cherry-pick" with two separate code blocks, but maybe explicitly calling out the flag would be helpful.

This release process seems like quite a bit of work, wonder if we can automate some of this effort.

Feel free to propose any improvements.

isubasinghe commented 6 months ago

Yes, see the top of the document: "Please make sure that all patch releases (e.g. v3.3.5) should be released from their associated minor release branches (e.g. release-3.3) to work well with our versioned website."

Yeah I saw this, but what I was trying to say is that I would like it to be even more explicit, just to remove any confusion. I guess I also want to understand "to work well with our versioned website" in more depth.

but maybe explicitly calling out the flag would be helpful.

Yeah I think that might be nicer.

Feel free to propose any improvements.

I am having a look into it now; there seems to be some tooling around this issue. I will report back after I find out more.

agilgur5 commented 6 months ago

I guess I want to understand "to work well with our versioned website"

The release-3.4 and release-3.5 branches are visible on the versioned docs site. So if you update the branch, you'll update the docs site as well. Then you can just tag off the branch.
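As a rough illustration of the rule discussed above (a patch tag such as v3.3.5 is cut from its minor release branch, release-3.3), here is a minimal Python sketch of the mapping. The function name `release_branch_for` is hypothetical and is not part of the repo's release tooling; it only assumes tags of the form vMAJOR.MINOR.PATCH.

```python
import re

# Matches plain patch tags such as "v3.3.5"; pre-release suffixes are not handled.
_PATCH_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def release_branch_for(tag: str) -> str:
    """Map a patch tag (e.g. 'v3.3.5') to its minor release branch ('release-3.3')."""
    match = _PATCH_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, _patch = match.groups()
    return f"release-{major}.{minor}"


if __name__ == "__main__":
    print(release_branch_for("v3.3.5"))
```

Under this convention, the new commits for v3.3.5 land on the HEAD of release-3.3, and the tag is then created from that branch rather than from the v3.3.4 tag.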

agilgur5 commented 6 months ago

This release process seems like quite a bit of work, wonder if we can automate some of this effort.

Feel free to propose any improvements.

If there are merge conflicts, then ostensibly no, those can't be automated. The main part I had proposed back in the Slack thread was to have something similar to CD's cherry-pick bot so that we can cherry-pick things as they come in instead of in batches. That way the context of the PR remains when cherry-picked and merge conflicts can be fixed quicker and potentially with the author even.

Ideally the bot (or other automation) would try to clean cherry-pick and if it works, cool, done. If not, it could either open a PR with the conflict or write a message on the original PR that there was a conflict and so manual resolution is needed.
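The "try a clean cherry-pick, otherwise fall back to manual resolution" flow described above could be sketched roughly as follows. This is an assumption-laden illustration, not the CD bot or any existing action: `try_backport` and the injectable `run` callable are hypothetical names, and the only git commands used are `git checkout`, `git cherry-pick -x`, and `git cherry-pick --abort`.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes git arguments and returns the exit code (hypothetical helper type).
Runner = Callable[[Sequence[str]], int]


def _run_git(args: Sequence[str]) -> int:
    """Run a git command in the current working directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def try_backport(sha: str, branch: str, run: Runner = _run_git) -> str:
    """Attempt a clean cherry-pick of `sha` onto `branch`.

    Returns "picked" if the cherry-pick applied cleanly. Otherwise aborts the
    cherry-pick so the branch is left untouched and returns "manual", meaning a
    human (or a follow-up PR/comment) has to resolve it.
    """
    if run(["checkout", branch]) != 0:
        raise RuntimeError(f"could not check out {branch}")
    if run(["cherry-pick", "-x", sha]) == 0:
        return "picked"
    # Best effort: undo the partially applied cherry-pick; exit code is ignored.
    run(["cherry-pick", "--abort"])
    return "manual"
```

On a "manual" result, the automation would then either open a PR containing the conflict or comment on the original PR, as suggested above; that part depends on the hosting API and is left out of the sketch.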

isubasinghe commented 6 months ago

The main part I had proposed back in the Slack thread was to have something similar to CD's cherry-pick bot so that we can cherry-pick things as they come in instead of in batches.

Yeah this is exactly part of what I was thinking of as well.

agilgur5 commented 6 months ago

Here's the PR in CD that added the cherry-pick-bot: https://github.com/argoproj/argo-cd/pull/12591

Unfortunately that one creates a PR for every cherry-pick, so it creates a lot of duplicate PR noise. I would prefer to avoid that, especially as it makes the repo history much harder to search through with all the dupes

agilgur5 commented 6 months ago

We talked about this briefly in last week's Contributor Meeting, where I mentioned a replacement for the bot with a GH Action, e.g. https://github.com/vendoo/gha-cherry-pick. That will suffice for most of our needs, but the one problem with it is that it won't trigger CI after a cherry-pick since GH intentionally prevents actions from triggering each other to avoid infinite loops. We might be able to work around that by manually dispatching GHA Workflows after the cherry-pick, but then we'd have to manually list every GHA Workflow that needs to be run (since you can't just run all of them, as far as I know).

Thinking about it a bit more though, we probably aren't running CI/tests on each cherry-pick when doing it manually / locally anyway. Similarly, as I learned in that contributor meeting, CI cancels itself on a commit when another commit is made on the branch before it's done (i.e. it only runs one CI job at a time on a branch, on the latest commit). So this is perhaps already a better option than manual / local without cons (unlike the bot). Perhaps we just want to make sure we run CI and that it passes on the release branch before making the release / tagging off the branch?

csantanapr commented 5 months ago

+1 on the use of automation to cherry pick commits, instead of doing all at the time of cutting a release.

agilgur5 commented 5 months ago

That will suffice for most of our needs, but the one problem with it is that it won't trigger CI after a cherry-pick since GH intentionally prevents actions from triggering each other to avoid infinite loops.

An interim workaround would just be for Approvers to manually cherry-pick fixes into the ongoing release branch (i.e. release-3.5, release-3.4) in their local clone and manually push them. This would solve the time delay, though it is fairly manual and not explicit.

agilgur5 commented 5 months ago

For others without write access yet, you can still send PRs to the release branch to help resolve any conflicts, and then other approvers can review.

Regarding those without write access, we did discuss this in the previous Contributor Meeting and I had previously proposed on Slack giving temporary write permissions to Member+ on release rotation. That proposal was rejected, and merging PRs for an entire release (i.e. with multiple commits) is unfortunately not necessarily possible due to automated DCO issues (cf. https://github.com/argoproj/argo-workflows/pull/12462#issuecomment-1877572919 and more recently #12711).

What we could do though is a "manual merge" -- a contributor writes up a PR to a release branch that fixes all conflicts, then, once approved, an Approver pulls those locally and pushes them to the release branch. That process for those without write access is actually fairly neat, all things considered. Before GH supported rebase merges and squash merges, I and others used to do this in other repos and would leave a comment on close as to how things were merged (random examples: https://github.com/agilgur5/react-signature-canvas/pull/3#issuecomment-303884454, https://github.com/django/django/pull/7762#issuecomment-269807584). This would effectively be a rebase merge as well.
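One way to picture the "manual merge" described above is as a short, fixed sequence of git commands run by an Approver. The Python sketch below only builds that command list as a dry-run plan; `manual_merge_commands` and the PR number in the usage line are hypothetical, and it assumes the PR is hosted on GitHub (which exposes PR heads as `pull/<number>/head`) and has been rebased on top of the release branch so a fast-forward is possible.

```python
from typing import List


def manual_merge_commands(pr_number: int, release_branch: str, remote: str = "origin") -> List[str]:
    """Return the git commands an Approver could run to land a reviewed PR by hand."""
    local = f"pr-{pr_number}"
    return [
        # Fetch the contributor's commits from the PR head ref into a local branch.
        f"git fetch {remote} pull/{pr_number}/head:{local}",
        # Make sure the local release branch matches the remote before merging.
        f"git checkout {release_branch}",
        f"git pull --ff-only {remote} {release_branch}",
        # Fast-forward only: commits are kept as-is, with no merge commit.
        f"git merge --ff-only {local}",
        f"git push {remote} {release_branch}",
    ]


if __name__ == "__main__":
    for command in manual_merge_commands(1234, "release-3.5"):
        print(command)
```

Because the merge is fast-forward only, the contributor's individual commits (and their sign-offs) reach the release branch unchanged; if the fast-forward fails, the PR needs rebasing first.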

agilgur5 commented 4 months ago

@terrytangyuan I'm not sure this has been completed? We're most certainly still iterating on it. Which comes before even documenting it.

We discussed it in the April 2nd Contributor Meeting as well, where @caelan-io said Pipekit would work on some improvements.


As an update from my end, I have been trying to follow the interim workaround I mentioned above for release-3.5. Despite that, I am still behind; I have all the CVE/deps patches, but am behind by about 20 fixes (although we did have a lot recently). And my current efforts suggest that a /cherry-pick action may not be that helpful, as I would say roughly half, if not more, of commits have merge conflicts when backported.

For deps, a chunk of that is due to selective backports, since one dep change can affect a few transitive deps and so they tend to be intertwined. I've tried going through the history and cherry-picking more deps to mitigate that with some success, but those aren't always CVE fixes. That has decreased since #12487 though; many of the conflicts were because Dependabot added an extraneous non-security update (that wasn't backported earlier due to its extraneous non-security nature).

My current thought process is that if the process will remain substantially manual for the foreseeable future, we may want to have "release managers" for a given branch or set of patches at least. I.e. they are on duty to watch for and actively cherry-pick things and can decide on the patch release schedule as they choose as well -- one person who is primarily responsible. That is a common practice in other projects. Any automation would be tools to help the release manager, but cannot fully automate merge conflict resolution.

terrytangyuan commented 4 months ago

Sorry I meant to close another issue. Thanks for catching it.

agilgur5 commented 4 months ago

and can decide on the patch release schedule as they choose as well

Alternatively, we just cut a patch release on a rolling schedule, e.g. every 2 weeks. Anything that's already backported is released, anything else has to wait till the next at least. That could be automated. And that's potentially easier to follow / more straightforward / less confusing for users and release managers. Users could contribute backports if they want a specific one out earlier.
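To make the rolling-schedule idea concrete, here is a minimal Python sketch that computes upcoming release dates from a fixed cadence. The function name `next_release_dates`, the 14-day default, and the anchor date in the usage example are all assumptions for illustration; nothing here reflects an agreed schedule.

```python
from datetime import date, timedelta
from typing import List


def next_release_dates(first: date, today: date, every_days: int = 14, count: int = 3) -> List[date]:
    """Return the next `count` scheduled release dates on or after `today`.

    `first` is the anchor date of the schedule; releases fall every `every_days` after it.
    """
    if every_days <= 0:
        raise ValueError("every_days must be positive")
    elapsed = (today - first).days
    if elapsed <= 0:
        start = first
    else:
        periods = -(-elapsed // every_days)  # ceiling division
        start = first + timedelta(days=periods * every_days)
    return [start + timedelta(days=i * every_days) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical anchor date, chosen only for the example.
    for d in next_release_dates(date(2024, 4, 18), date(2024, 4, 19)):
        print(d.isoformat())
```

On each scheduled date, whatever has already been backported to the release branch would be tagged and released, and anything else waits for the next slot.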

caelan-io commented 4 months ago

Agreed, let's keep open. We want to figure out automation for this and have a few ideas.

Monthly release cadence is about the max we have capacity to manage right now given how manual the cherry picking and merge conflict resolution is (based on my understanding). Just a heads up on that front for expectations.


On Thu, Apr 18, 2024 at 5:28 PM, Anton Gilgur wrote:

and can decide on the patch release schedule as they choose as well

Alternatively, we just cut a release on a rolling schedule, e.g. every 2 weeks. Anything that gets in gets in, anything else has to wait till the next at least. That could be automated. And that's potentially easier to follow / more straightforward / less confusing for users and release managers.


agilgur5 commented 4 months ago

Monthly release cadence is about the max

"every 2 weeks" was if we go with the "release manager" approach I mentioned above, where someone is just on backporting duty for a set period of time (on a rotating basis). The frequency of releases is then independent, as backports do not happen at a specific "release date" but continuously, while main is being developed. Releasing itself is mostly automated; backporting is not.