akuity / kargo

Application lifecycle orchestration
https://kargo.akuity.io/
Apache License 2.0
1.66k stars 141 forks source link

git-open-pr : "Another open merge request already exists for this source branch" : retrieve the ID of the MR/PR and go next step #2845

Open flbla opened 6 days ago

flbla commented 6 days ago

Checklist

Proposed Feature

When a MR is already opened retrieve it's ID and continue with next step

Motivation

Stage is failing but it should not

Brightside56 commented 6 days ago

same on 0.9.1

krancour commented 5 days ago

@hiddeco, give me a sanity check?

The legacy promo mechs in v0.9.x and earlier used to store the promotion number in the Promotion's bygone metadata map -- a precursor to promotion steps' shared state. It had to do this, because the legacy promo mechs replayed in their entirety on every reconciliation and there had to be some way of knowing the PR in question had already been opened.

For reference:

https://github.com/akuity/kargo/blob/e746b7b0e7500b79602619c0eaa6da97a7851f24/internal/controller/promotion/git.go#L218-L223

We currently have no equivalent to this check in the git-open-pr step and that is what needs to be rectified to resolve this issue.

Promotion steps only repeat themselves if we're forced to start a Promotion over from the beginning because we lost its working directory -- which would pretty much only be because the controller restarted while the Promotion was in-progress.

You can see this right here:

https://github.com/akuity/kargo/blob/c462c76edfd53821dbf7d5ec8aca24ed1fca6cf1/internal/controller/promotions/promotions.go#L463-L480

You'll notice we clear the shared state if we determine we're starting over. If we reset the starting step back to zero, but don't clear the shared state right here, the git-open-pr step would be able to self-determine that it's already done what it needs to do. (Perhaps, for good measure, we update the PR in this case in case any previous step that has now re-played yielded different results than the first time through. Such a thing could happen, for instance, with Helm charts that render anything that is random or time-based.)

I cannot think of any reason not to make this change, but as I said... I wanted a quick sanity check. lmk

hiddeco commented 5 days ago

I think there are two scenarios we need to consider:

  1. The scenario you described with Promotion replays due to a lost working directory.
  2. A second variation where a previous Promotion failed/terminated after PR creation, which can happen because:

    • Users can use static branches for PRs
    • Our branch names are tied to Project/Stage name, not specific Promotions (I'm not sure how we ended up with 2, as my initial idea was Promotion-specific branches)

I agree with not clearing the shared state initially during step zero resets. But once the process has started, I have concerns about ending up with a mixed bag of "previous" and "new" attempted state, which arguably can be misleading in certain scenarios. An alternative to address this would be to determine if any garbage collection / finalization needs to happen based on the existing state, which would allow you to e.g. close the pull request and/or delete the branch before kickstarting a new promotion.

If we decide to find existing branches / pull requests / merge requests, we must enforce the state of the branch to match the Promotion steps' output.

krancour commented 5 days ago
  • Users can use static branches for PRs
  • Our branch names are tied to Project/Stage name, not specific Promotions (I'm not sure how we ended up with 2, as my initial idea was Promotion-specific branches)

Well... that's my mistake. Precisely because state can reset, I wanted to be sure we could deterministically derive a PR's head branch name. Why I based that on Project and Stage names instead of Promotion names... I can't say where my head was at in that moment. I will happily fix this.

I have concerns about ending up with a mixed bag of "previous" and "new" attempted state, which arguably can be misleading in certain scenarios.

I did think about that, but convinced myself that because steps are ordered, as long as state returned from any step overwrites state it returned from a previous execution, then everything any subsequent step might refer to is guaranteed to be based on fresh state. So I think any confusion would be limited to human confusion, but if you were ever to see state from a step greater than the current step, it would be obvious what had happened.

If we decide to find existing branches / pull requests / merge requests, we must enforce the state of the branch to match the Promotion steps' output.

Agree. And that would mostly happen naturally. If the first run through pushed to branch x and opened PR 42, the second run through would push to branch x (if there were any diffs) and leave PR 42 alone.

Edit: The only reason to update the PR may be if the title/body had changed for any reason.

Edit 2: I think it's both easier and, more importantly, more straightforward to "reclaim" branches and PRs from a previous execution than to garbage collect.

BenjaminNeale-Heritage commented 1 day ago

In Kargo Render it was possible to either open a new PR or add to an existing PR that was already open for the Rendered Manifest Pattern branch. This is useful if the PR identifies an issue that needs to be fixed before the PR can be approved. It doesn't appear as if Kargo (v1.0.3) offers a similar option. From what I can tell, if a PR is opened by Kargo and then a new promotion is started to that stage from the same warehouse then the second promotion waits and does not start executing (refer image below).

Does this ticket refer to a scenario where the second promotion would be allowed to proceed to add the commits into the PR from the first promotion or is this ticket a completely different scenario?

image

Brightside56 commented 1 day ago

In Kargo Render it was possible to either open a new PR or add to an existing PR that was already open for the Rendered Manifest Pattern branch.

With promotion mechanisms stage with PR promotion doesn't have PR icon (at least in 0.9) and any other differences from stages without PR promotion. It can be hard to understand at all for user that PR is created and should me merged somewhere

BenjaminNeale-Heritage commented 19 hours ago

With promotion mechanisms stage with PR promotion doesn't have PR icon (at least in 0.9) and any other differences from stages without PR promotion. It can be hard to understand at all for user that PR is created and should me merged somewhere

I think that this has been fixed in Kargo v1.0.3 (refer first image below). The first stage has the git-open-pr step while the second stage doesn't have this step. When there is an open PR that has been created and the stage is waiting for the PR to be merged via the git-wait-for-pr step then the cog icon continues spinning to show that the pipeline for that stage remains active.

In my comment above, I initiated another promotion while the original promotion was waiting for the open PR to be merged which led to the second promotion being queued. I left the PR open overnight (and it remains open) but both the active and pending promotion pipelines have now failed with the following message (refer second screenshot below):

failed to run step "git-wait-for-pr": error getting PR number: no PR number found in output from step with alias "open-pr"

This may be intended behaviour of Kargo but I wanted to mention it as it continues from my comment above. It does highlight though that PRs must be processed fairly quickly after being created by Kargo.

image image