Azure / azure-sdk-tools

Tools repository leveraged by the Azure SDK team.
MIT License
114 stars 180 forks source link

[PR Workflow] Automate blocking PRs from being merged (EventHub) #6285

Closed konrad-jamrozik closed 1 year ago

konrad-jamrozik commented 1 year ago

Per email from @rkmanda, the following needs to be automated (for details and additional criteria, see email thread RE: About Spec PR review requirements from 6/2/2023, as well as RE: Brainstorm ways to automate the responsibilities of the PR assignee):

Are all checks other than the ARM specific ones already codified into the required checks today? Heres the list of the merge criteria we had documented earlier

  1. Common [between control and data plane] a. All required checks in the pipeline passed b. If BreakingChangesRequired label exists check existence of the BreakingChangesApproved label. c. If Individual language breaking change label exists, check existence of the appropriate corresponding approval labels.

  2. Control plane a. Existence of ARMSignedOff b. Existence of ArcSignedOff if its applicable

  3. Suppression review a. Not being performed by anyone today b. If SuppressionReviewRequired label exists, check existence of the Approved-Suppression label

  4. PR assignee confirms with the author when it is ok to merge a. In remote scenarios - Force merge when there are pipeline issues b. NOTE: DoNotMerge label exists but sometimes authors forget.

Possible approaches

One way of doing it is to do all these checks in the unified pipeline Summary task and make it fail if they fail. Summary task passing would mean PR is ready to be merged.

Alternatively, instead of failing the Summary task, we could introduce a new task, named like VerifyLabelsAllowMerging, that must pass. This way we will avoid bolting additional responsibilities on top of the Summary task.

Update 6/14/2023: suppressions

The SuppressionReviewRequired label should be blocking (unless appropriate "suppression approved" label is present (if applicable)). Per:

Update 7/5/2023: ShiftLeft approval

Approved-OKToMerge label needs to be present for ShiftLeft scenarios.

For full details, see this Teams discussion.

It also participates in the workflow of merge queues in private repo.
See this Teams discussion and this PR:

Update 7/18/2023: rules for data-plane

Per this comment:

Pull Requests

Follow-up work

konrad-jamrozik commented 1 year ago

@weshaggard I think this task boils down to the following subtasks:

  1. Review which of the existing CI checks should actually be required and set them as appropriate. Setting the status checks on GitHub side is explained in step 7 here and also here. The ADO doc about integrating with GH status checks refers to the same doc, as can be seen in ADO docs on that here.
    • I think the ADO pipeline tasks themselves have simple return code of success/failure, as explained here.
    • Full lists of checks running is defined in openapi-alps .azure-pipelines.yml. There might be some complexity involved in post-processing them by our bots, but I grokked the code and I think I know where is what.
    • Specifically, the pipeline-bot PipelineEventListener may be doing some relevant post-processing, but again, I think I can find my way around.
  2. Add another task to the openapi-alps .azure-pipelines.yml, to run at the end, like VerifyLabelsAllowMerging, to capture all the rules for labels on the PR. If any of them is violated, the check will fail.
weshaggard commented 1 year ago

For #1 once the checks are ready I can update the required checks in the branch protection. I don't want to make them required until we are confident, they work as expected and can be trusted.

For your VerifyLablesAllowMerging that one likely cannot be a task in the pipeline as we need to update it based on our other bots as well. I suspect this should be a github status check instead of the standard Devops check suite/run so that we can mutate it from all our bots and pipelines as the process proceeds.

For the name instead of "VerifyLabelsAllowMerging" I would suggest "All Reviews Complete" or something along those lines. We will need to have our pipelines and bots also update the text for that status check to indicate what is currently left to be reviewed, we might also want to consider adding that to our status comment.

konrad-jamrozik commented 1 year ago

@weshaggard re this:

For https://github.com/Azure/azure-sdk-tools/pull/1 once the checks are ready I can update the required checks in the branch protection. I don't want to make them required until we are confident, they work as expected and can be trusted.

I think we could capture what people are already doing. For example, I know that checks like breaking change verification, production LintDiff or Avocado, in practice must pass, so they should be required. But some other checks appear to be bypassed most of the time. So I would say we could start by making an assessment what is required and what not, giving our partners an opportunity to review it, and setting the checks based on that.

I could compile a list of all checks and reach out to our partners.

Re this:

For your VerifyLablesAllowMerging that one likely cannot be a task in the pipeline as we need to update it based on our other bots as well. I suspect this should be a github status check instead of the standard Devops check suite/run so that we can mutate it from all our bots and pipelines as the process proceeds.

I realized that we probably want for this to be the Summary task after all. This is because the Summary task processes the labels, including making any label corrections if labels have been manually tampered with. Hence if we would have a separate GitHub check of kind status, it would, in practice, have to do very similar work to the Summary task. Otherwise someone could tamper with the labels after Summary task ran, and thus bypass the labelling requirements.

konrad-jamrozik commented 1 year ago

I had a chat with @weshaggard. Here are our conclusions.

Specifically, the pipeline-bot is processing the labelling events via PipelineEventListener.processEvent. This calls renderUnifiedPipelineChecks, which ends up applying the check update via a call to context.appClient.checks.update.

Handling race conditions

Both pipeline-bot and workflow-bot process labelled/unlabelled events, and workflow-bot may add or remove labels. Hence it is possible that pipeline-bot will trigger, read the label states, and update the All Merging Requirements Fulfilled status check just before workflow-bot updates the label. As a result, the PR may end up being in inconsistent state. However, this should be transient. For example, someone may add ARMSignedOff label which will make the pipeline-bot denote the PR as ready to merge, and moments later workflow-bot may remove that label because, say, it was invalid. This should trigger unlabelling event, which should make pipeline-bot run again, and put the PR in invalid state again.

Work plan

I will work on this iteratively. The plan for first PR is to add a capability to the pipeline-bot to do the following on labelled or unlabelled event:

In second PR, or possibly also in the first PR, I will also trigger the same logic upon "check suite run completed" event. Details of doing this need to be determined.

Future work

In later iterations on this work we might consider some way of preserving the state that was used to compute the labels, to protect against manual tampering with them. For example, currently the Summary task computes the label denoting breaking changes presence based on the output of the "Breaking Change" task, which runs in the same pipeline. This information is not readily available during "on labeled" event of pipeline-bot. It would have to be pulled e.g. from given ADO build run for given PR, assuming that build would persist it e.g. as an artifact drop.

konrad-jamrozik commented 1 year ago

A note on suppressing checks

CI checks can be suppressed by adding appropriate label. For example, if the Swagger BreakingChange check is failing, adding the Approved-BreakingChange will immediately change its state from failed to neutral. You can even observe the text:

The check status is neutral due to the check was suppressed by label Approved-BreakingChange.

My cursory investigation shows that the code emitting that state change lives in pipeline-bot, in renderUnifiedPipelineChecks, here.

The production environment config file that defines which label can suppress which check appears to be config/production/Azure/azure-rest-api-specs.yml.

The function that specifically translates ADO task status to GitHub check status is checkStatusFromTask.

Related discussion

maririos commented 1 year ago

FYI @praveenkuttappan for Release Planner

konrad-jamrozik commented 1 year ago

@mikekistler I am about to deploy the checks described in this work item. What checks should I add for data plane reviews?

I see labels like:

What labelling rules I should add? Any required checks that should pass specifically for data-plane? Anything else?

mikekistler commented 1 year ago

@konrad-jamrozik I think the rules we want are described in this issue:

https://github.com/Azure/azure-sdk-tools/issues/6184

konrad-jamrozik commented 1 year ago

@rkmanda @raych1 @tadelesh @msyyc in the private repo, I see labels like these:

image

But they do not align with the public repo. I.e. the private repo has Approved-BreakingChange-Go while public repo has Approved-SdkBreakingChange-Go (note the extra Sdk after Approved-). The config file for private repo has the same labels as the config for public repo. In fact, I cannot find Approved-BreakingChange-Go in the source code at all. Can you let me know how to interpret this? Perhaps the PRs in the private repo are getting mislabeled?

tadelesh commented 1 year ago

Seems mislabeled. @raych1 may know more about the background.

konrad-jamrozik commented 1 year ago

Progress update 7/20/2023

The experimental check is deployed in non-blocking state and has most of the rules codified in it. I announced it on the ARM swagger reviews channel.

Major big remaining work left to do

Future work

benbp commented 1 year ago

@konrad-jamrozik the github checks api is a bit of a beast with lots of little edge cases. Let me know when you get to the "list required checks via api" step.

konrad-jamrozik commented 1 year ago

Closing as principally done. It is deployed to production and running. Primary follow-up work moved to: