Azure / azure-sdk-tools

Tools repository leveraged by the Azure SDK team.
MIT License
110 stars 172 forks source link

[PR Workflow] Automatic ARM sign-off for TypeSpec-based PRs #7352

Open rkmanda opened 9 months ago

rkmanda commented 9 months ago

We have now reached a state with the set of linter rule automation that have been put in place where we can start enabling more self-serve scenarios and simplify the PR review process for RP owners.

The first step towards this is to automatically sign-off the PRs that are raised for open api specs that are generated from Typespec. After reviewing a bunch of PRs that were created for Typespec based PRs, we have reached a conclusion that we can automatically sign-off these PRs without requiring a manual review.

Criteria to be met for an automatic sign-off

Mechanism to indicate sign-off

Prerequisites before this automation is put in place in production

Edit by kojamroz:

Related doc in ARM wiki:

konrad-jamrozik commented 9 months ago

@rkmanda by this:

No errors reported by the linter rules

Do you mean all linters? These:

rkmanda commented 9 months ago

Sorry I meant the Swagger Lintdiff and Swagger Lintdiff (RPaaS) linter rules @konrad-jamrozik

konrad-jamrozik commented 9 months ago

Note: we will also have to update the https://aka.ms/azsdk/pr-diagram and automated PR comments appropriately.

konrad-jamrozik commented 9 months ago

Bumping to high priority, per @rkmanda. Ideally, to be done by March 2024 at the latest.

konrad-jamrozik commented 9 months ago

Relevant email thread: RE: Discuss automated sign off for Typespec based PRs

The work that needs to be prioritized to make the switch is as follows

  • Identify the labels that will be added to the PR as an indication of the automated sign-off
  • Make changes to the Automated merging requirement check OR any other appropriate trigger to add the agreed upon label(s) once the criteria is met. The criteria for sign-off is as follows
    • Indication that the open api spec was generated using Typespec
    • Step 1 of the PR is approved by breaking change reveiewers
    • No errors are flagged by the swagger lintdiff check
    • No errors are flagged by the Typespec required validation
konrad-jamrozik commented 8 months ago

Currently aiming for turning on the automatic sign-offs at the beginning of February 2024 per the call we just had.

Meeting notes and action items:

Thanks Roopesh

konrad-jamrozik commented 6 months ago

We have identified additional work that needs to happen before we can complete this work item. I aggregated it in this issue:

@rkmanda FYI

rkmanda commented 6 months ago

Criteria for automated sign-off

konrad-jamrozik commented 3 months ago

I had a chat @rkmanda. I will provide here a quick summary and elaborate on details later.

TL;DR: Work to do: need a bit extra logic to determine "is this completely new ARM RP doing TypeSpec for the first time?" If not, allow folks to skip arm review, self-attest with label, and merge.

This work item is the most urgent work, above any work related to TypeSpec conversion PRs.

Re:

PR represents incremental changes to an existing resource provider The first PR for a new resource provider will still go thru the usual manual review process.

If a PR is new for given resource provider then we shall still require a TypeSpec sign-off. The converse is an incremental change for given resource provider.

To figure out which service provider is new, we will piggy-back on the implementation in openapi-alps that determines the label new-rp-provider. Going forward the idea is to require all new service specs dirs of stable and preview to be added under specification/teamName/resource-manager/RPNS/serviceName/ as opposed to the current model of specification/teamName/resource-manager/. See specification/confidentialledger for the old model and specification/containerservice for the new model. ARM is OK with this.

The tooling still need to handle existing cases. If the tooling detects a new service being added to existing RPNS that already has other services, the tooling should still consider this a new service, even though it is in existing RPNS. To do this, the tooling should look at what is inside RPNS folder. If it does not see stable or preview, then it knows it will be a new service in existing RPNS and hence requires a review.

Such validation will happen within public main, private main, and private RPaaSMaster, and it will not compare to any other branches. I.e. each of these 3 branches needs to look only at its contents to do the validation.

Re:

  • Swagger lintdiff check passes for the PR This is to ensure that all mandatory guidelines that we as the control plane platform care about are being adhered to.
  • No swagger lintdiff suppressions are applied to the PR If any suppressions are applied to these PRs, they will go thru a manual approval process because applying suppressions indicates that some of the mandatory guidelines are attempted to be violated.
  • Authors self-attest the adherence to design best practices that are not automated

If the tooling determines it is an incremental PR and there are no LintDiff issues nor LintDiff suppressions, then ARM review is effectively skipped. The author however will need to add some label that confirms they self-attested they followed the guidelines, only then PR will be unblocked. This is a bit similar to my work item on data-plane merges to public main where we will demand a label the PR author acknowledges they are releasing the spec to public customers.

We are going to stay with some form of "suppression review required" and "suppression review approved" as it proves to be difficult to diffuse the knowledge to ARM reviewers they should look at the suppression changes before they do ARM sign-off. These labels are forcing function that cannot be achieved differently, not even with appropriate line in next steps to merge comment. Related work:

We may consider adding a functionality that will remove ARMSignedOff label if the PR requires LintDiff suppression approval which was not given.

One more note on terminology:

Previously a directory like this one:

specification/containerservice/resource-manager/RPNS/aks was named "service group" but in reality it is "resource type group". So this:

Was a effectively a "set of two resource type groups".

Going forward we can fix the terminology and call such aks folder as a "service" folder under given RPNS. Then RPNS will effectively be equivalent to "service group" (service group composed of aks service and fleet service in the example above). ARM is OK with this. Update: the cleaned-up terminology was captured in:

konrad-jamrozik commented 3 months ago

The core part of this work is implementing support for this:

PR represents incremental changes to an existing resource provider The first PR for a new resource provider will still go thru the usual manual review process.

I am going to do it by implementing support for it inside the "semantic dir structure model" component which needs to be implemented itself:

and then adding appropriate GitHub check calling into this component, similar to this: