Azure / azure-sdk-tools

Tools repository leveraged by the Azure SDK team.

[Breaking Change] Add a mechanism for permanent suppressions (baselining) #7259

konrad-jamrozik opened this issue 10 months ago (status: Open)

konrad-jamrozik commented 10 months ago

Currently, there is no way to permanently suppress breaking change violations, as can be seen at: aka.ms/azsdk/pr-suppressions

This is a major gap that causes a lot of rework, and we want to close it by providing such a mechanism. One idea is to compare the tool's error output against a baseline text file checked into the repository: if every output error line has a match in the baseline file, the tool reports success.
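A minimal sketch of that baseline idea, assuming one error per output line and an exact-match comparison (the file name and helper are hypothetical, not the actual tooling):

```typescript
import * as fs from "fs";

// Hypothetical helper: treat the breaking-change tool output as one error per
// line and report success only if every error line appears in the checked-in
// baseline file.
function passesBaseline(errorLines: string[], baselinePath: string): boolean {
  const baseline = new Set(
    fs
      .readFileSync(baselinePath, "utf8")
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
  );
  // Any error line not present in the baseline is a new, unsuppressed violation.
  return errorLines.every((line) => baseline.has(line.trim()));
}
```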

Preliminary design considerations

There are several design decisions to make before we proceed with implementation. Some of the aspects we have identified so far:

Related work, docs and context

@weshaggard @mikeharder @heaths FYI

heaths commented 10 months ago

I don't know much about SARIF, but from what little I do know, it seems like a reasonable approach to consolidate error/warning output into a common format. That would make a lot of things easier, including posting to GitHub (probably converting to markdown tables or something, or even pinning to specific PR lines like some bots do), comparing against a suppressions file, etc.

Hashing the surrounding context is a good idea and would probably work for most cases. But what about using a JSONPath when the content is JSON? We do that for AutoRest in many cases, for example, and it rarely changes. I also imagine most suppressions would work fine with a JSONPath, e.g.,

suppressions:
  - from: foo.json
    where: $.definitions.Bar.prop_name
    code: 1234 # or maybe friendly code name e.g., UseCamelCase
    reason: need to match external schema
konrad-jamrozik commented 10 months ago

@heaths while it is by no means a rejected approach, the main issue with the AutoRest way is that so far it has been a pain to get right, causing a lot of rework. You can see our troubleshooting section for it:

and the most recent addition, which should show up in a few minutes:

weshaggard commented 10 months ago

I would also be careful with hashing line text, as that makes things more fragile in a number of cases and harder for people to get the suppressions correct. Instead of a hash, we should try to ensure the error message itself has enough context to suppress the correct instance.

michaelcfanning commented 9 months ago

Just jumping in here at Konrad's encouragement on some SARIF things. For sure, your output looks very SARIF-friendly: you have conceptual quality concerns, locations where those standards aren't met, user-facing reports, descriptive text for the standards enforced, etc. Using SARIF will unlock the ecosystem of consumers, which includes Visual Studio and VS Code, there are multiple React controls for rendering, and GitHub speaks SARIF natively for its GHAS, as does Microsoft's equivalent offering.

In re: JSON paths, SARIF has a construct called a logical location (as opposed to a physical location, such as a file on disk) that can be used to express this information. There are multiple SARIF-exporting tools at MS that use this data to record JSON paths for scan tools.
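As a rough illustration (written here as a TypeScript object literal, with invented values), a result carrying a JSON path as a logical location could be shaped roughly like this; only the overall shape follows the SARIF 2.1.0 result/location structure, and the kind value is just one plausible choice:

```typescript
// Sketch only: values are illustrative, but the shape follows SARIF 2.1.0
// (result -> locations -> physicalLocation / logicalLocations).
const exampleResult = {
  ruleId: "RemovedXmsEnum",
  level: "error",
  message: { text: "The new version is missing a 'x-ms-enum' found in the old version." },
  locations: [
    {
      physicalLocation: {
        artifactLocation: {
          uri: "specification/alertsmanagement/resource-manager/Microsoft.AlertsManagement/stable/2019-03-01/AlertsManagement.json",
        },
        region: { startLine: 34 },
      },
      logicalLocations: [
        {
          // The JSON path, expressed as a logical (rather than physical) location.
          fullyQualifiedName:
            "$.paths['/providers/Microsoft.AlertsManagement/operations'].get.parameters",
          kind: "property", // one of the JSON-oriented kinds SARIF suggests
        },
      ],
    },
  ],
};
```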

In re: suppressions, I think your current approach, to produce a JSON path that's specific to a finding, is a reasonable approach that's reasonably resilient. We have had some other JSON scenarios I'm aware of where a tool will elide some path information in an attempt to make the suppression less fragile. For example, a precise index into an array (assuming that this data isn't required to uniquely identify a problem).

The SARIF partialFingerprint property is intended to support arbitrary things that might contribute to suppressions. And SARIF itself is designed to allow for a sort of suppression 'sieve' where results might be matched using things like code snippets, falling back to line locations and/or other data if a more precise match fails. This result matching occurs in more of a baselining context, where you may have a baseline of well-known SARIF, generate a new run, and then diff the two. The SARIF SDK in NuGet has a fair amount of code to help with this.
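A much-simplified sketch of that baselining idea, assuming results are matched purely on partialFingerprints (the real result matching in the SARIF SDK is considerably more elaborate, with fallbacks to snippets and locations):

```typescript
// Hypothetical, simplified matching: a result is "already known" if any of its
// partialFingerprints values also appears somewhere in the baseline run.
interface SarifResult {
  ruleId: string;
  partialFingerprints?: Record<string, string>;
}

function newResults(baseline: SarifResult[], current: SarifResult[]): SarifResult[] {
  const known = new Set(
    baseline.flatMap((r) => Object.values(r.partialFingerprints ?? {}))
  );
  return current.filter(
    (r) => !Object.values(r.partialFingerprints ?? {}).some((fp) => known.has(fp))
  );
}
```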

Finally, it sounds like you're already concerned about the fragility of a per-line hash. The SARIF SDK has a rolling hash algorithm built into it that's intended to provide more stability across code churn, but this hash will be even more fragile than a single-line hash, so it's probably not of interest. It also has the undesirable characteristic of being opaque to the user, which @weshaggard warned against.

I have historically authored some tools that emitted a working suppression in the user-facing output, and this could be an option for you. A typical approach is to emit your finding to the user as a well-formed sentence, and then add another sentence describing the precise suppression data that can be cut-and-pasted elsewhere to suppress it. Per SARIF, consumers can always choose to show only your first sentence of output (when UX real estate is limited) and provide an expander or something else to get the additional content. The VS SARIF viewer uses this approach a lot. In console output, of course, the entire finding will be sent to output.

konrad-jamrozik commented 9 months ago

Note: there are additional "suppressions for SDK" proposals provided by @raych1 in the Word file titled SDK Breaking Change Review Workflow. Link here.

konrad-jamrozik commented 8 months ago

Bigger picture discussion about suppressions here:

konrad-jamrozik commented 8 months ago

I had a chat with @weshaggard about how to approach the first prototype of this work.

Consider this example breaking change log

{
    "type": "Result",
    "level": "Error",
    "message": "The new version is missing a 'x-ms-enum' found in the old version.",
    "code": "RemovedXmsEnum",
    "id": "1049",
    "docUrl": "[https://github.com/Azure/openapi-diff/tree/master/docs/rules/1049.md",](https://github.com/Azure/openapi-diff/tree/master/docs/rules/1049.md%22,)
    "time": "2024-01-01T11:48:40.019Z",
    "groupName": "stable",
    "extra": {
        "mode": "Removal"
    },
    "paths": [
        {
            "tag": "Old",
            "path": "[https://github.com/Azure/azure-rest-api-specs/blob/main/specification/alertsmanagement/resource-manager/Microsoft.AlertsManagement/stable/2019-03-01/AlertsManagement.json#L34:9"](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/alertsmanagement/resource-manager/Microsoft.AlertsManagement/stable/2019-03-01/AlertsManagement.json#L34:9%22)
        }
    ]
},

mentioning this spec element.

  "paths": {
    "/providers/Microsoft.AlertsManagement/operations": {
      "get": {
        "operationId": "Operations_List",
        "description": "List all operations available through Azure Alerts Management Resource Provider.",
        "parameters": [
          {
            "$ref": "#/parameters/api-version"
          }
        ],

For full context of this example data, see this comment.

For the first prototype, I will do the following:

  1. Emit, in the ADO build log of the breaking changes tasks, a new property for each message in the ---- Full list of messages ---- section. It will contain the suppression line, ready to be copy-pasted into the suppression file. For the provided example it could be, e.g.:

"snippetToApplySuppression" : "$.paths.['/providers/Microsoft.AlertsManagement/operations'].get.parameters"

The value will be a JSON path derived from the available information; we could probably pull it out of openapi-diff somehow. You can verify the JSON path at https://jsonpath.com/.

The ADO build log will also point to aka.ms link explaining how to apply these snippets, as described below.

  2. Add support for taking this snippetToApplySuppression and pasting it into the newly added suppressions file. That file will live beside the service README.md; in the provided example it would live in specification/alertsmanagement/resource-manager. It could be called suppressions.md. Its content will be YAML, with appropriate comments in it explaining its purpose.

  3. Add support for reading suppressions.md in the breaking change ADO tasks and excluding any new breaking change violation from triggering if it matches a JSON path listed in suppressions.md. An appropriate build log message should be added denoting that a breaking change alert was triggered but suppressed (see the sketch after this list).
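A rough sketch of what step 3 could look like, assuming the suppressions file is simply a YAML list of JSON paths (matching the snippetToApplySuppression values) parsed with js-yaml; the file format, library choice, and helper names are illustrative, not decided:

```typescript
import * as fs from "fs";
import * as yaml from "js-yaml";

// Hypothetical sketch of step 3: a violation is excluded if the JSON path the
// tool derived for it (the snippetToApplySuppression value) is listed in the
// suppressions file.
function filterSuppressed(
  violations: { code: string; jsonPath: string }[],
  suppressionsFile: string
): { code: string; jsonPath: string }[] {
  const suppressedPaths = new Set(
    yaml.load(fs.readFileSync(suppressionsFile, "utf8")) as string[]
  );
  return violations.filter((violation) => {
    if (suppressedPaths.has(violation.jsonPath)) {
      // Still surface the fact that the alert fired, but do not fail the check.
      console.log(`Suppressed: '${violation.code}' at ${violation.jsonPath}`);
      return false;
    }
    return true;
  });
}
```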

Design considerations

  1. The AutoRest README.md also has suppressions. They are interpreted by AutoRest and used to suppress LintDiff issues. As a result, we will have suppression logic in multiple places.
  2. We could make adding a suppression as simple as "append this line to suppressions.yml", but that means we won't have any structure in that file, maybe only comments. The more structure and metadata we add (e.g. a suppressions: key), the greater the risk that users will have issues adding suppressions.
  3. A fancier option would be to instead give users instructions on which comment to add. Our automation would read that comment and push an update to the PR adding the relevant suppression, or open a PR against the user's PR with the suppression.
  4. We also need to consider scenarios where users want the suppressions to be more general than the one specific element pointed to by the JSON path. This is especially relevant because an OpenAPI spec element that has the issue is often referenced in many places, and each such reference triggers the breaking change violation. I think we may currently run into an issue of this kind: a) the tooling reports the problem on the first reference, b) the user suppresses the issue, c) the tooling now reports the issue on the second reference, d) the user suppresses that, e) the report now appears on the third reference, and so on. This may happen (not sure yet) because the tooling has deduplication logic. See #7242.
  5. There is at least one more suppressions mechanism, for suppressing issues in the generated SDK languages like C#. See this comment.
  6. I should also investigate using SARIF, see this comment.
weshaggard commented 8 months ago

Let's not abuse .md files for these suppressions; let's make it a proper .yml file, so suppressions.yml. As for a suppression entry, you will likely also need the file path and the error code type along with the JSON expression. I do think it is OK if they have to copy multiple lines, as long as it is as simple as copy/paste into a file.

konrad-jamrozik commented 7 months ago

Per our chat with @mikeharder and @weshaggard, here is an example of the currently proposed format for suppressions.yml:

- tool: Breaking Change
  filePattern: ./specs/arm/KeyVault/**/
  jsonPath: ...
  code: ParamRemoval
  reason: matching actual impl.
- tool: TypeSpec requirement
  filePattern: ./specs/arm/KeyVault/**/
  jsonPath: ""
  code: ""
  reason: brownfield

The suppressions file to read is determined as follows: for a given API spec being evaluated, walk the directory tree until you find the first suppressions.yml file, and ignore the others. There is no merging logic across multiple files and no configuration cascade.
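A minimal sketch of that lookup rule, assuming the walk goes upward from the spec's directory toward the repository root (the helper and its signature are hypothetical):

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical helper: starting from the directory of the spec under
// evaluation, walk up the tree and return the first suppressions.yml found.
// Any suppressions.yml higher up is ignored; there is no merging.
function findSuppressionsFile(specPath: string, repoRoot: string): string | undefined {
  let dir = path.dirname(path.resolve(specPath));
  const root = path.resolve(repoRoot);
  while (dir.startsWith(root)) {
    const candidate = path.join(dir, "suppressions.yml");
    if (fs.existsSync(candidate)) {
      return candidate; // first match wins
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return undefined; // no suppressions file for this spec
}
```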

heaths commented 7 months ago

Just a thought: should we consider JMESPath instead? This is what az uses, so skills may be more transferable. Or do we also make use of JSONPath in other places in our ecosystem? They are both close and I don't particularly care either way; I was just pondering that service teams using az might be more familiar with JMESPath. There are libraries for using either in code.
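For comparison, the same selection from the AlertsManagement example written in both syntaxes, assuming the jsonpath-plus and jmespath npm packages (neither of which is a settled choice here):

```typescript
import { JSONPath } from "jsonpath-plus";
import { search } from "jmespath";

// The same element from the AlertsManagement example, selected in each syntax.
const jsonPathExpr =
  "$.paths['/providers/Microsoft.AlertsManagement/operations'].get.parameters";
const jmesPathExpr =
  'paths."/providers/Microsoft.AlertsManagement/operations".get.parameters';

declare const spec: any; // the parsed OpenAPI document

const viaJsonPath = JSONPath({ path: jsonPathExpr, json: spec }); // array of matches
const viaJmesPath = search(spec, jmesPathExpr); // matched value, or null if absent
```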

konrad-jamrozik commented 7 months ago

> Just a thought: should we consider JMESPath instead? This is what az uses, so skills may be more transferable. Or do we also make use of JSONPath in other places in our ecosystem? They are both close and I don't particularly care either way; I was just pondering that service teams using az might be more familiar with JMESPath. There are libraries for using either in code.

@heaths We use JSONPath for AutoRest suppressions, which power a few of our checks, and it is the primary way folks have been suppressing things so far:

https://eng.ms/docs/products/azure-developer-experience/design/specs-pr-guides/pr-suppressions#verify-your-where-path-with-jsonpath-and-ensure-proper-escaping

Still, we could consider supporting JMESPath in addition to JSONPath.

heaths commented 7 months ago

> Still, we could consider supporting JMESPath in addition to JSONPath.

Eh, I wouldn't bother. :smile: Thanks for noting you already use JSONPath. Certainly in this case that is more consistent. The two are close, and I think supporting both would just lead to confusion. I didn't realize you already used JSONPath, so I only suggested considering it since az did. Never mind. :smile:

konrad-jamrozik commented 4 months ago

This work will build on: