dotnet / arcade

Tools that provide common build infrastructure for multiple .NET Foundation projects.
MIT License

Display Test Known Issues for tests in non-PR pipelines #10732

Open AndyAyersMS opened 2 years ago

AndyAyersMS commented 2 years ago

We often see recurring/repeated failures in outerloop and other non-PR jobs. It would be nice if Build Analysis applied here too, as there is often substantial manual effort involved in triaging the failures we see in these jobs.

cc @JulieLeeMSFT

missymessa commented 2 years ago

Hi @AndyAyersMS, since Build Analysis is tied to Pull Requests, how do you envision the workflow working for having Build Analysis run against non-PR jobs?

AndyAyersMS commented 2 years ago

Any failure in a non-PR job is in some sense an infrastructure failure or a pre-existing failure. The main thing that would be nice here is the ability to quickly see when these failures are known failures with open issues.

missymessa commented 2 years ago

I'll follow up with you offline to get more of an idea of how you're thinking this flow would work. :)

ChadNedzlek commented 2 years ago

They're already in there. If you go to one of the rolling builds, you can click here to get to the GitHub commit for the build: [screenshot]

Then in the commit, you can click here: [screenshot]

ChadNedzlek commented 2 years ago

If you are just looking at the commit history for main, it's a bit easier; you can click here to see the results: [screenshot]

ChadNedzlek commented 2 years ago

Granted, GitHub's discoverability is so low there as to basically be zero, but we are running analysis on the rolling builds, and you can get there with 2 or 3 clicks... you just have to know where to click.

AndyAyersMS commented 2 years ago

Thanks @ChadNedzlek

How about for something like https://dev.azure.com/dnceng/public/_build?definitionId=1000? In the recent runs, one of the failures is in the GetTotalPauseDuration test.

There is an open issue for this, https://github.com/dotnet/runtime/issues/74877. How can I quickly discover that the failures I see here are covered by that issue?

ChadNedzlek commented 2 years ago

We don't have any analysis that spans multiple builds if it isn't using the "known issues" feature, which that issue doesn't appear to use. Build Analysis is focused on making specific PRs actionable; investigating current issues isn't its primary focus, unfortunately. There are still some options not strictly related to the Build Analysis check, though:

You could use the analytics page of Azure DevOps to look at every failure for any given test in the last 7/14/30 days: https://dev.azure.com/dnceng/public/_test/analytics?definitionId=1000&contextType=build

Or you could use our Kusto database with some of the queries documented here: https://github.com/dotnet/arcade/blob/main/Documentation/AzureDevOps/TestReportingQueries.md

markwilkie commented 2 years ago

There's also the feature for searching test or build logs in the cloud (not sure test search is turned on yet for dotnet/runtime, though).

missymessa commented 2 years ago

Had a chat with Andy about this feature request.

Essentially, for non-PR builds, knowing whether a test is failing due to a known issue is nearly impossible to discover from the AzDO UI. It would save hours of investigation time if we could provide a list of known test issues on these pipeline runs, either as another column on the test results:

[screenshot]

or as another extension here:

[screenshot]

ChadNedzlek commented 2 years ago

I'm not sure how we would accomplish such a thing. For overarching test analysis, I think the Kusto database is probably the best bet. Perhaps we should be including some metadata about known issues in there, since tests that fail due to known issues currently can't be identified or filtered out in those tables.

missymessa commented 2 years ago

Spitballing solutions:
1) Output the Build Analysis content to some kind of extension that would show up on the pipeline.
2) If the test runs through Helix, Helix would look up failing tests to see whether a known issue is already open for them, and link that data into the test results that show up once they are posted to AzDO.
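A minimal Python sketch of the lookup described in option 2, included only to make the idea concrete. It assumes that each known issue carries a plain substring expected to appear in a failing test's error message. The names `KnownIssue`, `TestFailure`, and `match_known_issues` are hypothetical and are not part of Helix, Build Analysis, or the AzDO API.

```python
from dataclasses import dataclass


@dataclass
class KnownIssue:
    # Hypothetical shape: an issue URL plus a substring expected in the failure message.
    url: str
    error_pattern: str


@dataclass
class TestFailure:
    # Hypothetical shape: the failing test's name and its reported error message.
    name: str
    message: str


def match_known_issues(failures, known_issues):
    """Map each failing test name to the URLs of known issues whose pattern appears in its message.

    A failure with no matching issue maps to an empty list, which is the
    "needs manual triage" case.
    """
    matches = {}
    for failure in failures:
        matches[failure.name] = [
            issue.url for issue in known_issues if issue.error_pattern in failure.message
        ]
    return matches


if __name__ == "__main__":
    # Illustrative, made-up data only; not real failure messages or issues.
    failures = [
        TestFailure("GetTotalPauseDuration", "Assert failed: pause duration was 0"),
        TestFailure("OtherTest", "Timed out"),
    ]
    issues = [KnownIssue("https://github.com/example/repo/issues/1", "pause duration was 0")]
    print(match_known_issues(failures, issues))
```

The resulting mapping is the kind of data that could populate an extra "known issue" column or an extensions tab. Plain substring matching is the simplest choice; a real system would probably need something more robust.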

ChadNedzlek commented 2 years ago

We could try to add an extension to AzDO, but that's a fair bit of dev time and maintenance, since it can't be "GitHub flavored markdown", so we'd need an entirely separate rendering of it as pure HTML+CSS (or some other markdown engine, which would be expensive).

I don't think Azure DevOps is the right place for attempting to do this sort of build spanning analysis. The costs of us attempting to twist it into that are going to be high, and I think more flexible solutions (Kusto or the infamous "single landing page") are probably the right features to meet this scenario.

missymessa commented 2 years ago

/cc @radical

kunalspathak commented 1 year ago

> since it can't be "GitHub flavored markdown"

A simple list of issues should be good enough. We do something similar using the "Extensions" feature of AzDO, e.g. https://dev.azure.com/dnceng-public/public/_build/results?buildId=61917&view=ms.vss-build-web.run-extensions-tab

I think we already have data on these failures in the same place Build Analysis gets its data from.

markwilkie commented 1 year ago

I wonder if this isn't mostly a report. It would only work for pipelines that point to GitHub, but from talking with Andy, that seems sufficient.