dotnet / dnceng

.NET Engineering Services
MIT License

Improve Dev WF by providing shared test infrastructure #1241

Open garath opened 3 years ago

garath commented 3 years ago

Summary

The lack of test execution and result context, combined with inconsistent usage across the product teams, creates inefficiencies and makes it difficult, and at times impossible, to implement needed features and manage tests over time.

Some of these long-requested features include:

Automatic handling of certain scenarios. For example, some possibilities include:

In addition, there is value brought to the ecosystem overall.

The way that tests are being managed and executed is becoming increasingly fractured, making it more challenging to manage the quality of our builds. A layer on top of Helix that is knowledgeable of the workloads is sorely needed.

Primary Business Drivers

Delivering Success

Cost: 6 dev-months of effort. (This is a very rough estimate; we'll only really know once we've determined scope.)

Note: We should investigate cloud test again to confirm (or not) that continuing to invest in Helix is the right thing to do.

Agentless Task

Tasks

| Task | Cost (Weeks) | Expected Completion | Completion |
| --- | --- | --- | --- |
| Create framework for agentless task execution | 3 | | |
| Create agentless task that executes Helix jobs | 2 | | |
| Enable smart job retry for the agentless task | 2 | | |
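As a hedged sketch of what the "smart job retry" row might mean, the core idea is bounded resubmission that distinguishes infrastructure failures (worth retrying) from genuine test failures (not worth masking). All names below (`submit_job`, `get_job_outcome`, the outcome strings) are illustrative stand-ins, not real Helix or AzDO APIs:

```python
# Hypothetical sketch: retry a Helix job only when it failed for
# infrastructure reasons, up to a bounded number of attempts.
INFRA_OUTCOMES = {"MachineLost", "Timeout", "InfraError"}

def run_with_smart_retry(submit_job, get_job_outcome, max_attempts=3):
    """Return (passed, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        job_id = submit_job()
        outcome = get_job_outcome(job_id)
        if outcome == "Passed":
            return True, attempt
        if outcome not in INFRA_OUTCOMES:
            # Genuine test failure: retrying would only hide the signal.
            return False, attempt
    return False, max_attempts
```

The key design choice is that a real test failure short-circuits immediately; only transient infrastructure outcomes consume additional attempts.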
danmoseley commented 2 years ago

This is such a large one that the first step would be to do some investigation of what it would take and what benefits we might realize. Dumps are just one example: once we have a common means of collecting dumps, we can innovate in what we do with them, e.g., automated basic analysis.

markwilkie commented 2 years ago

My sense is that the first step would be to come up with a proper architectural approach. So far, we've been "bolting" stuff onto the generic "do work" Helix client, which isn't sustainable, of course.

Basically, my suspicion is that the work here isn't really incremental until there's an approach/architecture hammered out.

missymessa commented 2 years ago

Added "Bubble-up test failure messages in Helix work items to AzDO" to the list of requested features. From my user-study discussions, the generic "Helix work item failed" messages aren't helpful for folks investigating test failures. If there were a way to bubble up test failure messages so they could be captured in the test run output, investigators wouldn't have to dig through logs to figure out what actually failed.

agocke commented 1 year ago

To record some thoughts from Teams before I forget:

Right now Build Analysis is of limited usefulness because it only captures test success/failure at the Helix work item granularity, which is basically the unit tests for an entire assembly.

Given that we want to apply filtering for known issues that often only affect a single test, that granularity is much too coarse.

The ask is that we need some way to track individual test success and failure. My proposal is that we standardize on the XUnit XML test output format. It can live under a well-known file name, and responsibility for writing the file rests entirely with the test executor (the test owner). Helix itself may not have a use for this file (I don't know), but at minimum it could pass the information along to the rest of the system.
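To illustrate why a standardized results file gives the per-test granularity Build Analysis lacks, here is a minimal sketch of consuming an xUnit v2-style XML document. The assembly and test names are made up for the example; the `<assemblies>/<assembly>/<collection>/<test>` shape follows the xUnit v2 XML format:

```python
# Sketch: extract per-test pass/fail (plus failure message) from an
# xUnit v2-style XML results document, instead of one bit per work item.
import xml.etree.ElementTree as ET

SAMPLE = """<assemblies>
  <assembly name="Contoso.Tests.dll">
    <collection name="Test collection for Contoso.Tests.WidgetTests">
      <test name="Contoso.Tests.WidgetTests.Spins" result="Pass" />
      <test name="Contoso.Tests.WidgetTests.Wobbles" result="Fail">
        <failure exception-type="Xunit.Sdk.EqualException">
          <message>Assert.Equal() Failure</message>
        </failure>
      </test>
    </collection>
  </assembly>
</assemblies>"""

def per_test_results(xml_text):
    """Return (test name, result, failure message or None) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    for test in root.iter("test"):
        msg = test.findtext("failure/message")
        results.append((test.get("name"), test.get("result"),
                        msg.strip() if msg else None))
    return results

for name, result, message in per_test_results(SAMPLE):
    print(name, result, message or "")
```

With this file available per work item, known-issue filtering could match on an individual failing test rather than failing the whole work item.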

dougbu commented 6 months ago

We provide shared test infrastructure. Feels like the remaining items may fit better under Helix Tech Debt.